path: root/src
Commit message | Author | Age | Files | Lines
* i965: Return whether the miptree was resolved from intel_miptree_resolve_color().
  Francisco Jerez, 2016-08-25, 2 files, -5/+9
  This will allow optimizing out the cache flush in some cases when resolving wasn't necessary.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Translate nir_intrinsic_load_output on a fragment output.
  Francisco Jerez, 2016-08-25, 1 file, -0/+20
  This gets the non-coherent framebuffer fetch path hooked up to the NIR front-end.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Allocate fragment output temporaries on demand.
  Francisco Jerez, 2016-08-25, 1 file, -46/+27
  This gets rid of the duplication of logic between nir_setup_outputs() and get_frag_output() by allocating fragment output temporaries lazily whenever get_frag_output() is called. This makes nir_setup_outputs() a no-op for the fragment shader stage.
  Reviewed-by: Kenneth Graunke <[email protected]>
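Lazy allocation of this kind is essentially a memoized lookup: the first request for a slot allocates a temporary, and later requests return the same one. The sketch below is a simplified illustration, not the i965 code; `struct frag_outputs`, `lookup_frag_output()`, the table sizes, and the use of plain integers as register handles are all hypothetical.

```c
#include <assert.h>

#define MAX_LOCATIONS 8 /* hypothetical limits for the sketch */
#define MAX_INDICES 2

/* Hypothetical table of temporaries; 0 means "not allocated yet". */
struct frag_outputs {
   unsigned regs[MAX_LOCATIONS][MAX_INDICES];
   unsigned next_reg;
};

/* Return the temporary for (location, index), allocating it on first use,
 * so no separate setup pass has to pre-allocate every output. */
static unsigned lookup_frag_output(struct frag_outputs *outs,
                                   unsigned location, unsigned index)
{
   assert(location < MAX_LOCATIONS && index < MAX_INDICES);
   if (outs->regs[location][index] == 0)
      outs->regs[location][index] = ++outs->next_reg;
   return outs->regs[location][index];
}
```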
* i965/fs: Rework representation of fragment output locations in NIR.
  Francisco Jerez, 2016-08-25, 3 files, -10/+55
  The problem with the current approach is that driver output locations are represented as a linear offset within the nir_outputs array, which makes it rather difficult for the back-end to figure out which color output and index a given nir_intrinsic_load/store_output was meant for, because the offset of a given output within the nir_outputs array depends on the type and size of all previously allocated outputs.
  Instead, this defines the driver location of an output to be the pair formed by its GLSL-assigned location and index (I've borrowed the bitfield macros from brw_defines.h in order to represent the pair of integers as a single scalar value that can be assigned to nir_variable_data::driver_location). nir_assign_var_locations is no longer useful for fragment outputs.
  Because fragment outputs are now allocated independently rather than within the nir_outputs array, the get_frag_output() helper becomes necessary in order to obtain the right temporary register for a given location-index pair.
  The type_size helper passed to nir_lower_io is now type_size_dvec4 rather than type_size_vec4_times_4, so that output array offsets are provided in terms of whole array elements rather than scalar components (dvec4 is the largest vector type supported by GLSL, so this gives every individual fragment output a size of one regardless of its type).
  Reviewed-by: Kenneth Graunke <[email protected]>
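Packing a (location, index) pair into one scalar can be sketched with shift/mask field macros in the style the commit borrows from brw_defines.h. The field names and the layout below (index in bit 0, location in bits 1..31) are assumptions for illustration; the driver's actual definitions may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bit 0 = blend index, bits 1..31 = GLSL location. */
#define FRAG_OUTPUT_INDEX_SHIFT    0
#define FRAG_OUTPUT_INDEX_MASK     0x00000001u
#define FRAG_OUTPUT_LOCATION_SHIFT 1
#define FRAG_OUTPUT_LOCATION_MASK  0xfffffffeu

/* Shift a value into its field and clip it to the field's mask. */
#define SET_FIELD(value, field) \
   (((uint32_t)(value) << field##_SHIFT) & field##_MASK)
/* Extract a field from a packed word. */
#define GET_FIELD(word, field) \
   (((uint32_t)(word) & field##_MASK) >> field##_SHIFT)

/* Pack a (location, index) pair into a single driver_location value. */
static uint32_t pack_frag_output(unsigned location, unsigned index)
{
   assert(index <= 1);
   return SET_FIELD(location, FRAG_OUTPUT_LOCATION) |
          SET_FIELD(index, FRAG_OUTPUT_INDEX);
}

static unsigned frag_output_location(uint32_t driver_location)
{
   return GET_FIELD(driver_location, FRAG_OUTPUT_LOCATION);
}

static unsigned frag_output_index(uint32_t driver_location)
{
   return GET_FIELD(driver_location, FRAG_OUTPUT_INDEX);
}
```

Because the packed value no longer depends on the size of other outputs, the back-end can recover the color output and index directly from the scalar.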
* i965: Fix undefined signed overflow in INTEL_MASK for bitfields of 31 bits.
  Francisco Jerez, 2016-08-25, 1 file, -1/+1
  Most likely we had only ever used this macro on bitfields of fewer than 31 bits; that's going to change shortly.
  Reviewed-by: Kenneth Graunke <[email protected]>
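The overflow comes from computing `1 << width` on a signed `int`: for a 31-bit field the shift produces a value that does not fit in `int`, which is undefined behavior in C. The sketch below shows a mask macro modeled on INTEL_MASK (the exact upstream definition may differ) with the arithmetic done on `unsigned` instead; note it is still only valid for widths up to 31 bits.

```c
#include <assert.h>

/* Signed form (shown for contrast, not used): with high - low + 1 == 31,
 * the int expression 1 << 31 overflows, which is undefined behavior.
 * #define MASK_SIGNED(high, low) (((1 << ((high) - (low) + 1)) - 1) << (low))
 */

/* Unsigned form: 1u << 31 is well defined, so 31-bit fields work.
 * A 32-bit field would still be undefined (shift by the type's width). */
#define MASK_UNSIGNED(high, low) \
   (((1u << ((high) - (low) + 1)) - 1u) << (low))
```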
* i965/fs: Special-case nir_intrinsic_store_output for the fragment shader.
  Francisco Jerez, 2016-08-25, 1 file, -0/+15
  I'm about to change how fragment shader output locations are represented, so the generic nir_intrinsic_store_output implementation that assumes that outputs are just contiguous elements in the big nir_outputs array won't work anymore. This somewhat simplified implementation of nir_intrinsic_store_output for fragment shaders should be functionally equivalent to the current fall-back one.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Implement non-coherent framebuffer fetch using the sampler unit.
  Francisco Jerez, 2016-08-25, 2 files, -0/+94
  v2: Memoize sample ID, misc codestyle changes. (Ken)
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use.
  Francisco Jerez, 2016-08-25, 1 file, -1/+2
  This will be required for the next commit since the non-coherent path makes use of the fragment coordinates implicitly, so they need to be calculated.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO.
  Francisco Jerez, 2016-08-25, 1 file, -1/+2
  The result of a framebuffer fetch from a multisample FBO is inherently per-sample, so the spec requires at least those sections of the shader that depend on the framebuffer fetch result to be executed once per sample.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Allocate space in the binding table for non-coherent FB fetch.
  Francisco Jerez, 2016-08-25, 4 files, -7/+16
  Unfortunately, due to the inconsistent meaning of some surface state structure fields, we cannot re-use the same binding table entries for sampling from and rendering into the same set of render buffers, so we need to allocate a separate binding table block specifically for render target reads if the non-coherent path is in use.
  The slight noise is due to the change of brw_assign_common_binding_table_offsets to return the next available binding table index rather than void.
  Reviewed-by: Kenneth Graunke <[email protected]>
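The allocation scheme can be illustrated with a simplified layout function that follows the "return the next available index" convention. This is a hypothetical sketch: `struct bt_layout`, `assign_bt_offsets()`, and the choice of reserving one read entry per render target are assumptions, not the i965 binding table code.

```c
#include <assert.h>
#include <stdbool.h>

#define BT_UNUSED 0xffffffffu /* marks a block that was not allocated */

/* Hypothetical, simplified binding-table layout. */
struct bt_layout {
   unsigned render_target_start;
   unsigned render_target_read_start;
};

/* Assign offsets starting at next_index and return the next free index.
 * Render target reads get their own block because the render target
 * entries cannot be reused for sampling. */
static unsigned assign_bt_offsets(struct bt_layout *layout,
                                  unsigned next_index,
                                  unsigned num_render_targets,
                                  bool noncoherent_fb_fetch)
{
   layout->render_target_start = next_index;
   next_index += num_render_targets;

   layout->render_target_read_start = BT_UNUSED;
   if (noncoherent_fb_fetch) {
      layout->render_target_read_start = next_index;
      next_index += num_render_targets;
   }
   return next_index;
}
```

Returning the next free index lets callers chain further allocations without recomputing where the previous block ended.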
* i965/fs: Add brw_wm_prog_key bit specifying whether FB reads should be coherent.
  Francisco Jerez, 2016-08-25, 2 files, -0/+7
  Some of the following changes in this series are specific to the non-coherent path, so I need some way to tell whether the coherent or non-coherent path is in use.
  The flag defaults to the value of the gl_extensions::MESA_shader_framebuffer_fetch enable, so that it can be overridden easily on hardware that supports both framebuffer fetch extensions in order to test the non-coherent path, like:
      MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch
  (Of course, trying to force-enable the coherent framebuffer fetch extension on hardware without native support won't work and will lead to assertion failures.)
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Get rid of fs_visitor::do_dual_src.
  Francisco Jerez, 2016-08-25, 3 files, -26/+14
  This boolean flag was being used for two different things:
  - To set the brw_wm_prog_data::dual_src_blend flag. Instead we can just set it based on whether the dual_src_output register is valid, which will be the case if the shader writes the secondary blending color.
  - To decide whether to call emit_single_fb_write() once, or in a loop that would iterate only once, which seems pretty useless.
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Handle FB fetch outputs correctly in nir_lower_io_to_temporaries.
  Francisco Jerez, 2016-08-25, 1 file, -0/+21
  This requires emitting a series of copies at the top of the program from each output variable to the corresponding temporary. The initial copy can be skipped for non-framebuffer fetch outputs whose initial value is undefined, and the final copy needs to be skipped for read-only outputs (i.e. gl_LastFragData), since it would be illegal to emit a store output intrinsic for it.
  Reviewed-by: Kenneth Graunke <[email protected]>
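The two copy rules described above reduce to a pair of predicates on the output variable. The sketch below uses a hypothetical `struct output_var` with two flags standing in for NIR's variable data; it only models the decision, not the copy emission itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of an output variable for this sketch. */
struct output_var {
   bool fb_fetch_output; /* initial value comes from the framebuffer */
   bool read_only;       /* e.g. gl_LastFragData: read, never stored */
};

/* Copy output -> temporary at the top of the program? Only needed when
 * the initial value is defined, i.e. for framebuffer fetch outputs. */
static bool needs_initial_copy(const struct output_var *var)
{
   return var->fb_fetch_output;
}

/* Copy temporary -> output at the end? Never for read-only outputs,
 * since a store output intrinsic for them would be illegal. */
static bool needs_final_copy(const struct output_var *var)
{
   return !var->read_only;
}
```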
* nir: Pass through fb_fetch_output and OutputsRead from GLSL IR.
  Francisco Jerez, 2016-08-25, 2 files, -0/+11
  The NIR representation of framebuffer fetch is the same as the GLSL IR's until interface variables are lowered away, at which point it will be translated to load output intrinsics. The GLSL-to-NIR pass just needs to copy the bits over to the NIR program.
  Reviewed-by: Kenneth Graunke <[email protected]>
* vc4: Add support for fddx/fddy
  Eric Anholt, 2016-08-25, 1 file, -0/+52
  Based vaguely on a patch by jonasarrow on github.
* vc4: Add register allocation support for MUL output rotation.
  Eric Anholt, 2016-08-25, 2 files, -0/+14
  We need the source to be in r0-r3, so make a new register class for it. It will be up to the surrounding passes to make sure that the r0-r3 allocation of its source won't conflict with any other class requirements on that temp.
* vc4: Add support for MUL output rotation.
  Eric Anholt, 2016-08-25, 6 files, -0/+51
  Extracted from a patch by jonasarrow on github.
* vc4: Add support for the 2-bit LOAD_IMM variants.
  Eric Anholt, 2016-08-25, 6 files, -0/+58
  Extracted and fixed up from a patch by jonasarrow on github. This ended up not getting used for ddx/ddy, but seems like it might still be useful.
* vc4: Add QPU scheduling to handle MUL rotate sources.
  Eric Anholt, 2016-08-25, 1 file, -0/+13
  We need MUL rotates to do ddx/ddy support.
* vc4: Add disassembly for constant MUL rotates
  Eric Anholt, 2016-08-25, 1 file, -9/+11
* vc4: Add real validation for MUL rotation.
  Eric Anholt, 2016-08-25, 2 files, -10/+43
  Caught problems in the upcoming DDX/DDY implementation.
* vc4: Add a QIR value for the QPU element register.
  Eric Anholt, 2016-08-25, 4 files, -0/+8
  This will be used in the ddx/ddy support for "Am I the top half?" or "Am I the left half?" checks.
* i965: Respect miptree offsets in intel_readpixels_tiled_memcpy()
  Chad Versace, 2016-08-25, 1 file, -17/+4
  Respect intel_miptree_slice::x_offset,y_offset and intel_mipmap_tree::offset. All three may be non-zero when glReadPixels is called on an EGLImage created from the non-base slice of a miptree.
  Patch 2/2 that fixes test 'dEQP-EGL.functional.image.create.gles2_cubemap_*'.
  Reported-by: Haixia Shi <[email protected]>
  Diagnosed-by: Haixia Shi <[email protected]>
  Cc: [email protected]
  Reviewed-by: Kenneth Graunke <[email protected]>
  Change-Id: I4b397b27e55a743a7094d29fb0a6a4b6b34352b0
* i965: Fix miptree layout for EGLImage-based renderbuffers
  Chad Versace, 2016-08-25, 1 file, -0/+13
  When glEGLImageTargetRenderbufferStorageOES() was given an EGLImage created from the non-base slice of a miptree, intel_image_target_renderbuffer_storage() forgot to apply the intra-tile offsets __DRIimage::tile_x,tile_y to the miptree layout.
  This patch fixes the problem with a quick hack suitable for cherry-picking. A proper fix requires more thorough plumbing in intel_miptree_create_layout() and brw_tex_layout().
  Patch 1/2 that fixes test 'dEQP-EGL.functional.image.create.gles2_cubemap_*'.
  Reported-by: Haixia Shi <[email protected]>
  Diagnosed-by: Haixia Shi <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: [email protected]
  Change-Id: I8a64b0048a1ee9e714ebb3f33fffd8334036450b
* intel: Flatten the makefile structure
  Jason Ekstrand, 2016-08-25, 8 files, -182/+189
  This pulls isl and genxml into a single makefile so that they can properly build in parallel. This isn't terribly important now, as genxml just generates sources, which happens serially first anyway, but it will be more important as we add more stuff to src/intel.
  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* isl/tests: Use a longer path for isl.h
  Jason Ekstrand, 2016-08-25, 1 file, -2/+2
  The tests assumed that isl would be in the include path, but that usually isn't the case. Instead, we usually have src/intel and you need to add an "isl/" prefix.
  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* intel/isl/gen9: Only use the magic 1D alignment for GEN9_1D surfaces
  Jason Ekstrand, 2016-08-25, 1 file, -1/+1
  If the surface has a layout of GEN4_2D, then we need to compute a normal 2D alignment and not use the magic linear 1D alignment.
  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* intel/isl: Pass the dim_layout into choose_alignment_el
  Jason Ekstrand, 2016-08-25, 11 files, -13/+24
  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* intel/isl: Use DIM_LAYOUT_GEN4_2D for tiled 1-D surfaces on SKL
  Jason Ekstrand, 2016-08-25, 1 file, -5/+23
  The Sky Lake 1D layout is only used if the surface is linear. For tiled surfaces such as depth and stencil, the old gen4 2D layout is used.
  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
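Taken together, the three isl commits above amount to two small decisions: pick the gen9 1D layout only for linear 1-D surfaces, and use the special 1D alignment only when that layout was chosen. The sketch below models just those rules with hypothetical enums and function names; the real isl types and signatures are different, and only 1-D and 2-D surfaces are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for isl's enums (hypothetical). */
enum dim_layout { DIM_LAYOUT_GEN4_2D, DIM_LAYOUT_GEN9_1D };
enum surf_dim { SURF_DIM_1D, SURF_DIM_2D };

/* On Sky Lake, only linear 1-D surfaces get the gen9 1D layout; tiled
 * 1-D surfaces (e.g. depth, stencil) use the gen4 2D layout. */
static enum dim_layout skl_choose_dim_layout(enum surf_dim dim, bool linear)
{
   if (dim == SURF_DIM_1D && linear)
      return DIM_LAYOUT_GEN9_1D;
   return DIM_LAYOUT_GEN4_2D;
}

/* The special 1D alignment applies only to the gen9 1D layout; a
 * GEN4_2D surface needs a normal 2D alignment. */
static bool use_magic_1d_alignment(enum dim_layout layout)
{
   return layout == DIM_LAYOUT_GEN9_1D;
}
```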
* nir/phi_builder: Don't recurse in value_get_block_def
  Jason Ekstrand, 2016-08-25, 1 file, -29/+36
  In some programs, we can have very deep dominance trees and the recursion can cause us to risk stack overflows. Instead, we replace the recursion with a pair of loops, one at the start and one at the end. This is functionally equivalent to what we had before and it's actually a bit easier to read in the new form without the recursion.
  Signed-off-by: Jason Ekstrand <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
  Reviewed-by: Connor Abbott <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
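The "pair of loops" pattern can be sketched as follows: the first loop walks up the dominance tree until it finds a block with a known definition (or reaches the root), and the second loop walks the same path again to cache the result. This is a simplified illustration with a hypothetical `struct block` and string "definitions", not the nir_phi_builder data structures; stack usage stays constant regardless of tree depth.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block: immediate dominator plus a cached definition of
 * one value (NULL when not known yet). */
struct block {
   struct block *idom;
   const char *def;
};

static const char *get_block_def(struct block *blk, const char *undef)
{
   /* Loop 1: climb until a block with a def, or the root, is found. */
   struct block *dom = blk;
   while (dom->def == NULL && dom->idom != NULL)
      dom = dom->idom;

   const char *def = dom->def != NULL ? dom->def : undef;

   /* Loop 2: walk the same path again, caching the def in every block
    * so later queries along this path return immediately. */
   for (struct block *b = blk; b != dom; b = b->idom)
      b->def = def;
   dom->def = def; /* also fills in a root that had no def */
   return def;
}
```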
* nir: Walk blocks in source code order in lower_vars_to_ssa.
  Matt Turner, 2016-08-25, 2 files, -106/+106
  Prior to this commit, rename_variables_block() was called recursively, performing a depth-first traversal of the control flow graph. The function uses a non-trivial amount of stack space for local variables, which puts us in danger of smashing the stack, given a sufficiently deep dominance tree.
  XCOM: Enemy Within contains a shader with such a dominance tree (1574 nir_blocks in total, depth of at least 143).
  Jason tells me that he believes that any walk over the nir_blocks that respects dominance is sufficient (a DFS might have been necessary prior to the introduction of nir_phi_builder).
  In fact, the introduction of nir_phi_builder made the problem worse: rename_variables_block() walks to the bottom of the dominance tree before calling nir_phi_builder_value_get_block_def(), which walks back to the top of the dominance tree...
  In any case, this patch ensures we avoid that problem as well.
  Cc: [email protected]
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
  Reviewed-by: Connor Abbott <[email protected]>
* radeonsi: don't use allocas for arrays with LLVM 3.8
  Marek Olšák, 2016-08-25, 1 file, -1/+3
  It crashes.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97413
* gallium/radeon: unify and simplify checking for an empty gfx IB
  Marek Olšák, 2016-08-25, 3 files, -27/+23
  We can take advantage of the fact that multi_fence does the obvious thing with NULL fences. This fixes unflushed fences that can get stuck due to empty IBs.
* meta: Always do GenerateMipmaps in linear colorspace.
  Kenneth Graunke, 2016-08-25, 1 file, -2/+10
  When generating mipmaps for sRGB textures, force both decode and encode, so the filtering is done in linear colorspace, regardless of settings.
  Fixes a WebGL conformance test in Chrome:
  https://www.khronos.org/registry/webgl/sdk/tests/conformance2/textures/misc/tex-srgb-mipmap.html?webglVersion=2
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97322
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
* swrast: fix incorrectly positioned putImage() in swrast driver
  Brian Paul, 2016-08-25, 1 file, -2/+2
  Some front buffer rendering was in the wrong position. This included scissored clears, glDrawPixels and glCopyPixels.
  The problem was the y coordinate passed to putImage() didn't match the y coordinate passed to getImage(). We fix this by setting xrb->map_y to the inverted coordinate in swrast_map_renderbuffer(), which is used later by the putImage() call. Also pass xrb->map_y to getImage() to be symmetric.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97426
  Cc: <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
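The fix hinges on computing the flipped y once and using it for both getImage() and putImage(). A minimal sketch of the usual flip, assuming GL's bottom-left origin and a top-left origin on the window-system side (the driver's exact arithmetic is not shown in the commit); `invert_y()` is a hypothetical helper.

```c
#include <assert.h>

/* Convert the y of a rectangle of height h from a bottom-up convention
 * to a top-down one in a framebuffer of fb_height rows. The flip is its
 * own inverse, so mapping twice returns the original y. */
static int invert_y(int y, int h, int fb_height)
{
   return fb_height - y - h;
}
```

If both the read and the write path use the same flipped value (as map_y does in the commit), the rectangle read back and the rectangle written out coincide.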
* radeonsi: disable SDMA texture copying on Carrizo
  Marek Olšák, 2016-08-25, 1 file, -0/+6
  Cc: 12.0 <[email protected]>
  Reviewed-by: Michel Dänzer <[email protected]>
* gallium/noop: use 3-space indentation
  Marek Olšák, 2016-08-25, 2 files, -292/+292
  Reviewed-by: Brian Paul <[email protected]>
* gallium: add a pipe_context parameter to resource_get_handle
  Marek Olšák, 2016-08-25, 21 files, -16/+46
  radeonsi needs to do some operations (DCC decompression) for OpenGL-OpenCL interop and this is the only way to make it coherent with the current context. It can optionally be set to NULL.
  Reviewed-by: Brian Paul <[email protected]>
* st/mesa: fix sRGB BlitFramebuffer regression
  Nicolai Hähnle, 2016-08-25, 1 file, -16/+18
  Broken since: 3190c7ee9727161d627f107c2e7f8ec3a11941c1
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97285
  Tested-by: Edmondo Tommasina <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
* loader/dri3: Overhaul dri3_update_num_back
  Michel Dänzer, 2016-08-25, 1 file, -9/+6
  Always use 3 buffers when flipping. With only 2 buffers, we have to wait for a flip to complete (which takes non-0 time even with asynchronous flips) before we can start working on the next frame. We were previously only using 2 buffers for flipping if the X server supports asynchronous flips, even when we're not using asynchronous flips. This could result in bad performance (the referenced bug report is an extreme case, where the inter-frame stalls were preventing the GPU from reaching its maximum clocks).
  I couldn't measure any performance boost using 4 buffers with flipping. Performance actually seemed to go down slightly, but that might have been just noise.
  Without flipping, a single back buffer is enough for swap interval 0, but we need to use 2 back buffers when the swap interval is non-0, otherwise we have to wait for the swap interval to pass before we can start working on the next frame. This condition was previously reversed.
  Cc: "12.0 11.2" <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
  Reviewed-by: Frank Binns <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
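The buffer-count policy described in this message fits in a few lines. The function below is a hypothetical restatement of the rules as worded above, not the loader's dri3_update_num_back() itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the policy from the commit message:
 * - flipping: always 3 buffers, so the next frame can start while a flip
 *   is still pending;
 * - no flipping, swap interval 0: a single back buffer is enough;
 * - no flipping, swap interval non-0: 2 back buffers, so rendering does
 *   not wait for the swap interval to pass. */
static int choose_num_back(bool flipping, int swap_interval)
{
   if (flipping)
      return 3;
   return swap_interval != 0 ? 2 : 1;
}
```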
* anv: Include the pipeline layout in the shader hash
  Jason Ekstrand, 2016-08-24, 4 files, -4/+40
  The pipeline layout affects shader compilation because it is what determines binding table locations as well as whether or not a particular buffer has dynamic offsets. Since this affects the generated shader, it needs to be in the hash. This fixes a bunch of CTS tests now that the CTS is using a pipeline cache.
  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Kristian Høgsberg <[email protected]>
  Cc: "12.0" <[email protected]>
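The underlying rule is that every input that changes the generated code must feed the cache key. The sketch below mixes both the shader bytes and a flattened layout description into the key; the FNV-1a hash is a stand-in chosen for brevity (not necessarily what anv uses), and `struct layout_desc` and `shader_cache_key()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over a byte range, continuing from hash state h. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t size)
{
   const unsigned char *p = data;
   for (size_t i = 0; i < size; i++) {
      h ^= p[i];
      h *= 1099511628211ull; /* FNV-1a 64-bit prime */
   }
   return h;
}

/* Hypothetical, flattened view of the layout state that changes codegen. */
struct layout_desc {
   uint32_t binding_table_start;
   uint32_t dynamic_offset_mask;
};

/* Key = hash(shader code) mixed with hash(layout). Omitting the layout
 * would let two different compiled shaders share one cache entry. */
static uint64_t shader_cache_key(const void *code, size_t code_size,
                                 const struct layout_desc *layout)
{
   uint64_t h = 14695981039346656037ull; /* FNV-1a 64-bit offset basis */
   h = fnv1a(h, code, code_size);
   if (layout != NULL)
      h = fnv1a(h, layout, sizeof(*layout));
   return h;
}
```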
* anv: Add a --disable-vulkan-icd-full-driver-path option
  Jason Ekstrand, 2016-08-25, 2 files, -2/+8
  This option makes installed Vulkan ICD files contain only a driver library name and not a path. This is intended for distros to help them work around multi-arch issues.
  Reviewed-by: Dave Airlie <[email protected]>
* i965/fs: Don't consider the stencil output to be a color output.
  Francisco Jerez, 2016-08-24, 1 file, -1/+2
  This would cause gl_FragStencilRef to be counted as a color output incorrectly during the precompile phase, which leads to unnecessary recompilation on master and could trigger an assertion failure in fs_visitor::emit_fb_writes() on my i965-fb-fetch branch.
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Keep track of the set of fragment outputs read by a GL program.
  Francisco Jerez, 2016-08-24, 2 files, -0/+4
  This is the set of shader outputs whose initial value is provided to the shader by some external means when the shader is executed, rather than computed by the shader itself.
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Don't consider read-only fragment outputs to be written to.
  Francisco Jerez, 2016-08-24, 1 file, -1/+1
  Since they cannot be written. This prevents adding fragment outputs to the OutputsWritten set that are only read from via the gl_LastFragData array but never written to.
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/linker: Allow fragment output overlap for gl_LastFragData.
  Francisco Jerez, 2016-08-24, 1 file, -0/+3
  gl_LastFragData overlaps gl_FragData by definition.
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/ast: Allow redeclaration of gl_LastFragData with different precision qualifier.
  Francisco Jerez, 2016-08-24, 1 file, -0/+12
  v2: No need to check the GLSL version. (Ken)
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Don't attempt to do dead varying elimination on gl_LastFragData arrays.
  Francisco Jerez, 2016-08-24, 1 file, -3/+4
  Apparently this pass can only handle elimination of a single built-in fragment output array, so the presence of gl_LastFragData (which it wouldn't split correctly anyway) could prevent it from splitting the actual gl_FragData array. Just match gl_FragData by name since it's the only built-in it can handle.
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Define a gl_LastFragData built-in for older GLSL versions.
  Francisco Jerez, 2016-08-24, 1 file, -0/+10
  The EXT_shader_framebuffer_fetch extension defines alternative language for GLES2 shaders where user-defined fragment outputs are not allowed. Instead of using inout user-defined fragment outputs, the shader is expected to read from the gl_LastFragData built-in array. In addition, this allows using the same language on desktop GLSL versions prior to 4.2 that support the deprecated gl_FragData built-in, in preparation for the MESA_shader_framebuffer_fetch desktop GL extension.
  Both legacy and user-defined inout outputs have a common representation at the GLSL IR level, so it shouldn't make any difference for optimization passes and back-ends whether the application is using gl_LastFragData or user-defined outputs; all they'll see is a variable dereference of a fragment output at a certain interface location with the fb_fetch_output bit set to one.
  v2: Don't define the built-in variable on GLSL versions for which gl_FragData exists but is deprecated. (Ken)
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Handle the inout qualifier in fragment shader output declarations.
  Francisco Jerez, 2016-08-24, 2 files, -1/+16
  According to the EXT_shader_framebuffer_fetch extension, the inout qualifier can be used on ESSL 3.0+ shaders to declare a special kind of fragment output that gets implicitly initialized with the previous framebuffer contents at the current fragment coordinates. In addition, we allow using the same language to define FB fetch outputs in GLSL 1.3+ shaders in preparation for the desktop MESA_shader_framebuffer_fetch extensions.
  Reviewed-by: Kenneth Graunke <[email protected]>