path: root/src
Commit message  Author  Age  Files  Lines
* i965: Expose shader framebuffer fetch extensions on Gen9+.Francisco Jerez2016-08-251-1/+2
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Hook up coherent framebuffer reads to the NIR front-end.Francisco Jerez2016-08-251-2/+20
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Remove special casing of framebuffer writes in scheduler code.Francisco Jerez2016-08-251-2/+1
| | | | | | | | | | | | The reason why it was safe for the scheduler to ignore the side effects of framebuffer write instructions was that those side effects couldn't have had any influence on any other instruction in the program, because we weren't doing framebuffer reads, and framebuffer writes were always non-overlapping. We need actual memory dependency analysis in order to determine whether a side-effectful instruction can be reordered with respect to other instructions in the program. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Don't CSE render target messages with different target index.Francisco Jerez2016-08-251-0/+1
| | | | | | | | | We weren't checking the fs_inst::target field when comparing whether two instructions are equal. For FB writes it doesn't matter because they aren't CSE-able anyway, but this would have become a problem with FB reads which are expression-like instructions. Reviewed-by: Kenneth Graunke <[email protected]>
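The CSE issue described above can be illustrated with a minimal sketch. The `sketch_inst` struct and `sketch_insts_match()` helper are hypothetical stand-ins, not the driver's actual data structures; only the idea of comparing a `target` field (analogous to `fs_inst::target`) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an IR instruction. */
struct sketch_inst {
   int opcode;
   int src[2];
   int target;   /* render target index, analogous to fs_inst::target */
};

/* Two instructions may only be merged by CSE if every field that affects
 * the result matches, including the render target index. */
static bool
sketch_insts_match(const struct sketch_inst *a, const struct sketch_inst *b)
{
   assert(a != NULL && b != NULL);
   return a->opcode == b->opcode &&
          a->src[0] == b->src[0] &&
          a->src[1] == b->src[1] &&
          a->target == b->target;
}
```

Without the last comparison, two framebuffer reads from different render targets with otherwise identical operands would be treated as the same expression.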
* i965/fs: Define logical framebuffer read opcode and lower it to physical reads.Francisco Jerez2016-08-254-0/+28
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Define framebuffer read virtual opcode.Francisco Jerez2016-08-255-0/+29
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Fix RC message type strings on Gen7+.Francisco Jerez2016-08-251-3/+25
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/eu: Add codegen support for the Gen9+ render target read message.Francisco Jerez2016-08-253-0/+40
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/eu: Take into account the target cache argument in brw_set_dp_read_message.Francisco Jerez2016-08-252-4/+18
| | | | | | | | | | | brw_set_dp_read_message() was setting the data cache as send message SFID on Gen7+ hardware, ignoring the target cache specified by the caller. Some of the callers were passing a bogus target cache value as argument relying on brw_set_dp_read_message not to take it into account. Fix them too. Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Flip the non-coherent framebuffer fetch extension bit on G45-Gen8 hardware.Francisco Jerez2016-08-251-0/+3
| | | | | | | | | | | | | | | | | | | | | This is not enabled on the original Gen4 part because it lacks surface state tile offsets so it may not be possible to sample from arbitrary non-zero layers of the framebuffer depending on the miptree layout (it should be possible to work around this by allocating a scratch surface and doing the same hack currently used for render targets, but meh...). On Gen9+ even though it should mostly work (feel free to force-enable it in order to compare the coherent and non-coherent paths in terms of performance), there are some corner cases like 1D array layered framebuffers that cannot be handled easily by the non-coherent path because of the incompatible layout in memory of 1D and 2D miptrees (it should be possible to work around this too by doing state-dependent recompiles, but it's hard to care enough since Gen9 has native support for coherent render target reads...) Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Implement glBlendBarrier.Francisco Jerez2016-08-251-0/+20
| | | | | | | | | This is a no-op if the platform supports coherent framebuffer fetch. If it doesn't, we just need to flush the render cache and invalidate the texture cache in order for previous rendering to be visible to framebuffer fetch. Reviewed-by: Kenneth Graunke <[email protected]>
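The barrier logic described above amounts to a single branch. The following is only a sketch under stated assumptions: `sketch_ctx`, its counters and `sketch_blend_barrier()` are hypothetical names that model the two cache operations as counters instead of issuing real hardware commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context: the counters stand in for real cache operations. */
struct sketch_ctx {
   bool has_coherent_fb_fetch;
   unsigned render_cache_flushes;
   unsigned texture_cache_invalidates;
};

static void
sketch_blend_barrier(struct sketch_ctx *ctx)
{
   assert(ctx != NULL);

   /* Coherent framebuffer fetch already sees previous rendering. */
   if (ctx->has_coherent_fb_fetch)
      return;

   /* Non-coherent path: make previous rendering visible to the sampler. */
   ctx->render_cache_flushes++;
   ctx->texture_cache_invalidates++;
}
```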
* i965: Upload surface state for non-coherent framebuffer fetch.Francisco Jerez2016-08-253-0/+94
| | | | | | | | This iterates over the list of attached render buffers and binds appropriate surface state structures to the binding table block allocated for shader framebuffer read. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Implement support for overriding the texture target in brw_emit_surface_state.Francisco Jerez2016-08-251-4/+50
| | | | | | | | | | | | | | | | | | | | | This allows the caller to bind a miptree using a texture target other than the one it was created with. The code should work even if the memory layouts of the specified and original targets don't match, as long as the caller only intends to access a single slice of the miptree structure. This will be exploited by the next commit in order to support non-coherent framebuffer fetch of a single layer of a 3D texture (since some generations lack the minimum array element control for 3D textures bound to the sampler unit), and multiple layers of a 1D array texture (since binding it as an actual 1D array texture would require state-dependent recompiles because the same shader couldn't simultaneously work for 1D and 2D array textures due to the different texel fetch coordinate ordering). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Massage argument list of brw_emit_surface_state().Francisco Jerez2016-08-251-11/+11
| | | | | | | | | | | | | | | | | | This commit does three different things in a single pass in order to keep the amount of churn low: Remove the for_gather boolean argument which was unused, pass the isl_view argument by value rather than by reference since I'll have to modify it from within the function, and add a target argument to allow callers to bind textures using a target other than the original. The prototype of the function now looks like: void brw_emit_surface_state(struct brw_context *brw, struct intel_mipmap_tree *mt, GLenum target, struct isl_view view, uint32_t mocs, uint32_t *surf_offset, int surf_index, unsigned read_domains, unsigned write_domains); Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add missing has_surface_tile_offset flag to the Gen8+ device info structures.Francisco Jerez2016-08-251-0/+2
| | | | | | | | | This surface state control has been supported by all hardware generations since G45. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Return the correct layout from get_isl_dim_layout for pre-ILK cube textures.Francisco Jerez2016-08-251-2/+5
| | | | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Factor out isl_surf_dim/isl_dim_layout calculation into functions.Francisco Jerez2016-08-252-23/+55
| | | | | | | | The logic to calculate the right layout and dimensionality for a given GL texture target is going to be useful elsewhere, factor it out from intel_miptree_get_isl_surf(). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Resolve color for non-coherent FB fetch at UpdateState time.Francisco Jerez2016-08-251-0/+17
| | | | | | | | | | | | This is required because the sampler unit used to fetch from the framebuffer is unable to interpret non-color-compressed fast-cleared single-sample texture data. Roughly the same limitation applies for surfaces bound to texture or image units, but unlike texture sampling, non-coherent framebuffer fetch is by definition non-coherent with previous rendering, so the brw_render_cache_set_check_flush() call can be omitted except after resolve. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Return whether the miptree was resolved from intel_miptree_resolve_color().Francisco Jerez2016-08-252-5/+9
| | | | | | | | | This will allow optimizing out the cache flush in some cases when resolving wasn't necessary. Reviewed-by: Kenneth Graunke <[email protected]>
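A rough sketch of why returning a boolean from the resolve helps: the caller can skip the flush when nothing was resolved. All names here (`sketch_miptree`, `sketch_resolve_color()`, `sketch_prepare_for_fetch()`) are hypothetical, and the flush is modelled as a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miptree with a single piece of state. */
struct sketch_miptree {
   bool fast_cleared;
};

/* Returns true only if a resolve was actually performed. */
static bool
sketch_resolve_color(struct sketch_miptree *mt)
{
   assert(mt != NULL);
   if (!mt->fast_cleared)
      return false;
   mt->fast_cleared = false;
   return true;
}

/* The flush (a counter here) is only needed after a real resolve. */
static void
sketch_prepare_for_fetch(struct sketch_miptree *mt, unsigned *flushes)
{
   assert(flushes != NULL);
   if (sketch_resolve_color(mt))
      (*flushes)++;
}
```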
* i965/fs: Translate nir_intrinsic_load_output on a fragment output.Francisco Jerez2016-08-251-0/+20
| | | | | | | This gets the non-coherent framebuffer fetch path hooked up to the NIR front-end. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Allocate fragment output temporaries on demand.Francisco Jerez2016-08-251-46/+27
| | | | | | | | | This gets rid of the duplication of logic between nir_setup_outputs() and get_frag_output() by allocating fragment output temporaries lazily whenever get_frag_output() is called. This makes nir_setup_outputs() a no-op for the fragment shader stage. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Rework representation of fragment output locations in NIR.Francisco Jerez2016-08-253-10/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | The problem with the current approach is that driver output locations are represented as a linear offset within the nir_outputs array, which makes it rather difficult for the back-end to figure out what color output and index some nir_intrinsic_load/store_output was meant for, because the offset of a given output within the nir_outputs array is dependent on the type and size of all previously allocated outputs. Instead this defines the driver location of an output to be the pair formed by its GLSL-assigned location and index (I've borrowed the bitfield macros from brw_defines.h in order to represent the pair of integers as a single scalar value that can be assigned to nir_variable_data::driver_location). nir_assign_var_locations is no longer useful for fragment outputs. Because fragment outputs are now allocated independently rather than within the nir_outputs array, the get_frag_output() helper becomes necessary in order to obtain the right temporary register for a given location-index pair. The type_size helper passed to nir_lower_io is now type_size_dvec4 rather than type_size_vec4_times_4 so that output array offsets are provided in terms of whole array elements rather than in terms of scalar components (dvec4 is the largest vector type supported by GLSL so this will cause all individual fragment outputs to have a size of one regardless of the type). Reviewed-by: Kenneth Graunke <[email protected]>
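Packing a location-index pair into one scalar can be sketched as below. The field layout (index in bit 0, location in the remaining 31 bits) and all `SKETCH_*` and `sketch_*` names are assumptions made for illustration, not necessarily what the driver uses.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bit 0 holds the index, bits 31:1 hold the location. */
#define SKETCH_INDEX_SHIFT     0
#define SKETCH_INDEX_MASK      UINT32_C(0x00000001)
#define SKETCH_LOCATION_SHIFT  1
#define SKETCH_LOCATION_MASK   UINT32_C(0xfffffffe)

/* Combine a (location, index) pair into a single scalar value. */
static uint32_t
sketch_pack_output(uint32_t location, uint32_t index)
{
   assert(index <= (SKETCH_INDEX_MASK >> SKETCH_INDEX_SHIFT));
   assert(location <= (SKETCH_LOCATION_MASK >> SKETCH_LOCATION_SHIFT));
   return (location << SKETCH_LOCATION_SHIFT) |
          (index << SKETCH_INDEX_SHIFT);
}

static uint32_t
sketch_output_location(uint32_t packed)
{
   return (packed & SKETCH_LOCATION_MASK) >> SKETCH_LOCATION_SHIFT;
}

static uint32_t
sketch_output_index(uint32_t packed)
{
   return (packed & SKETCH_INDEX_MASK) >> SKETCH_INDEX_SHIFT;
}
```

Unlike a linear offset, the packed value can be decoded without knowing anything about the outputs allocated before it.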
* i965: Fix undefined signed overflow in INTEL_MASK for bitfields of 31 bits.Francisco Jerez2016-08-251-1/+1
| | | | | | | Most likely we had only ever used this macro on bitfields of less than 31 bits -- That's going to change shortly. Reviewed-by: Kenneth Graunke <[email protected]>
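The overflow can be shown with a hypothetical mask macro (`SKETCH_MASK` is an illustrative name, not the driver's definition). For a 31-bit field the macro has to evaluate `1 << 31`, which is undefined for a signed 32-bit `int` but well defined for an unsigned operand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask macro; high and low are inclusive bit positions.
 * Using an unsigned 1 keeps the shift defined for fields of up to
 * 31 bits. A full 32-bit field would still need special handling. */
#define SKETCH_MASK(high, low) \
   (((UINT32_C(1) << ((high) - (low) + 1)) - 1) << (low))

/* Extract the field occupying bits high:low of value. */
static uint32_t
sketch_get_field(uint32_t value, unsigned high, unsigned low)
{
   assert(high >= low && high - low + 1 < 32);
   return (value & SKETCH_MASK(high, low)) >> low;
}
```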
* i965/fs: Special-case nir_intrinsic_store_output for the fragment shader.Francisco Jerez2016-08-251-0/+15
| | | | | | | | | | | I'm about to change how fragment shader output locations are represented, so the generic nir_intrinsic_store_output implementation that assumes that outputs are just contiguous elements in the big nir_outputs array won't work anymore. This somewhat simplified implementation of nir_intrinsic_store_output for fragment shaders should be functionally equivalent to the current fall-back one. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Implement non-coherent framebuffer fetch using the sampler unit.Francisco Jerez2016-08-252-0/+94
| | | | | | v2: Memoize sample ID, misc codestyle changes. (Ken) Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use.Francisco Jerez2016-08-251-1/+2
| | | | | | | | This will be required for the next commit since the non-coherent path makes use of the fragment coordinates implicitly, so they need to be calculated. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO.Francisco Jerez2016-08-251-1/+2
| | | | | | | | | The result of a framebuffer fetch from a multisample FBO is inherently per-sample, so the spec requires at least those sections of the shader that depend on the framebuffer fetch result to be executed once per sample. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Allocate space in the binding table for non-coherent FB fetch.Francisco Jerez2016-08-254-7/+16
| | | | | | | | | | | | | | Unfortunately due to the inconsistent meaning of some surface state structure fields, we cannot re-use the same binding table entries for sampling from and rendering into the same set of render buffers, so we need to allocate a separate binding table block specifically for render target reads if the non-coherent path is in use. The slight noise is due to the change of brw_assign_common_binding_table_offsets to return the next available binding table index rather than void. Reviewed-by: Kenneth Graunke <[email protected]>
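The value of returning the next free index, instead of void, is that a caller can append further blocks after the common ones. This is a simplified sketch: the struct, the function names, and the ordering of the blocks are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical binding table layout for a fragment shader. */
struct sketch_binding_table {
   unsigned render_target_start;
   unsigned texture_start;
   unsigned render_target_read_start;
};

/* Assigns the blocks shared by all stages and returns the next free index. */
static unsigned
sketch_assign_common_offsets(struct sketch_binding_table *bt,
                             unsigned next, unsigned num_textures)
{
   assert(bt != NULL);
   bt->texture_start = next;
   return next + num_textures;
}

static unsigned
sketch_assign_fs_offsets(struct sketch_binding_table *bt,
                         unsigned num_render_targets,
                         unsigned num_textures,
                         bool non_coherent_fb_fetch)
{
   unsigned next = 0;

   bt->render_target_start = next;
   next += num_render_targets;

   next = sketch_assign_common_offsets(bt, next, num_textures);

   /* Separate block of entries used only for render target reads. */
   if (non_coherent_fb_fetch) {
      bt->render_target_read_start = next;
      next += num_render_targets;
   }

   return next;
}
```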
* i965/fs: Add brw_wm_prog_key bit specifying whether FB reads should be coherent.Francisco Jerez2016-08-252-0/+7
| | | | | | | | | | | | | | | Some of the following changes in this series are specific to the non-coherent path, so I need some way to tell whether the coherent or non-coherent path is in use. The flag defaults to the value of the gl_extensions::MESA_shader_framebuffer_fetch enable so that it can be overridden easily on hardware that supports both framebuffer fetch extensions in order to test the non-coherent path, like: MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch (Of course trying to force-enable the coherent framebuffer fetch extension on hardware without native support won't work and will lead to assertion failures). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Get rid of fs_visitor::do_dual_src.Francisco Jerez2016-08-253-26/+14
| | | | | | | | | | | | | | This boolean flag was being used for two different things: - To set the brw_wm_prog_data::dual_src_blend flag. Instead we can just set it based on whether the dual_src_output register is valid, which will be the case if the shader writes the secondary blending color. - To decide whether to call emit_single_fb_write() once, or in a loop that would iterate only once, which seems pretty useless. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Handle FB fetch outputs correctly in nir_lower_io_to_temporaries.Francisco Jerez2016-08-251-0/+21
| | | | | | | | | | | This requires emitting a series of copies at the top of the program from each output variable to the corresponding temporary. The initial copy can be skipped for non-framebuffer fetch outputs whose initial value is undefined, and the final copy needs to be skipped for read-only outputs (i.e. gl_LastFragData), since it would be illegal to emit a store output intrinsic for it. Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Pass through fb_fetch_output and OutputsRead from GLSL IR.Francisco Jerez2016-08-252-0/+11
| | | | | | | | | The NIR representation of framebuffer fetch is the same as the GLSL IR's until interface variables are lowered away, at which point it will be translated to load output intrinsics. The GLSL-to-NIR pass just needs to copy the bits over to the NIR program. Reviewed-by: Kenneth Graunke <[email protected]>
* vc4: Add support for fddx/fddyEric Anholt2016-08-251-0/+52
| | | | Based vaguely on a patch by jonasarrow on github.
* vc4: Add register allocation support for MUL output rotation.Eric Anholt2016-08-252-0/+14
| | | | | | | We need the source to be in r0-r3, so make a new register class for it. It will be up to the surrounding passes to make sure that the r0-r3 allocation of its source won't conflict with any other class requirements on that temp.
* vc4: Add support for MUL output rotation.Eric Anholt2016-08-256-0/+51
| | | | Extracted from a patch by jonasarrow on github.
* vc4: Add support for the 2-bit LOAD_IMM variants.Eric Anholt2016-08-256-0/+58
| | | | | Extracted and fixed up from a patch by jonasarrow on github. This ended up not getting used for ddx/ddy, but seems like it might still be useful.
* vc4: Add QPU scheduling to handle MUL rotate sources.Eric Anholt2016-08-251-0/+13
| | | | We need MUL rotates to do ddx/ddy support.
* vc4: Add disassembly for constant MUL rotatesEric Anholt2016-08-251-9/+11
|
* vc4: Add real validation for MUL rotation.Eric Anholt2016-08-252-10/+43
| | | | Caught problems in the upcoming DDX/DDY implementation.
* vc4: Add a QIR value for the QPU element register.Eric Anholt2016-08-254-0/+8
| | | | | This will be used in the ddx/ddy support for "Am I the top half?" or "Am I the left half?" checks.
* i965: Respect miptree offsets in intel_readpixels_tiled_memcpy()Chad Versace2016-08-251-17/+4
| | | | | | | | | | | | | | | Respect intel_miptree_slice::x_offset,y_offset and intel_mipmap_tree::offset. All three may be non-zero when glReadPixels is called on an EGLImage created from the non-base slice of a miptree. Patch 2/2 that fixes test 'dEQP-EGL.functional.image.create.gles2_cubemap_*'. Reported-by: Haixia Shi <[email protected]> Diagnosed-by: Haixia Shi <[email protected]> Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]> Change-Id: I4b397b27e55a743a7094d29fb0a6a4b6b34352b0
* i965: Fix miptree layout for EGLImage-based renderbuffersChad Versace2016-08-251-0/+13
| | | | | | | | | | | | | | | | | | | | When glEGLImageTargetRenderbufferStorageOES() was given an EGLImage created from the non-base slice of a miptree, intel_image_target_renderbuffer_storage() forgot to apply the intra-tile offsets __DRIimage::tile_x,tile_y to the miptree layout. This patch fixes the problem with a quick hack suitable for cherry-picking. A proper fix requires more thorough plumbing in intel_miptree_create_layout() and brw_tex_layout(). Patch 1/2 that fixes test 'dEQP-EGL.functional.image.create.gles2_cubemap_*'. Reported-by: Haixia Shi <[email protected]> Diagnosed-by: Haixia Shi <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: [email protected] Change-Id: I8a64b0048a1ee9e714ebb3f33fffd8334036450b
* intel: Flatten the makefile structureJason Ekstrand2016-08-258-182/+189
| | | | | | | | | | This pulls isl and genxml into a single make file so that they can properly build in parallel. This isn't terribly important now as genxml just generates sources which happens serially first anyway but it will be more important as we add more stuff to src/intel. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* isl/tests: Use a longer path for isl.hJason Ekstrand2016-08-251-2/+2
| | | | | | | | | The tests assumed that isl would be in the include path but that usually isn't the case. Instead, we usually have src/intel and you need to add an "isl/" prefix. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/isl/gen9: Only use the magic 1D alignment for GEN9_1D surfacesJason Ekstrand2016-08-251-1/+1
| | | | | | | If the surface has a layout of GEN4_2D then we need to compute a normal 2D alignment and not use the magic linear 1D alignment. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* intel/isl: Pass the dim_layout into choose_alignment_elJason Ekstrand2016-08-2511-13/+24
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* intel/isl: Use DIM_LAYOUT_GEN4_2D for tiled 1-D surfaces on SKLJason Ekstrand2016-08-251-5/+23
| | | | | | | | | The Sky Lake 1D layout is only used if the surface is linear. For tiled surfaces such as depth and stencil the old gen4 2D layout is used. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* nir/phi_builder: Don't recurse in value_get_block_defJason Ekstrand2016-08-251-29/+36
| | | | | | | | | | | | | In some programs, we can have very deep dominance trees and the recursion can cause us to risk stack overflows. Instead, we replace the recursion with a pair of loops, one at the start and one at the end. This is functionally equivalent to what we had before and it's actually a bit easier to read in the new form without the recursion. Signed-off-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225 Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Matt Turner <[email protected]>
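The "pair of loops" shape described above might look roughly like the following sketch. It models blocks as array indices and definitions as non-zero integers. The names and the representation are assumptions, and phi creation is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NO_BLOCK (-1)  /* "no immediate dominator" (entry block) */
#define SKETCH_NO_DEF   0     /* "no definition cached for this block" */

/* imm_dom[b] is the immediate dominator of block b, or SKETCH_NO_BLOCK.
 * defs[b] is the cached definition for block b, or SKETCH_NO_DEF.
 * undef is the (non-zero) value to use when no dominating def exists. */
static int
sketch_get_block_def(const int *imm_dom, int *defs, int block, int undef)
{
   assert(imm_dom != NULL && defs != NULL);
   assert(undef != SKETCH_NO_DEF);

   /* First loop: walk up the dominance tree to the closest cached def. */
   int dom = block;
   while (dom != SKETCH_NO_BLOCK && defs[dom] == SKETCH_NO_DEF)
      dom = imm_dom[dom];

   int def = (dom == SKETCH_NO_BLOCK) ? undef : defs[dom];

   /* Second loop: walk the same chain again and cache the result, so
    * later queries stop immediately. */
   for (dom = block;
        dom != SKETCH_NO_BLOCK && defs[dom] == SKETCH_NO_DEF;
        dom = imm_dom[dom])
      defs[dom] = def;

   return def;
}
```

Stack usage stays constant regardless of how deep the dominance tree is, which is the point of dropping the recursion.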
* nir: Walk blocks in source code order in lower_vars_to_ssa.Matt Turner2016-08-252-106/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this commit rename_variables_block() is recursively called, performing a depth-first traversal of the control flow graph. The function uses a non-trivial amount of stack space for local variables, which puts us in danger of smashing the stack, given a sufficiently deep dominance tree. XCOM: Enemy Within contains a shader with such a dominance tree (1574 nir_blocks in total, depth of at least 143). Jason tells me that he believes that any walk over the nir_blocks that respects dominance is sufficient (a DFS might have been necessary prior to the introduction of nir_phi_builder). In fact, the introduction of nir_phi_builder made the problem worse: rename_variables_block() walks to the bottom of the dominance tree before calling nir_phi_builder_value_get_block_def(), which walks back to the top of the dominance tree... In any case, this patch ensures we avoid that problem as well. Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225 Reviewed-by: Connor Abbott <[email protected]>
* radeonsi: don't use allocas for arrays with LLVM 3.8Marek Olšák2016-08-251-1/+3
| | | | | | It crashes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97413