summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Skip SIMD lowering source unzipping for regular scalar regions.Francisco Jerez2016-05-291-2/+21
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Factor out region zipping and unzipping from the SIMD lowering pass.Francisco Jerez2016-05-291-60/+84
| | | | | | | | | | Just to make sure we keep the SIMD lowering pass tidy when we introduce additional logic to try to optimize out the copy instructions used to zip and unzip the destination and source regions into multiple packed regions of the lowered instruction width. Shouldn't cause any functional changes. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Generalize regions_overlap() from copy propagation to handle ↵Francisco Jerez2016-05-292-12/+42
| | | | | | | | | | | | non-VGRF files. This will be useful in several places. The only externally visible difference (other than non-VGRF files being supported now) is that the region sizes are now passed in byte units instead of in GRF units because the loss of precision would have become a problem in the SIMD lowering pass. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Refactor offset() into a separate function taking the width as ↵Francisco Jerez2016-05-292-16/+21
| | | | | | | | | | | argument. This will be useful in the SIMD lowering pass to avoid having to construct a builder object of the known region width just to pass it as argument to offset(), which doesn't do anything with it other than taking the builder dispatch_width as region width. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Implement opt_sampler_eot() in terms of logical sends.Francisco Jerez2016-05-292-55/+40
| | | | | | | | | | | | | | | | | | | | | | This makes the whole LOAD_PAYLOAD munging unnecessary which simplifies the code and will allow the optimization to succeed in more cases independent of whether the LOAD_PAYLOAD instruction can be found or not. The following patch is squashed in: SQUASH: i965/fs: Add basic dataflow check to opt_sampler_eot(). The sampler EOT optimization pass naively assumes that the texturing instruction provides all the data used by the FB write just because they're standing next to each other. The least we should be checking is whether the source and destination regions of the FB write and texturing instructions match. Without this the previous seemingly harmless patch would have caused opt_sampler_eot() to misoptimize a shader from dota-2 causing DCE to eliminate all of its 78 instructions except for the final sampler EOT message (!). Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix UB list sentinel dereference in opt_sampler_eot().Francisco Jerez2016-05-291-6/+9
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Take opt_redundant_discard_jumps out of the optimization loop.Francisco Jerez2016-05-291-2/+1
| | | | | | No shader-db regressions. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Run SIMD and logical send lowering after the optimization loop.Francisco Jerez2016-05-291-4/+25
| | | | | | | | | | | | | | | | | | | | | | | There are two reasons why this is useful: - It avoids the introduction of an amount of partial writes emitted by the SIMD lowering pass to zip and unzip register regions early during optimization, which can make subsequent optimization less effective. - It substantially reduces the burden on the compiler when a large fraction of the instructions in the program need to be split (e.g. during SIMD32 builds). Individual halves of split instructions will be optimized identically (if they can still be optimized at all), so doing it up front can duplicate the amount of instructions the optimizer has to deal with which causes the compilation time to explode in some cases due to the worse-than-linear runtime behaviour of the back-end. It seems helpful to re-run a few optimization passes in cases where any of the lowering passes was able to make progress. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Add FS_OPCODE_FB_WRITE_LOGICAL to has_side_effects().Francisco Jerez2016-05-291-0/+1
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Allow constant propagation into logical send sources.Francisco Jerez2016-05-291-0/+34
| | | | | | | Logical sends are eventually lowered into a series of copies so they can take almost anything as source. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Let CSE handle logical sampler sends as expressions.Francisco Jerez2016-05-291-0/+13
| | | | | | | This will prevent some shader-db regressions when we start plumbing logical sends through the optimizer. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Pass a BAD_FILE register to the logical FB write when oMask is unused.Francisco Jerez2016-05-292-2/+3
| | | | | | | This will let the optimizer know that the sample mask value is unused so its definition can be DCE'ed. Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: fix xfb_offset unsized array validationTimothy Arceri2016-05-304-31/+41
| | | | | | | | | | | | | | | | This partially fixes CTS test: GL44-CTS.enhanced_layouts.xfb_get_program_resource_api The test now fails at a tes evaluation shader with unsized output arrays. The ARB_enhanced_layouts spec says: "It is a compile-time error to apply xfb_offset to the declaration of an unsized array." So this seems like a bug in the CTS. Reviewed-by: Dave Airlie <[email protected]>
* glsl: dont crash when attempting to assign a value to a builtin defineTimothy Arceri2016-05-301-1/+1
| | | | | | | | | For example GL_ARB_enhanced_layouts = 3; Fixes: GL44-CTS.enhanced_layouts.glsl_contant_immutablity Reviewed-by: Dave Airlie <[email protected]>
* egl/dri3: don't crash on no context.Dave Airlie2016-05-301-2/+4
| | | | | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94925 Pointed out by Karol Herbst on irc. Signed-off-by: Dave Airlie <[email protected]> Cc: "11.1 11.2" <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa/program_interface_query: fix transform feedback varyings.Dave Airlie2016-05-302-35/+48
| | | | | | | | | | | | | | The spec says gl_NextBuffer and gl_SkipComponents need to be returned to userspace in the program interface queries. We currently throw those away, this requires a complete piglit run to make sure no drivers fallover due to the extra varyings. This fixes: GL45-CTS.program_interface_query.transform-feedback-built-in Reviewed-by: Timothy Arceri <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* glsl/ast: subroutineTypes can't be returned from functions.Dave Airlie2016-05-301-0/+9
| | | | | | | | | | | These types can't be returned. This fixes: GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types for the return type case. Reviewed-by: Chris Forbes <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* glsl: use has_double() helperTimothy Arceri2016-05-301-2/+1
| | | | Reviewed-by: Eduardo Lima Mitev <[email protected]>
* glsl: fix explicit uniform block alignmentTimothy Arceri2016-05-301-3/+1
| | | | | | | | | | | | This stops the offset being bumped again when and an explicit alignment has already been applied. Fixes alignment issues in: GL44-CTS.enhanced_layouts.uniform_block_alignment Note the test still fails due to unrelated issues with doubles. Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Shrink stage_prog_data param array lengthJordan Justen2016-05-296-14/+6
| | | | | | | | | | | | | | It appears we were over-allocating these arrays. Previously we would use nir->num_uniforms directly for scalar programs, and multiply it by 4 for vec4 programs. Instead we should have been dividing by 4 in both cases to convert from bytes to a gl_constant_value count. The size of gl_constant_value is 4 bytes. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nv50,nvc0: fix the max_vertices=0 caseIlia Mirkin2016-05-293-2/+4
| | | | | | | This is apparently legal. Drop any emit/restarts, and pass a 1 to the hardware. Signed-off-by: Ilia Mirkin <[email protected]>
* st/mesa: fix setting of point_size_per_vertex in ES contextsIlia Mirkin2016-05-291-2/+18
| | | | | | | | | | | | | GL ES 2.0+ does not have a GL_PROGRAM_POINT_SIZE enable, unlike desktop GL. So we have to go and check the last pre-rasterizer stage to see whether it outputs a point size or not. This fixes a number of dEQP tests that use a geometry or tessellation shader to emit points primitives. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Cc: "11.1 11.2" <[email protected]>
* mesa: skip level checking for FramebufferTexture*D if texture is zeroMarek Olšák2016-05-291-3/+3
| | | | | | | | | | | | | From the OpenGL 4.5 core spec: "An INVALID_VALUE error is generated if texture is not zero and level is not a supported texture level for textarget, as described above." Other FramebufferTexture functions already do the right thing. This fixes the main menu in F1 2015. Cc: 11.1 11.2 <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* st/mesa: expose OES_shader_io_blocks when we have enough for ES 3.1Ilia Mirkin2016-05-283-2/+8
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* swr: [rasterizer] Do not define _mm256_storeu2_m128i with icc.Vinson Lee2016-05-281-1/+1
| | | | | | | | | | | | | | | | Fix build error with icc. CXX libswrAVX_la-swr_clear.lo icpc: command line warning #10006: ignoring unknown option '-Wdelete-non-virtual-dtor' In file included from ./rasterizer/jitter/jit_api.h(31), from swr_context.h(30), from swr_clear.cpp(24): ./rasterizer/common/os.h(135): error: expected an identifier void _mm256_storeu2_m128i(__m128i *hi, __m128i *lo, __m256i a) ^ Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* i965: add missing return in if statementThomas Hindoe Paaboel Andersen2016-05-281-0/+1
| | | | | | | | | | Re-add the "return false" that was removed in 0c02d7002d6c005b4c1fe997b5ef5916978dd183 It seems that something went wrong when merging the patch. The patch sent to the mailing list does not directly match what was committed. https://lists.freedesktop.org/archives/mesa-dev/2016-May/118198.html Reviewed-by: Jason Ekstrand <[email protected]>
* gk110/ir: fix unspilling of predicates from registersIlia Mirkin2016-05-281-0/+28
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96258 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.2 11.1" <[email protected]>
* nvc0: remove outdated surfaces validation code for GK104Samuel Pitoiset2016-05-281-70/+0
| | | | | | | | | This code was used for validating surfaces with compute but now we use pipe_image_view instead. Anyway, surfaces support should be re-introduced properly once OpenCL happens. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: do not always invalidate 3D CBs when using computeSamuel Pitoiset2016-05-281-8/+17
| | | | | | | | | Constant buffers are aliased between 3D and CP on Fermi, but we should only invalidate them when a compute shader actually uses CBs and not all the time after a lauching grid. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* i965: Update compute workgroup size limit calculation for SIMD32.Francisco Jerez2016-05-271-11/+3
| | | | | | | | | | | | This should have the side effect of enabling the ARB_compute_shader extension on Gen8+ hardware and all Gen7 platforms that didn't previously expose it (VLV and IVB GT1) due to the number of hardware threads per subslice being insufficient in SIMD16 mode. v2: Bump workgroup size limit for GLES too (Jordan). Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Add do32 debug option.Francisco Jerez2016-05-273-1/+3
| | | | | | | | | | | The do32 INTEL_DEBUG option causes the back-end to try to generate a SIMD32 program when compiling a compute shader regardless of the specified compute shader workgroup size, which will be useful for testing SIMD32 code generation in the most common case in which the workgroup size doesn't exceed the SIMD16 limit so SIMD32 codegen wouldn't be automatically enabled. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Build 32-wide compute shader when needed.Francisco Jerez2016-05-271-0/+26
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Extend back-end interface for limiting the shader dispatch width.Francisco Jerez2016-05-273-23/+22
| | | | | | | | | This replaces the current fs_visitor::no16() interface with fs_visitor::limit_dispatch_width(), which takes an additional parameter allowing the caller to specify the maximum dispatch width a shader can be compiled with. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Implement SIMD32 register allocation support.Francisco Jerez2016-05-273-8/+9
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Remove pre-Gen7 register allocation class micro-optimization.Francisco Jerez2016-05-271-18/+3
| | | | | | | | | | | | | | | | | This was trying to save some one-time init on pre-Gen7 hardware under the assumption that one would only ever need 1, 2, 4 and 8-wide registers on those platforms. However nothing guarantees that those will be the only VGRF sizes used after lowering and optimization. In some cases we may end up with a temporary of different size being allocated (e.g. by SIMD lowering to zip or unzip a multi-component register region of a logical send instruction), and there is no guarantee that they will be optimized away before register allocation (especially since the compute_to_mrf coalescing pass is rather... lacking...). Instead just allocate classes for all possible VGRF sizes up to MAX_VGRF_SIZE to avoid a crash in pq_test() when we encounter a variable of any other size. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Don't mutate multi-component arguments in sampler payload set-up.Francisco Jerez2016-05-271-58/+32
| | | | | | | | | | | | | | | | | | | | | | The Gen5+ sampler message payload construction code steps through the coordinate and derivative components by induction like 'coordinate = offset(coordinate, bld, 1)', the problem is that while doing that it may step one past the end of the coordinate vector causing an assertion failure in offset() if it happens to be a (single component) immediate. Right now coordinates and derivatives are typically passed as actual registers but that will no longer be the case when we start propagating constants into logical messages. Instead express coordinate components in closed form like 'offset(coordinate, bld, i)' -- The end result seems slightly more readable that way and it allows passing the coordinate and derivative registers by const reference instead of by value, so it seems like a clean-up in its own right. v2: Fold a few post-increment operators into the last MOV statement. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix multiple ACP interference during copy propagation.Francisco Jerez2016-05-271-6/+2
| | | | | | | | | | This is more fallout from cf375a3333e54a01462f192202d609436e5fbec8. It's possible for multiple ACP entries to interfere with a given VGRF write, so we need to continue iterating even if an overlapping entry has already been found. Cc: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix cmod propagation not to propagate non-identity cmod into CMP(N).Francisco Jerez2016-05-271-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | The conditional mod of these instructions determines the semantics of the comparison itself (rather than being evaluated based on the result of the instruction as is usually the case for most other instructions that allow conditional mods), so it's in general not legal to propagate a conditional mod into a CMP instruction. This prevents cmod propagation from (mis)optimizing: cmp.z.f0 tmp, ... mov.z.f0 null, tmp into: cmp.z.f0 tmp, ... which gives the negation of the flag result of the original sequence. I could reproduce this easily with SIMD32 but I don't see any reason why the problem would be SIMD32-specific, it was most likely working by luck. Cc: [email protected] Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Estimate number of registers written correctly in ↵Francisco Jerez2016-05-271-2/+2
| | | | | | | | opt_register_renaming. The current estimate is incorrect for non-32b types. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Add (sub)reg_offset asserts to brw_reg_from_fs_reg.Francisco Jerez2016-05-271-0/+2
| | | | | | | These are completely ignored by the conversion to brw_reg, so they better be zero. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Reset reg_offset of the original destination to zero in ↵Francisco Jerez2016-05-271-0/+1
| | | | | | | | compute_to_mrf(). Prevents an assertion failure in the following commit. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs.Francisco Jerez2016-05-271-1/+1
| | | | | | | The pass is disabled in SIMD16 dispatch mode for the same reason, it cannot handle instructions that write multiple MRF registers at once. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless of the dispatch ↵Francisco Jerez2016-05-271-22/+18
| | | | | | width. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE instruction unnecessarily.Francisco Jerez2016-05-271-1/+0
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Emit fixed width memory fence opcode regardless of the dispatch width.Francisco Jerez2016-05-271-2/+3
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Return 32 bit mask from fs_builder::sample_mask().Francisco Jerez2016-05-271-1/+3
| | | | | | | | This doesn't actually handle the FS case, just add an assertion for the moment so I don't forget to update it later on for SIMD32 fragment shader dispatch. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Emit fixed-width null register regardless of the dispatch width.Francisco Jerez2016-05-271-8/+4
| | | | | | | | brw_null_vec() cannot handle widths over 16 but it doesn't really matter what width we specify for null registers because destination regions have no width field at the hardware level. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix half() to handle more exotic register files.Francisco Jerez2016-05-271-21/+4
| | | | | | | | horiz_offset() is able to deal with a superset of the register files currently special-cased in half(). Just call horiz_offset() in all cases. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix horiz_offset() to handle ARF and HW GRF register files.Francisco Jerez2016-05-271-4/+10
| | | | | | | We'll hit these in some cases during SIMD lowering in 32-wide programs. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.Francisco Jerez2016-05-275-24/+12
| | | | Reviewed-by: Jason Ekstrand <[email protected]>