summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers
Commit message (Collapse)AuthorAgeFilesLines
* mesa: use gl_program for CurrentProgram rather than gl_shader_programTimothy Arceri2017-01-2311-77/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes much more sense and should be more performant in some critical paths such as SSO validation which is called at draw time. Previously the CurrentProgram array could have contained multiple pointers to the same struct which was confusing and we would often need to fish out the information we were really after from the gl_program anyway. Also it was error prone to depend on the _LinkedShader array for programs in current use because a failed linking attempt will lose the infomation about the current program in use which is still valid. V2: fix validate_io() to compare linked_stages rather than the consumer and producer to decide if we are looking at inward facing shader interfaces which don't need validation. Acked-by: Edward O'Callaghan <[email protected]> To avoid build regressions the following 2 patches were squashed in to this commit: mesa/meta: rewrite _mesa_shader_program_use() and _mesa_program_use() These are rewritten to do what the function name suggests, that is _mesa_shader_program_use() sets the use of all stage and _mesa_program_use() sets the use of a single stage. Reviewed-by: Lionel Landwerlin <[email protected]> Acked-by: Edward O'Callaghan <[email protected]> mesa: update active relinked program This likely fixes a subroutine bug were _mesa_shader_program_init_subroutine_defaults() would never have been called for the relinked program as we previously just set _NEW_PROGRAM as dirty and never called the _mesa_use* functions when linking. Acked-by: Edward O'Callaghan <[email protected]>
* Revert "i965: Really don't emit Q or UQ moves on Gen < 8"Matt Turner2017-01-201-8/+0
| | | | | | This reverts commit c95380c4044237d73fb537511667c3c8f658fcee. Acked-by: Kenneth Graunke <[email protected]>
* i965: Select DF type for 64-bit integers on Gen < 8.Matt Turner2017-01-204-10/+12
| | | | | | | | | | | | Gen8 adds Q/UQ types. We attempted to change the types back to DF in the generator (commit c95380c40), but an assertion added in the FP64 series (commit e481dcc3) triggers before that code has a chance to execute. In fact, using Q/UQ in the IR and then changing to DF in the generator would not work in the presence of source modifiers, etc. Fixes: d6fcede6 ("i965: Return Q and UQ types for int64 and uint64") Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable ARB_gpu_shader_int64 on Gen8+Ian Romanick2017-01-202-0/+6
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Split SIMD16 CMP of Q and UQ instructionsIan Romanick2017-01-201-14/+29
| | | | | | | This is basically the same as happens for doubles. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Enable 64-bit integer support for almost all unary and binary operationsIan Romanick2017-01-201-10/+0
| | | | | | | | Integer comparison functions (e.g., nir_op_ilt) are handled in the next commit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Enable uploading 64-bit integer uniformsIan Romanick2017-01-201-1/+3
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add 64-bit integer support for conversions and bitcastsIan Romanick2017-01-202-5/+35
| | | | | | | | | | v2 (idr): Make the "from" type in a cast unsized. This reduces the number of required cast operations at the expensive slightly more complex code. However, this will be a dramatic improvement when other sized integer types are added. Suggested by Connor. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Enable emitting Q and UQ instructions in the fs backendIan Romanick2017-01-202-1/+12
| | | | | | | | v2: Fixup assertion in brw_reg_type_to_hw_type to allow BRW_REGISTER_TYPE_{UQ,Q} on Gen8+. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add support for constant evaluation on Q and UQ typesIan Romanick2017-01-202-7/+20
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Return Q and UQ types for int64 and uint64Ian Romanick2017-01-201-4/+2
| | | | | | | | | It seems like maybe this should return a different type based on Gen. Q and UQ only exist on Gen8+, but, based on the old comment, I believe previous Gens can generate 64-bit moves. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Really don't emit Q or UQ moves on Gen < 8Ian Romanick2017-01-201-0/+8
| | | | | | | | | | | | | | | | | | | | | | It's much easier to do this in the generator rather than while coming out of NIR. brw_type_for_nir_type doesn't know the Gen, so we'd have to add a bunch of plumbing. The alternate fix is to not emit int64 moves for doubles in the first place... but that seems even more difficult. This change won't catch non-MOV instructions that try to use 64-bit integer types on Gen < 8. This may convert certain kinds of bugs in to different kinds of bugs that are more difficult to detect (since the assertions in the function won't catch them). NOTE: I don't think anything can emit mixed-type 64-bit moves until the same platform supports both ARB_gpu_shader_fp64 and ARB_gpu_shader_int64. When we enable int64 on Gen < 8, we can solve this problem other ways. This prevents regressions on HSW in the next patch. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Avoid int64 warnings.Dave Airlie2017-01-201-0/+28
| | | | | | | | | | | | | Just add operations to the switch statement here. v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and ir_unop_u642b. Handle these with extra i2u or u2i casts just like uint(bool) and bool(uint) conversion is done. Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Avoid int64 induced warningsDave Airlie2017-01-203-0/+6
| | | | | | | | | Just add types into unsupported or double equivalent spots. Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Validate "Special Cases for Byte Operations"Matt Turner2017-01-202-9/+150
| | | | | Do this in general_restrictions_based_on_operand_types() because the two rules that "Special Cases for Byte Operations" relax are checked there.
* i965: Validate "Region Alignment Rules"Matt Turner2017-01-202-1/+697
|
* i965: Validate "General Restrictions Based on Operand Types"Matt Turner2017-01-202-0/+273
|
* i965: Validate "General Restrictions on Regioning Parameters"Matt Turner2017-01-202-0/+377
|
* i965: Replace reg_type_size[] with a function.Matt Turner2017-01-203-25/+75
| | | | | | A function is necessary to handle immediate types. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Validate math instruction sources.Matt Turner2017-01-202-9/+23
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Claim that SEND/math has two sources.Matt Turner2017-01-201-1/+8
| | | | | | | | src1 must be a descriptor (including the information to determine that the SEND is doing an extended math operation), but src0 can actually be null since it serves as the source of the implicit GRF -> MRF move. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Simplify num_sources_from_inst().Matt Turner2017-01-201-3/+1
| | | | | | | | desc will always be non-NULL, because brw_validate_instructions() does not attempt to validate any instructions that fail the is_unsupported_inst() check. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Factor out send_restrictions() function.Matt Turner2017-01-201-12/+22
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Factor out sources_not_null() validation function.Matt Turner2017-01-201-17/+23
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Structure code so unsupported inst will not generate more errors.Matt Turner2017-01-201-2/+5
| | | | | | | | We want to rely on brw_opcode_desc() always returning non-NULL in other validation functions. Other validation functions will be in the else case of the block added in this patch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add a test for the EU assembly validator.Matt Turner2017-01-202-0/+176
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add a CHECK macro to call more complicated validation funcs.Matt Turner2017-01-201-0/+9
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make ERROR_IF usable from other functions.Matt Turner2017-01-201-1/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Mark error annotation on correct SIMD16 inst.Matt Turner2017-01-201-2/+2
| | | | | | | | | | | | inst, whose assignment can be seen in the last line of context pointed to the correct instruction in the SIMD16 program, but src_offset was the offset from the beginning of the SIMD16 program. So if an instruction at offset 0x100 in the SIMD16 program was illegal, we would mark an error on the instruction at offset 0x100 (which is likely in the SIMD8 program). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Use UW-typed operands when dest is UW.Matt Turner2017-01-201-4/+6
| | | | | | | | | | Using a UD-typed operand makes the execution size D, and if the size of the execution type is greater than the size of the destination type, the destination must be appropriately strided. We actually just want UW-types all around. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use W-typed immediate in brw_F32TO16().Matt Turner2017-01-201-1/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Don't change F->VF if dest type is DF.Matt Turner2017-01-201-1/+2
| | | | | | | | | We change the immediate source type to VF to allow instruction compaction, but there are no entires in the compaction table for DF, so there's no point in doing this. Additionally, I mixing floating-point types is now allowed except for F and VF.
* i965: Remove unnecessary mt->compressed checksAnuj Phogat2017-01-191-12/+4
| | | | | | | | It's harmless to use ALIGN_NPOT() for uncompressed formats because they have block width/height = 1. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* i965: Fix indentation in brw_miptree_layout_2d()Anuj Phogat2017-01-191-3/+2
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* i965: Fix comment to include 3d texturesAnuj Phogat2017-01-191-1/+2
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* i965: Delete pending CCS and HiZ ops in intel_miptree_make_shareable()Chad Versace2017-01-191-0/+16
| | | | | | | | | | | | | | | | | | | Fixes crash in piglit `egl_khr_gl_renderbuffer_image-clear-shared-image GL_DEPTH_COMPONENT24` on Skylake. The crash happened because blorp attempted to execute a pending hiz clear after the hiz buffer was deleted. Deleting the pending hiz ops when the hiz buffer gets deleted fixes the crash. For good measure, this patch also deletes all pending CCS/MCS ops when the CCS/MCS buffer gets deleted. I'm now aware of any bugs caused by the dangling ops, but deleting them is clearly the right thing to do. Cc: Ben Widawsky <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99265
* mesa/glsl/i965: set and use tcs vertices_out directlyTimothy Arceri2017-01-191-4/+2
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: get outputs_written from gl_programTimothy Arceri2017-01-191-2/+2
| | | | | | | There is no need to go via the pointer in nir_shader. This change is required for the shader cache as we don't create a nir_shader. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/blorp: Make post draw flush more explicitTopi Pohjolainen2017-01-182-5/+22
| | | | | | | | | | | | | Blits do not need any special treatment as the target buffer object is added to render cache just as one does for normal draw. Color clears and resolves in turn require explicit "end of pipe synchronization". It is not clear what this means exactly but the assumption is that render cache flush with command stream stall should be sufficient. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/gen6: Issue direct depth stall and flush after depth clearTopi Pohjolainen2017-01-181-1/+6
| | | | | | | | | | | | | | | | | | instead of calling unconditionally brw_emit_mi_flush() which does: brw_emit_pipe_control_flush(brw, PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_RENDER_TARGET_FLUSH | PIPE_CONTROL_CS_STALL); brw_emit_pipe_control_flush(brw, PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | PIPE_CONTROL_CONST_CACHE_INVALIDATE); Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make depth clear flushing more explicitTopi Pohjolainen2017-01-182-8/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | Current blorp logic issues unconditional "flush everything" (see brw_emit_mi_flush()) after each render. For example, all blits issue this unconditionally which shouldn't be needed if they set render cache properly so that subsequent renders do necessary flushing before drawing. In case of piglit: ext_framebuffer_multisample-accuracy all_samples depth_draw small intel_hiz_exec() is always preceded by blorb blit and the unconditional flush looks to hide the lack of stall and flushes in depth clears. By removing the brw_emit_mi_flush() I get gpu hangs. This patch adds the stalls and flushes mandated by the spec and gets rid of those hangs. v2 (Jason, Ken): Document the rational for separating depth cache flush and stall on Gen7. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/blorp: Use the render cache mechanism instead of explicit flushingTopi Pohjolainen2017-01-181-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | by replacing brw_emit_mi_flush() with brw_render_cache_set_check_flush(). The latter splits the flush in two: brw_emit_pipe_control_flush(brw, PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_RENDER_TARGET_FLUSH | PIPE_CONTROL_CS_STALL); brw_emit_pipe_control_flush(brw, PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | PIPE_CONTROL_CONST_CACHE_INVALIDATE); instead of int flags = PIPE_CONTROL_NO_WRITE | PIPE_CONTROL_RENDER_TARGET_FLUSH; if (brw->gen >= 6) { flags |= PIPE_CONTROL_INSTRUCTION_INVALIDATE | PIPE_CONTROL_CONST_CACHE_INVALIDATE | PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_VF_CACHE_INVALIDATE | PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | PIPE_CONTROL_CS_STALL; } brw_emit_pipe_control_flush(brw, flags); v2 (Jason): Check that destination exists before trying to add to render cache. Depth clears and resolves don't have it. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make brw_cache_item structure private to brw_program_cache.c.Kenneth Graunke2017-01-182-19/+21
| | | | | | | | struct brw_cache_item is an implementation detail of the program cache. We don't need to make those internals available to the entire driver. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Fix SURFACE_STATE to handle non-zero aux offsetsBen Widawsky2017-01-181-2/+1
| | | | | | | Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Acked-by: Daniel Stone <[email protected]>
* i965: Don't map/unmap in brw_print_program_cache on LLC platforms.Kenneth Graunke2017-01-171-2/+4
| | | | | | | | | | | We have a persistent mapping. Don't map it a second time or try to unmap it. Just use the pointer. This most likely would wreak havoc except that this code is unused (it's only called from an if (0) debug block). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Move program cache printing to brw_program_cache.c.Kenneth Graunke2017-01-173-57/+49
| | | | | | | | It makes sense to put a function which prints out the entire contents of the program cache in the file that implements the program cache. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Make a helper for finding an existing shader variant.Kenneth Graunke2017-01-177-85/+68
| | | | | | | | | We had five copies of the same "walk the cache and look for an existing shader variant for this program" code. Now we have one helper function that returns the key. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Make DCE set null destinations on messages with side effects.Kenneth Graunke2017-01-171-13/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (Co-authored by Matt Turner.) Image atomics, for example, return a value - but the shader may not want to use it. We assigned a useless VGRF destination. This seemed harmless, but it can actually be quite harmful. The register allocator has to assign that VGRF to a real register. It may assign the same actual GRF to the destination of an instruction that follows soon after. This results in a write-after-write (WAW) dependency, and stall. A number of "Deus Ex: Mankind Divided" shaders use image atomics, but don't use the return value. Several of these were hitting WAW stalls for nearly 14,000 (poorly estimated) cycles a pop. Making dead code elimination null out the destination avoids this issue. This patch cuts one shader's estimated cycles by -98.39%! Removing the message response should also help with data cluster bandwidth. On Skylake: (instruction counts remain identical) total cycles in shared programs: 255413890 -> 248081010 (-2.87%) cycles in affected programs: 12019948 -> 4687068 (-61.01%) helped: 24 HURT: 10 v2: Make can_omit_write independent of can_eliminate (Curro). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Combine some dead code elimination NOP'ing code.Kenneth Graunke2017-01-171-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | In theory we might have incorrectly NOP'd instructions that write the flag, but where that flag value isn't used, and yet the instruction either writes the accumulator or has side effects. I don't believe any such instructions exist, so this is mostly a code cleanup. Curro pointed out that FS_OPCODE_FB_WRITE has a null destination and actually writes the flag on Gen4-5 to dynamically decide whether to write some payload data. The hunk removed in this patch might have NOP'd it, except that we don't actually mark flags_written() in the IR, so it doesn't think the flag is touched at all. That's sketchy, but it means it wouldn't hit this today (though there are likely other problems!). v2: Properly replace the inst->regs_written() check in the second hunk with the flag being live (mistake caught by Curro). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Make DCE explicitly not eliminate any control flow instructions.Kenneth Graunke2017-01-171-3/+2
| | | | | | | | | | | | | | | | | | | According to Matt, the dead code pass explicitly avoided IF and WHILE because on Sandybridge, these could have conditional modifiers and null destination registers. Normally, those instructions use BAD_FILE for the destination register. Nowadays, we don't do that anymore, so we could technically drop these checks. However, it's clearer to explicitly leave control flow instructions alone, so change it to the more generic !inst->is_control_flow(). This should have no actual change. [This patch implements review feedback from Curro and Matt.] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Matt Turner <[email protected]>