aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri
Commit message (Collapse)AuthorAgeFilesLines
...
* i965: Enable 64-bit integer support for almost all unary and binary operationsIan Romanick2017-01-201-10/+0
| | | | | | | | Integer comparison functions (e.g., nir_op_ilt) are handled in the next commit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Enable uploading 64-bit integer uniformsIan Romanick2017-01-201-1/+3
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add 64-bit integer support for conversions and bitcastsIan Romanick2017-01-202-5/+35
| | | | | | | | | | v2 (idr): Make the "from" type in a cast unsized. This reduces the number of required cast operations at the expensive slightly more complex code. However, this will be a dramatic improvement when other sized integer types are added. Suggested by Connor. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Enable emitting Q and UQ instructions in the fs backendIan Romanick2017-01-202-1/+12
| | | | | | | | v2: Fixup assertion in brw_reg_type_to_hw_type to allow BRW_REGISTER_TYPE_{UQ,Q} on Gen8+. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add support for constant evaluation on Q and UQ typesIan Romanick2017-01-202-7/+20
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Return Q and UQ types for int64 and uint64Ian Romanick2017-01-201-4/+2
| | | | | | | | | It seems like maybe this should return a different type based on Gen. Q and UQ only exist on Gen8+, but, based on the old comment, I believe previous Gens can generate 64-bit moves. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Really don't emit Q or UQ moves on Gen < 8Ian Romanick2017-01-201-0/+8
| | | | | | | | | | | | | | | | | | | | | | It's much easier to do this in the generator rather than while coming out of NIR. brw_type_for_nir_type doesn't know the Gen, so we'd have to add a bunch of plumbing. The alternate fix is to not emit int64 moves for doubles in the first place... but that seems even more difficult. This change won't catch non-MOV instructions that try to use 64-bit integer types on Gen < 8. This may convert certain kinds of bugs in to different kinds of bugs that are more difficult to detect (since the assertions in the function won't catch them). NOTE: I don't think anything can emit mixed-type 64-bit moves until the same platform supports both ARB_gpu_shader_fp64 and ARB_gpu_shader_int64. When we enable int64 on Gen < 8, we can solve this problem other ways. This prevents regressions on HSW in the next patch. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Avoid int64 warnings.Dave Airlie2017-01-201-0/+28
| | | | | | | | | | | | | Just add operations to the switch statement here. v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and ir_unop_u642b. Handle these with extra i2u or u2i casts just like uint(bool) and bool(uint) conversion is done. Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Avoid int64 induced warningsDave Airlie2017-01-203-0/+6
| | | | | | | | | Just add types into unsupported or double equivalent spots. Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Validate "Special Cases for Byte Operations"Matt Turner2017-01-202-9/+150
| | | | | Do this in general_restrictions_based_on_operand_types() because the two rules that "Special Cases for Byte Operations" relax are checked there.
* i965: Validate "Region Alignment Rules"Matt Turner2017-01-202-1/+697
|
* i965: Validate "General Restrictions Based on Operand Types"Matt Turner2017-01-202-0/+273
|
* i965: Validate "General Restrictions on Regioning Parameters"Matt Turner2017-01-202-0/+377
|
* i965: Replace reg_type_size[] with a function.Matt Turner2017-01-203-25/+75
| | | | | | A function is necessary to handle immediate types. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Validate math instruction sources.Matt Turner2017-01-202-9/+23
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Claim that SEND/math has two sources.Matt Turner2017-01-201-1/+8
| | | | | | | | src1 must be a descriptor (including the information to determine that the SEND is doing an extended math operation), but src0 can actually be null since it serves as the source of the implicit GRF -> MRF move. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Simplify num_sources_from_inst().Matt Turner2017-01-201-3/+1
| | | | | | | | desc will always be non-NULL, because brw_validate_instructions() does not attempt to validate any instructions that fail the is_unsupported_inst() check. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Factor out send_restrictions() function.Matt Turner2017-01-201-12/+22
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Factor out sources_not_null() validation function.Matt Turner2017-01-201-17/+23
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Structure code so unsupported inst will not generate more errors.Matt Turner2017-01-201-2/+5
| | | | | | | | We want to rely on brw_opcode_desc() always returning non-NULL in other validation functions. Other validation functions will be in the else case of the block added in this patch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add a test for the EU assembly validator.Matt Turner2017-01-202-0/+176
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add a CHECK macro to call more complicated validation funcs.Matt Turner2017-01-201-0/+9
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make ERROR_IF usable from other functions.Matt Turner2017-01-201-1/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Mark error annotation on correct SIMD16 inst.Matt Turner2017-01-201-2/+2
| | | | | | | | | | | | inst, whose assignment can be seen in the last line of context pointed to the correct instruction in the SIMD16 program, but src_offset was the offset from the beginning of the SIMD16 program. So if an instruction at offset 0x100 in the SIMD16 program was illegal, we would mark an error on the instruction at offset 0x100 (which is likely in the SIMD8 program). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Use UW-typed operands when dest is UW.Matt Turner2017-01-201-4/+6
| | | | | | | | | | Using a UD-typed operand makes the execution size D, and if the size of the execution type is greater than the size of the destination type, the destination must be appropriately strided. We actually just want UW-types all around. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use W-typed immediate in brw_F32TO16().Matt Turner2017-01-201-1/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Don't change F->VF if dest type is DF.Matt Turner2017-01-201-1/+2
| | | | | | | | | We change the immediate source type to VF to allow instruction compaction, but there are no entires in the compaction table for DF, so there's no point in doing this. Additionally, I mixing floating-point types is now allowed except for F and VF.
* i965: Remove unnecessary mt->compressed checksAnuj Phogat2017-01-191-12/+4
| | | | | | | | It's harmless to use ALIGN_NPOT() for uncompressed formats because they have block width/height = 1. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* i965: Fix indentation in brw_miptree_layout_2d()Anuj Phogat2017-01-191-3/+2
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* i965: Fix comment to include 3d texturesAnuj Phogat2017-01-191-1/+2
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* i965: Delete pending CCS and HiZ ops in intel_miptree_make_shareable()Chad Versace2017-01-191-0/+16
| | | | | | | | | | | | | | | | | | | Fixes crash in piglit `egl_khr_gl_renderbuffer_image-clear-shared-image GL_DEPTH_COMPONENT24` on Skylake. The crash happened because blorp attempted to execute a pending hiz clear after the hiz buffer was deleted. Deleting the pending hiz ops when the hiz buffer gets deleted fixes the crash. For good measure, this patch also deletes all pending CCS/MCS ops when the CCS/MCS buffer gets deleted. I'm now aware of any bugs caused by the dangling ops, but deleting them is clearly the right thing to do. Cc: Ben Widawsky <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99265
* mesa/glsl/i965: set and use tcs vertices_out directlyTimothy Arceri2017-01-191-4/+2
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: get outputs_written from gl_programTimothy Arceri2017-01-191-2/+2
| | | | | | | There is no need to go via the pointer in nir_shader. This change is required for the shader cache as we don't create a nir_shader. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/blorp: Make post draw flush more explicitTopi Pohjolainen2017-01-182-5/+22
| | | | | | | | | | | | | Blits do not need any special treatment as the target buffer object is added to render cache just as one does for normal draw. Color clears and resolves in turn require explicit "end of pipe synchronization". It is not clear what this means exactly but the assumption is that render cache flush with command stream stall should be sufficient. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/gen6: Issue direct depth stall and flush after depth clearTopi Pohjolainen2017-01-181-1/+6
| | | | | | | | | | | | | | | | | | instead of calling unconditionally brw_emit_mi_flush() which does: brw_emit_pipe_control_flush(brw, PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_RENDER_TARGET_FLUSH | PIPE_CONTROL_CS_STALL); brw_emit_pipe_control_flush(brw, PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | PIPE_CONTROL_CONST_CACHE_INVALIDATE); Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make depth clear flushing more explicitTopi Pohjolainen2017-01-182-8/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | Current blorp logic issues unconditional "flush everything" (see brw_emit_mi_flush()) after each render. For example, all blits issue this unconditionally which shouldn't be needed if they set render cache properly so that subsequent renders do necessary flushing before drawing. In case of piglit: ext_framebuffer_multisample-accuracy all_samples depth_draw small intel_hiz_exec() is always preceded by blorb blit and the unconditional flush looks to hide the lack of stall and flushes in depth clears. By removing the brw_emit_mi_flush() I get gpu hangs. This patch adds the stalls and flushes mandated by the spec and gets rid of those hangs. v2 (Jason, Ken): Document the rational for separating depth cache flush and stall on Gen7. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/blorp: Use the render cache mechanism instead of explicit flushingTopi Pohjolainen2017-01-181-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | by replacing brw_emit_mi_flush() with brw_render_cache_set_check_flush(). The latter splits the flush in two: brw_emit_pipe_control_flush(brw, PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_RENDER_TARGET_FLUSH | PIPE_CONTROL_CS_STALL); brw_emit_pipe_control_flush(brw, PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | PIPE_CONTROL_CONST_CACHE_INVALIDATE); instead of int flags = PIPE_CONTROL_NO_WRITE | PIPE_CONTROL_RENDER_TARGET_FLUSH; if (brw->gen >= 6) { flags |= PIPE_CONTROL_INSTRUCTION_INVALIDATE | PIPE_CONTROL_CONST_CACHE_INVALIDATE | PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_VF_CACHE_INVALIDATE | PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | PIPE_CONTROL_CS_STALL; } brw_emit_pipe_control_flush(brw, flags); v2 (Jason): Check that destination exists before trying to add to render cache. Depth clears and resolves don't have it. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make brw_cache_item structure private to brw_program_cache.c.Kenneth Graunke2017-01-182-19/+21
| | | | | | | | struct brw_cache_item is an implementation detail of the program cache. We don't need to make those internals available to the entire driver. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Fix SURFACE_STATE to handle non-zero aux offsetsBen Widawsky2017-01-181-2/+1
| | | | | | | Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Acked-by: Daniel Stone <[email protected]>
* i965: Don't map/unmap in brw_print_program_cache on LLC platforms.Kenneth Graunke2017-01-171-2/+4
| | | | | | | | | | | We have a persistent mapping. Don't map it a second time or try to unmap it. Just use the pointer. This most likely would wreak havoc except that this code is unused (it's only called from an if (0) debug block). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Move program cache printing to brw_program_cache.c.Kenneth Graunke2017-01-173-57/+49
| | | | | | | | It makes sense to put a function which prints out the entire contents of the program cache in the file that implements the program cache. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Make a helper for finding an existing shader variant.Kenneth Graunke2017-01-177-85/+68
| | | | | | | | | We had five copies of the same "walk the cache and look for an existing shader variant for this program" code. Now we have one helper function that returns the key. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Make DCE set null destinations on messages with side effects.Kenneth Graunke2017-01-171-13/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (Co-authored by Matt Turner.) Image atomics, for example, return a value - but the shader may not want to use it. We assigned a useless VGRF destination. This seemed harmless, but it can actually be quite harmful. The register allocator has to assign that VGRF to a real register. It may assign the same actual GRF to the destination of an instruction that follows soon after. This results in a write-after-write (WAW) dependency, and stall. A number of "Deus Ex: Mankind Divided" shaders use image atomics, but don't use the return value. Several of these were hitting WAW stalls for nearly 14,000 (poorly estimated) cycles a pop. Making dead code elimination null out the destination avoids this issue. This patch cuts one shader's estimated cycles by -98.39%! Removing the message response should also help with data cluster bandwidth. On Skylake: (instruction counts remain identical) total cycles in shared programs: 255413890 -> 248081010 (-2.87%) cycles in affected programs: 12019948 -> 4687068 (-61.01%) helped: 24 HURT: 10 v2: Make can_omit_write independent of can_eliminate (Curro). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Combine some dead code elimination NOP'ing code.Kenneth Graunke2017-01-171-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | In theory we might have incorrectly NOP'd instructions that write the flag, but where that flag value isn't used, and yet the instruction either writes the accumulator or has side effects. I don't believe any such instructions exist, so this is mostly a code cleanup. Curro pointed out that FS_OPCODE_FB_WRITE has a null destination and actually writes the flag on Gen4-5 to dynamically decide whether to write some payload data. The hunk removed in this patch might have NOP'd it, except that we don't actually mark flags_written() in the IR, so it doesn't think the flag is touched at all. That's sketchy, but it means it wouldn't hit this today (though there are likely other problems!). v2: Properly replace the inst->regs_written() check in the second hunk with the flag being live (mistake caught by Curro). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Make DCE explicitly not eliminate any control flow instructions.Kenneth Graunke2017-01-171-3/+2
| | | | | | | | | | | | | | | | | | | According to Matt, the dead code pass explicitly avoided IF and WHILE because on Sandybridge, these could have conditional modifiers and null destination registers. Normally, those instructions use BAD_FILE for the destination register. Nowadays, we don't do that anymore, so we could technically drop these checks. However, it's clearer to explicitly leave control flow instructions alone, so change it to the more generic !inst->is_control_flow(). This should have no actual change. [This patch implements review feedback from Curro and Matt.] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Make BLORP disable the NP Z PMA stall fix.Kenneth Graunke2017-01-161-0/+4
| | | | | | | | This may fix GPU hangs on Gen8. I don't know if it does though. Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Enable OpenGL 4.5 on Haswell.Kenneth Graunke2017-01-162-2/+2
| | | | | | | | Everything is in place and the test results look solid. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965: Use align1 mode for barrier messages.Kenneth Graunke2017-01-151-0/+3
| | | | | | | | | | | | | | In commit 7428e6f86ab5 we switched the barrier SEND message's destination type to UW to avoid problems in SIMD16 compute shaders. Tessellation control shaders also use barriers, and in vec4 mode, we were emitting them in align16 mode. The simulator warns that only UD, D, F, and DF are valid destination types - UW is technically illegal. So, switch to align1 mode. Either mode should work fine. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Move Gen4-5 interpolation stuff to brw_wm_prog_data.Kenneth Graunke2017-01-1311-70/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | This fixes glxgears rendering, which had surprisingly been broken since late October! Specifically, commit 91d61fbf7cb61a44adcaae51ee08ad0dd6b. glxgears uses glShadeModel(GL_FLAT) when drawing the main portion of the gears, then uses glShadeModel(GL_SMOOTH) for drawing the Gouraud-shaded inner portion of the gears. This results in the same fragment program having two different state-dependent interpolation maps: one where gl_Color is flat, and another where it's smooth. The problem is that there's only one gen4_fragment_program, so it can't store both. Each FS compile would trash the last one. But, the FS compiles are cached, so the first one would store FLAT, and the second would see a matching program in the cache and never bother to compile one with SMOOTH. (Clearing the program cache on every draw made it render correctly.) Instead, move it to brw_wm_prog_data, where we can keep a copy for every specialization of the program. The only downside is bloating the structure a bit, but we can tighten that up a bit if we need to. This also lets us kill gen4_fragment_program entirely! Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965/vec4: Fix mapping attributesJuan A. Suarez Romero2017-01-132-23/+11
| | | | | | | | | | | | | | | | | | This patch reverts 57bab6708f2bbc1ab8a3d202e9a467963596d462, which was causing issues with ILK and earlier VS programs. 1. brw_nir.c: Revert "i965/vec4/nir: vec4 also needs to remap vs attributes" Do not perform a remap in vec4 backend. Rather, do it later when setup attributes 2. brw_vec4.cpp: This fixes mapping ATTRx to proper GRFn. Suggested-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99391 [[email protected]: merge Juan's two patches from bugzilla] Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>