path: root/src/mesa/drivers
Commit message | Author | Age | Files | Lines
* Eliminate several cases of multiplication in arguments to callocCarl Worth2014-09-032-3/+3
| In commit 32f2fd1c5d6088692551c80352b7d6fa35b0cd09, several calls to
| _mesa_calloc(x) were replaced with calls to calloc(1, x). This is strictly
| equivalent to what the code was doing previously.
|
| But for cases where "x" involves multiplication, now that we are explicitly
| using the two-argument calloc, we can do one step better and replace:
|
|     calloc(1, A * B);
|
| with:
|
|     calloc(A, B);
|
| The advantage of the latter is that calloc will detect any overflow that
| would have resulted from the multiplication and will fail the allocation,
| (whereas the former would return a small allocation). So this fix can
| change potentially exploitable buffer overruns into segmentation faults.
|
| Reviewed-by: Matt Turner <[email protected]>
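As an illustration of the overflow behaviour this commit message describes, here is a minimal C sketch. The helper names `alloc_array` and `wrapped_size` are hypothetical and not part of Mesa, and the sketch assumes a C library whose calloc rejects a product that overflows size_t, as glibc and other current implementations do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zeroed array of `count` elements of
 * `elem_size` bytes each. Passing the two factors separately lets calloc
 * check the product for overflow. */
static void *alloc_array(size_t count, size_t elem_size)
{
    return calloc(count, elem_size);
}

/* The pattern the commit removes: the multiplication wraps around before
 * calloc ever sees it, so calloc(1, wrapped_size(...)) can succeed with a
 * buffer far smaller than the caller expects. */
static size_t wrapped_size(size_t count, size_t elem_size)
{
    return count * elem_size;
}
```

With `count = SIZE_MAX / 2 + 1` and `elem_size = 4`, `wrapped_size` wraps to 0, while `alloc_array` is expected to return NULL under the assumption above.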
* i965: Handle ir_triop_csel in emit_bool_to_cond_code().Kenneth Graunke2014-09-032-4/+36
| ir_triop_csel can return a boolean expression, so we need to handle it
| here; we simply forgot when we added it.
|
| Fixes Piglit's EXT_shader_integer_mix/{vs,fs}-mix-if-bool.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Cc: [email protected]
* i965: Move curb_read_length/total_scratch to brw_stage_prog_data.Kenneth Graunke2014-09-0316-38/+40
| All shader stages have these fields, so it makes sense to store them in
| the common base structure, rather than duplicating them in each.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
* i965/copy_image: Divide the x offsets by block width when using the blitterJason Ekstrand2014-09-031-10/+21
| Signed-off-by: Jason Ekstrand <[email protected]>
| Cc: "10.3" <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82804
| Tested-by: Tapani Pälli <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965/copy_image: Use the correct block dimensionJason Ekstrand2014-09-031-6/+6
| Signed-off-by: Jason Ekstrand <[email protected]>
| Cc: "10.3" <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82804
| Tested-by: Tapani Pälli <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* meta/copy_image: Use the correct texture level when creating viewsJason Ekstrand2014-09-031-1/+1
| Previously, we were accidentally assuming that the level of both textures
| was 0. Now we actually use the correct level in our hacked texture view.
|
| This doesn't 100% fix the meta path because the texture type is getting
| lost somewhere in the pipeline. However, it actually copies to/from the
| correct layer now.
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Cc: "10.3" <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82804
| Tested-by: Tapani Pälli <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965/copy_image: Use the correct texture levelJason Ekstrand2014-09-031-4/+6
| Previously, we were using the source image's level for both source and
| destination. Also, we weren't taking the MinLevel from a potential
| texture view into account. This commit fixes both problems.
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Cc: "10.3" <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82804
| Tested-by: Tapani Pälli <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* meta: Make MESA_META_DRAW_BUFFERS restore properlyKristian Høgsberg2014-09-021-34/+4
| A meta begin/end pair with MESA_META_DRAW_BUFFERS will change visible GL
| state. We recreate the draw buffer enums from the buffer bitfield, which
| changes GL_BACK to GL_BACK_LEFT (and GL_FRONT to GL_FRONT_LEFT).
|
| This commit modifies the save/restore logic to instead copy the buffer
| enums from the gl_framebuffer and then set them on restore using
| _mesa_drawbuffers().
|
| It's not clear how this breaks the benchmark in 82796, but fixing meta to
| not leak the state change fixes the regression.
|
| No piglit regressions.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
| Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=82796
| Signed-off-by: Kristian Høgsberg <[email protected]>
| Cc: [email protected]
* mesa: Convert NewDriverState to 64-bitsJordan Justen2014-09-014-4/+16
| i965 will have more than 32 bits when BRW_STATE_COMPUTE_PROGRAM is added.
|
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
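To show why a 32-bit state field cannot hold a flag above bit 31, here is a small C sketch. The flag name and both helper functions are hypothetical and only stand in for a driver state bit; they are not Mesa definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag that lives at bit 32, one past a 32-bit mask. */
#define EXAMPLE_STATE_BIT_32 (UINT64_C(1) << 32)

/* With a 32-bit field the flag is truncated away and silently lost. */
static uint32_t example_set_flag_32(uint32_t state)
{
    return state | (uint32_t)EXAMPLE_STATE_BIT_32;
}

/* With a 64-bit field the flag is preserved. */
static uint64_t example_set_flag_64(uint64_t state)
{
    return state | EXAMPLE_STATE_BIT_32;
}
```

Starting from an empty mask, the 32-bit variant still returns 0, while the 64-bit variant returns a mask with bit 32 set.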
* i965: Modify state upload to allow 2 different sets of state atoms.Paul Berry2014-09-012-25/+32
| The set of state atoms for compute shaders is currently empty; it will be
| filled in by future patches.
|
| Reviewed-by: Jordan Justen <[email protected]>
* i965: Modify dirty bit handling to support 2 pipelines.Paul Berry2014-09-014-16/+46
| The hardware state for compute shaders is almost entirely orthogonal to
| the hardware state for 3D rendering. To avoid sending unnecessary state
| to the hardware, we'll need to have a separate set of state atoms for the
| compute pipeline and the 3D pipeline. That means we need to maintain two
| separate sets of dirty bits to determine which state atoms need to be run.
|
| But the dirty bits are not completely independent; for example, if
| BRW_NEW_SURFACES is flagged while doing 3D rendering, then not only do we
| need to re-run 3D state atoms that depend on BRW_NEW_SURFACES, but we
| also need to re-run compute state atoms that depend on BRW_NEW_SURFACES
| the next time the compute pipeline is run.
|
| To accomplish this, we record two sets of dirty bits, one for each
| pipeline. When bits are dirtied (via SET_DIRTY_BIT() or SET_DIRTY_ALL())
| we set them to the dirty state in both pipelines. When brw_state_upload()
| is run, we clear the dirty bits just for the pipeline that was run.
|
| Note that since the number of pipelines is known at compile time to be 2,
| the compiler should unroll the loops in SET_DIRTY_BIT() and
| SET_DIRTY_ALL().
|
| Reviewed-by: Jordan Justen <[email protected]>
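The scheme in this commit message can be sketched in C roughly as follows. Every name here (`example_pipeline`, `example_state`, and the three functions) is hypothetical; the real driver uses SET_DIRTY_BIT(), SET_DIRTY_ALL() and brw_state_upload(), whose definitions are not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; only the idea comes from the commit text. */
enum example_pipeline {
    EXAMPLE_PIPELINE_3D,
    EXAMPLE_PIPELINE_COMPUTE,
    EXAMPLE_NUM_PIPELINES
};

struct example_state {
    uint64_t dirty[EXAMPLE_NUM_PIPELINES]; /* one dirty mask per pipeline */
};

/* Dirtying a bit marks it in every pipeline, so each pipeline re-runs the
 * dependent state atoms the next time it is used. */
static void example_set_dirty_bit(struct example_state *s, uint64_t bit)
{
    for (int i = 0; i < EXAMPLE_NUM_PIPELINES; i++)
        s->dirty[i] |= bit;
}

static int example_check_dirty_bit(const struct example_state *s,
                                   enum example_pipeline p, uint64_t bit)
{
    return (s->dirty[p] & bit) != 0;
}

/* After uploading state for one pipeline, clear only that pipeline's mask. */
static void example_clear_dirty(struct example_state *s,
                                enum example_pipeline p)
{
    s->dirty[p] = 0;
}
```

Setting a bit and then clearing the 3D mask leaves the bit pending for the compute pipeline, which is the behaviour the message asks for. Because the loop bound is a compile-time constant, a compiler is free to unroll it.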
* i965: Create a macro for checking a dirty bit.Paul Berry2014-09-012-1/+7
| This will make it easier to extend dirty bit handling to support compute
| shaders.
|
| Reviewed-by: Jordan Justen <[email protected]>
* i965: Create a macro for setting all dirty bits.Paul Berry2014-09-014-7/+18
| This will make it easier to extend dirty bit handling to support compute
| shaders.
|
| Reviewed-by: Jordan Justen <[email protected]>
* i965: Create a macro for setting a dirty bit.Paul Berry2014-09-0132-67/+74
| This will make it easier to extend dirty bit handling to support compute
| shaders.
|
| Reviewed-by: Jordan Justen <[email protected]>
* i965: add missing parens in vec4 visitorDave Airlie2014-09-021-1/+2
| Coverity reported this; Matt said it looks like missing parens, not bad
| indenting, so let's try that.
|
| Cc: "10.2 10.3" <[email protected]>
| Reviewed-by: Chris Forbes <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* i965/fs: Don't segfault when debug-logging a null programJason Ekstrand2014-09-011-2/+2
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Don't segfault when debug-logging a null programJason Ekstrand2014-09-011-2/+2
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: don't use ir->shadow_comparitor in emit_texture_*Connor Abbott2014-09-012-7/+5
| Signed-off-by: Connor Abbott <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: don't pass ir_variable * to emit_samplepos_setup()Connor Abbott2014-09-013-5/+4
| We were only using it to get at its type, which we already know because
| it's a builtin variable.
|
| Signed-off-by: Connor Abbott <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: don't pass ir_variable * to emit_frontfacing_interpolation()Connor Abbott2014-09-014-6/+6
| We were only using it to get at its type, which we already know because
| it's a builtin variable.
|
| v2 (Ken): Rebase on Matt's optimized gl_FrontFacing calculations.
|
| Signed-off-by: Connor Abbott <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix GPU hangs when INTEL_DEBUG=no16 is set.Kenneth Graunke2014-08-311-1/+2
| The replicated data clear shader needs to be SIMD16, or else the GPU will
| hang. So, compile it even if INTEL_DEBUG=no16 is set.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Remove try_emit_saturateAbdiel Janulgue2014-08-312-22/+0
| Now that saturate is implemented natively as an instruction, we can cut
| down on unneeded functionality.
|
| Reviewed-by: Matt Turner <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
| Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/fs: Refactor try_emit_saturateAbdiel Janulgue2014-08-311-15/+8
| v3: Since the fs backend can emit saturate as a separate instruction,
|     there is no need to detect min/max instructions and to rewrite the
|     instruction tree accordingly. On the other hand, we don't need to
|     emit a separate saturated mov either when the expression generating
|     src can do saturate directly.
| v4: Add can_do_saturate() check before enabling saturate modifier (Ken)
|
| Reviewed-by: Matt Turner <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
| Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/vec4: Allow propagation of instructions with saturate flag to selAbdiel Janulgue2014-08-311-27/+58
| When the sel condition is bounded between 0 and 1.0. This allows code
| such as:
|
|     mov.sat a b
|     sel.ge  dst a 0.25F
|
| To be propagated as:
|
|     sel.ge.sat dst b 0.25F
|
| v3: - Syntax clarifications in inst->saturate assignment
|     - Remove extra parenthesis when assigning src_reg value from
|       copy_entry (Matt Turner)
| v4: - Take channels into consideration when propagating saturated
|       instructions.
|
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Abdiel Janulgue <[email protected]>
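The reason the bound on the sel operand matters can be checked with a short C model. This is only a sketch: it assumes that sel.ge picks the larger of its two sources and that .sat clamps to [0, 1], it ignores NaN handling, and all function names are hypothetical.

```c
#include <assert.h>

/* Assumed model of the .sat modifier: clamp to [0, 1]. */
static float example_sat(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Assumed model of sel.ge: keep the first source if it is >= the second. */
static float example_sel_ge(float a, float b)
{
    return a >= b ? a : b;
}

/* mov.sat a b ; sel.ge dst a imm */
static float example_before(float b, float imm)
{
    return example_sel_ge(example_sat(b), imm);
}

/* sel.ge.sat dst b imm */
static float example_after(float b, float imm)
{
    return example_sat(example_sel_ge(b, imm));
}
```

Under these assumptions the two forms agree for every `b` when `imm` lies in [0, 1], for example 0.25. With `imm = 2.0` and `b = 0.5` they differ (2.0 before, 1.0 after), which is why the propagation is restricted to bounded operands.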
* i965/fs: Allow propagation of instructions with saturate flag to selAbdiel Janulgue2014-08-311-1/+17
| When the sel condition is bounded between 0 and 1.0. This allows code
| such as:
|
|     mov.sat a b
|     sel.ge  dst a 0.25F
|
| To be propagated as:
|
|     sel.ge.sat dst b 0.25F
|
| v3: Syntax clarifications in inst->saturate assignment (Matt Turner)
|
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/vec4: Add support for ir_unop_saturateAbdiel Janulgue2014-08-311-0/+4
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/fs: Add support for ir_unop_saturateAbdiel Janulgue2014-08-312-0/+5
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/vec4/fs: Count loops in shader debugAbdiel Janulgue2014-08-312-4/+8
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/vec4: inline generate_vec4_instruction() within generate_code()Abdiel Janulgue2014-08-312-316/+296
| Suggested by Matt. This patch combines and moves back the code-generation
| functions from generate_vec4_instruction() into generate_code(). Makes
| generate_code() a bit larger, but helps us to count loops in a
| straightforward manner.
|
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Abdiel Janulgue <[email protected]>
* i965: Add 2x MSAA support to Broadwell fast clear code.Kenneth Graunke2014-08-311-0/+1
| According to the cited documentation section (but in the newer docs),
| x_scaledown is the same for 2x and 4x MSAA.
|
| +47 piglits.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83081
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Cc: "10.3" <[email protected]>
* i965/vec4: Update register coalescing test.Matt Turner2014-08-301-4/+1
| In commit 04895f5c I added support for reswizzling writemasks. This test
| was checking that we didn't support this.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82881
* i965: Use unreachable() to silence warning.Matt Turner2014-08-301-2/+1
| brw_meta_fast_clear.c:211:17: warning: 'x_scaledown' may be used
|    uninitialized in this function [-Wmaybe-uninitialized]
|    unsigned int x_scaledown, y_scaledown;
|
| Reviewed-by: Kenneth Graunke <[email protected]>
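A sketch of how an unreachable marker can silence this kind of warning follows. The macro below is a hypothetical stand-in written for this example, `__builtin_unreachable()` is a GCC/Clang extension, and the numeric values are placeholders rather than real hardware parameters.

```c
#include <assert.h>

/* Hypothetical stand-in for an unreachable() macro: assert in debug
 * builds, then tell the compiler that control never gets here. */
#define EXAMPLE_UNREACHABLE(msg) \
    do { assert(!msg); __builtin_unreachable(); } while (0)

/* The returned values are placeholders chosen for illustration only. */
static unsigned example_scaledown(unsigned num_samples)
{
    unsigned x_scaledown;

    switch (num_samples) {
    case 2:
    case 4:
        x_scaledown = 8;
        break;
    case 8:
        x_scaledown = 2;
        break;
    default:
        /* Without this, the compiler sees a path on which x_scaledown is
         * never assigned and may emit -Wmaybe-uninitialized. */
        EXAMPLE_UNREACHABLE("Unexpected sample count");
    }

    return x_scaledown;
}
```

Callers must only pass sample counts that the switch handles; any other value is undefined behaviour in release builds and an assertion failure in debug builds.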
* glsl: Eliminate ir_variable::data.atomic.buffer_indexIan Romanick2014-08-292-2/+2
| Just use ir_variable::data.binding... because that's where the binding
| is stored for everything else that can use layout(binding=).
|
| Valgrind massif results for a trimmed apitrace of dota2:
|
|                   n         time(i)    total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
| Before (32-bit): 50  40,564,927,443  69,185,408      63,683,871      5,501,537          0
| After (32-bit):  74  40,580,119,657  69,186,544      63,506,327      5,680,217          0
|
| Before (64-bit): 59  36,822,048,449  96,526,888      89,113,000      7,413,888          0
| After (64-bit):  89  36,822,971,897  96,526,616      88,735,296      7,791,320          0
|
| A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Mark BRW_CONDITIONAL_R as Gen <= 5.Matt Turner2014-08-281-1/+1
* i965/disasm: Show jump count for if/iff/halt.Matt Turner2014-08-281-1/+1
| These instructions don't have pop count.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Disassemble JMPI's source properly.Matt Turner2014-08-281-1/+2
| The source can be a register as well as an immediate, and disassembling a
| register as an immediate can have some strange results.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Add break/cont/halt to list of has_uip().Matt Turner2014-08-281-1/+4
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Disassemble Z/NZ conditional modifiers as .z/.nz.Matt Turner2014-08-281-2/+2
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Disable try_emit_b2f_of_compare on Gen4-6.Kenneth Graunke2014-08-221-0/+7
| The optimization relies on CMP setting the destination to 0, which is
| equivalent to 0.0f. However, early platforms only set the least
| significant byte, leaving the other bits undefined. So, we must disable
| the optimization on those platforms.
|
| Oddly, Sandybridge wasn't reported as broken. The PRM states that it only
| sets the LSB, but the internal documentation says that it follows the IVB
| behavior. Since it wasn't reported as broken, we believe it really does
| follow the IVB behavior.
|
| v2: Allow the optimization on Sandybridge (requested by Matt).
|
| +32 piglits on Ironlake.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79963
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Chris Forbes <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Preserve CFG in predicated break pass.Matt Turner2014-08-221-4/+25
| Operating on this code,
|
|    B0: ...
|        cmp.ne.f0(8)
|        (+f0) if(8)
|    B1: break(8)
|    B2: endif(8)
|
| We can delete B2 without attempting to merge any blocks, since the
| break/continue instruction necessarily ends the previous block.
|
| After deleting the if instruction, we attempt to merge blocks B0 and B1.
|
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/fs: Rename variable in predicated break pass.Matt Turner2014-08-221-7/+8
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/fs: Preserve CFG in the SEL peephole.Matt Turner2014-08-221-6/+9
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Preserve CFG when deleting dead control flow.Matt Turner2014-08-221-9/+45
| This pass deletes an IF/ELSE/ENDIF or IF/ENDIF sequence, or the ELSE in
| an ELSE/ENDIF sequence.
|
| In the typical case (where IF and ENDIF aren't the only instructions in
| their basic blocks), we can simply remove the instructions (implicitly
| deleting the block containing only the ELSE), and attempt to merge blocks
| B0 and B2 together.
|
|    B0: ...
|        (+f0) if(8)
|    B1: else(8)
|    B2: endif(8)
|        ...
|
| If the IF or ENDIF instructions are the only instructions in their
| respective basic blocks (which are deleted by the removal of the
| instructions), we'll want to instead merge the next blocks.
|
| Both B0 and B2 are possibly removed by the removal of if & endif. Same
| situation for if/endif. E.g., in the following example we'd remove blocks
| B1 and B2, and then attempt to combine B0 and B3.
|
|    B0: ...
|    B1: (+f0) if(8)
|    B2: endif(8)
|    B3: ...
|
| Reviewed-by: Topi Pohjolainen <[email protected]>
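The second example in this commit message can be walked through with a deliberately simplified C model. This is a toy sketch, not the driver's data structure: blocks are reduced to instruction counts in program order, predecessor and successor edges are left out, and every name is hypothetical.

```c
#include <assert.h>

#define EXAMPLE_MAX_BLOCKS 8

/* Toy model: basic blocks in program order, each reduced to an
 * instruction count. Control flow edges are deliberately omitted. */
struct example_cfg {
    int num_blocks;
    int num_insts[EXAMPLE_MAX_BLOCKS];
};

/* Drop block b from the list, shifting later blocks down. */
static void example_delete_block(struct example_cfg *cfg, int b)
{
    for (int i = b; i + 1 < cfg->num_blocks; i++)
        cfg->num_insts[i] = cfg->num_insts[i + 1];
    cfg->num_blocks--;
}

/* Remove one instruction from block b. Returns 1 if that emptied the
 * block and the block itself was deleted, 0 otherwise. */
static int example_remove_inst(struct example_cfg *cfg, int b)
{
    if (--cfg->num_insts[b] > 0)
        return 0;
    example_delete_block(cfg, b);
    return 1;
}

/* Fold block b + 1 into block b. */
static void example_combine_with_next(struct example_cfg *cfg, int b)
{
    cfg->num_insts[b] += cfg->num_insts[b + 1];
    example_delete_block(cfg, b + 1);
}
```

Starting from four blocks where B1 holds only the if and B2 only the endif, removing those two instructions deletes both blocks, and a single combine then folds the old B3 into B0. A real pass would also have to check and update the control flow edges before merging, which this model does not attempt.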
* i965/cfg: Add functions to combine basic blocks.Matt Turner2014-08-222-0/+54
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/cfg: Point to bblock_t containing associated control flowMatt Turner2014-08-223-27/+15
| ... rather than pointing directly to the associated instruction. This
| will let us set the block containing the IF statement's else-pointer to
| NULL, when we delete a useless ELSE instruction, as in the case
|
|    (+f0) if(8)
|    ...
|    else(8)
|    endif(8)
|
| Also, remove the pointer to the ENDIF, since it's unused, and it was also
| potentially wrong, in the case of a basic block containing both an ENDIF
| and an IF instruction:
|
|    endif(8)
|    cmp.ne.f0(8)
|    ...
|    (+f0) if(8)
|
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/fs: Preserve CFG in register allocation.Matt Turner2014-08-222-10/+14
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use basic-block aware insertion/removal functions.Matt Turner2014-08-229-40/+50
| To avoid invalidating and recreating the control flow graph. Also stop
| invalidating the CFG in places we didn't add or remove an instruction.
|
| cfg calculations:     202951 -> 80307 (-60.43%)
|
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add invalidate_cfg parameter to invalidate_live_intervals().Matt Turner2014-08-225-7/+9
| Will let us avoid invalidating the CFG if the optimization pass has
| removed instructions using the new basic block methods.
|
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add basic-block aware backend_instruction::insert_* methods.Matt Turner2014-08-222-0/+52
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add a basic-block aware backend_instruction::remove method.Matt Turner2014-08-222-0/+50
| Reviewed-by: Topi Pohjolainen <[email protected]>