path: root/src/mesa/drivers/dri
Commit message | Author | Age | Files | Lines
* mesa: Refactor viewport transform computation.Mathias Fröhlich2014-10-241-17/+9
| | | | | | | | | | This is for preparation of ARB_clip_control. v3: Add comments. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* i965: Silence unused variable warning.Matt Turner2014-10-231-2/+1
|
* i965/fs: Silence uninitialized variable warning.Matt Turner2014-10-231-0/+1
| | | | | | | The compiler isn't privy to the knowledge that we're doing at least one framebuffer write. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Generate better code for ir_triop_csel.Kenneth Graunke2014-10-211-3/+15
Previously, we generated an extra CMP instruction:

    cmp.ge.f0(8)   g6<1>D   g1<0,4,1>F     0F
    cmp.nz.f0(8)   null     g6<4,4,1>D     0D
    (+f0) sel(8)   g5<1>F   g1.4<0,4,1>F   g2<0,4,1>F

The first operand is always a boolean, and we want to predicate the SEL on that. Rather than producing a boolean value and comparing it against zero, we can just produce a condition code in the flag register.

Now we generate:

    cmp.ge.f0(8)   null     g1<0,4,1>F     0F
    (+f0) sel(8)   g5<1>F   g1.4<0,4,1>F   g2<0,4,1>F

No difference in shader-db.

v2: Remember to delete the old code (thanks Matt).

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Simplify visit(ir_expression *)'s result_src/dst setup.Kenneth Graunke2014-10-211-13/+6
| | | | | | | | | | | | | Using dst_reg(this, ir->type) automatically sets the writemask to the proper size for the type; src_reg(dst_reg) preserves that. This should be equivalent, but less code. Note that src_reg(dst_reg) either uses SWIZZLE_XXXX or SWIZZLE_XYZW, so the old code did need the manual writemask adjustment, since it constructed the registers the other way around. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Delete some dead code in visit(ir_expression *).Kenneth Graunke2014-10-211-8/+0
| | | | | | | | | | | Nothing uses the vector_elements temporary variable. Setting this->result.file is dead because we overwrite this->result a few lines later. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Generate better code for ir_triop_csel.Kenneth Graunke2014-10-211-5/+13
Previously, we generated an extra CMP instruction:

    cmp.ge.f0(8)   g4<1>D     g2<0,1,0>F     0F
    cmp.nz.f0(8)   null       g4<8,8,1>D     0D
    (+f0) sel(8)   g120<1>F   g2.4<0,1,0>F   g3<0,1,0>F

The first operand is always a boolean, and we want to predicate the SEL on that. Rather than producing a boolean value and comparing it against zero, we can just produce a condition code in the flag register.

Now we generate:

    cmp.ge.f0(8)   null       g2<0,1,0>F     0F
    (+f0) sel(8)   g124<1>F   g2.4<0,1,0>F   g3<0,1,0>F

total instructions in shared programs: 5473459 -> 5473253 (-0.00%)
instructions in affected programs: 6219 -> 6013 (-3.31%)

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Change the type of booleans to UD and emit correct immediatesJason Ekstrand2014-10-173-16/+16
Before, we used a signed d-word for booleans, and the immediates we emitted varied between signed and unsigned. This commit changes the type to unsigned (I think that makes more sense) and makes immediates more consistent. This allows copy propagation to work better and cleans up some instructions.

total instructions in shared programs: 5473519 -> 5465864 (-0.14%)
instructions in affected programs: 432849 -> 425194 (-1.77%)
GAINED: 27
LOST: 0

Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Don't pass ir_variable * to emit_sampleid_setup().Kenneth Graunke2014-10-173-4/+4
| | | | | | | | | gl_SampleID is a built-in variable that always is of type "int". Suggested by Connor Abbott. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* mesa: Drop the "target" parameter from NewBufferObject().Kenneth Graunke2014-10-164-9/+8
| | | | | | | | | | | NewBufferObject took a "target" parameter, which it blindly passed to _mesa_initialize_buffer_object(), which ignored it. Not much point in passing it around. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Flag BRW_ATOMIC_COUNTER_BUFFER when a possible ABO is respecifiedChris Forbes2014-10-161-0/+2
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/disasm: Add missing message type for Gen7 DP untyped surface readChris Forbes2014-10-161-0/+1
| | | | | | | | This is used to implement GLSL's atomicCounter() intrinsic. Previously it *worked*, but the disassembly was bogus. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: Correctly use ABO count to trigger flagging of new surfaces.Chris Forbes2014-10-161-1/+1
| | | | | | | | This would have *almost never* actually been an issue, since other state tends to get flagged at the same time as new ABOs -- but still bogus. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: No longer reemit textures on BRW_NEW_UNIFORM_BUFFERChris Forbes2014-10-161-2/+1
| | | | | | | | This didn't make any sense, but papered over the missing TexBO flagging we've just fixed, in a bunch of cases. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Dirty state in BO reallocation based on usage historyChris Forbes2014-10-161-1/+4
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Have mesa flag BRW_NEW_TEXTURE_BUFFER when a TexBO binding changesChris Forbes2014-10-161-0/+1
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Add new dirty flag for new TexBOs.Chris Forbes2014-10-163-0/+4
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: don't make a fake ir_texture in the Mesa IR frontendConnor Abbott2014-10-151-14/+5
| | | | | | | | | Now that we've made all the texture emit code mostly independent of GLSL IR, this isn't necessary any more. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Refactor the texture emission logic into a single function.Kenneth Graunke2014-10-153-104/+144
Before, we had 3 different emit functions for the various gens, as well as some ancillary work that was the same across all gens, which was either contained in functions or duplicated across the GLSL IR and Mesa IR backends. Now, we have a single method, emit_texture(), that takes all the information needed to make a texture instruction and handles all the setup. To emit a texture instruction while converting from GLSL IR, Mesa IR, or any new backend, all we have to do is extract the information emit_texture() needs and then call it.

v2: Significant rebasing (by Ken).

Signed-off-by: Connor Abbott <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Make gather_channel() not use ir_texture.Connor Abbott2014-10-152-5/+4
| | | | | | | | Our new IR won't have ir_texture objects. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Make swizzle_result() not use ir_texture.Connor Abbott2014-10-153-8/+9
| | | | | | | | Our new IR won't have ir_texture objects. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: fix integer textures with swizzlesConnor Abbott2014-10-151-0/+1
| | | | | | | | | This happened to work before, but it would convert the output to a float and then back to an integer which seems bad. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: don't pass in ir_texture to emit_texture_*Connor Abbott2014-10-153-24/+23
| | | | | | | | At this point, the only thing it's used for is the opcode. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: don't use ir->type in emit_texture_gen4()Connor Abbott2014-10-151-4/+1
| | | | | | | | We already have the type from the original destination. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Don't use ir->lod_info.grad.dPd<x,y> in emit_texture_*.Connor Abbott2014-10-153-18/+31
| | | | | | | | | | | | | This drops a dependency on ir_texture objects. v2 (Ken): Rename lod_components to grad_components, as it only has a meaningful value for ir_txd. We could set it to 1 for TXL, but there's no real need. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Don't use ir->coordinate in emit_texture_*.Connor Abbott2014-10-153-31/+39
| | | | | | | | This drops a dependency on ir_texture objects. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: make rescale_texcoord() not use ir_texture.Connor Abbott2014-10-153-8/+8
| | | | | | | | | | Our new IR won't have ir_texture objects, but using glsl_type is fine. v2 (Ken): Drop redundant ir->coordinate NULL check; rebase. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Make emit_mcs_fetch() not use ir_texture.Connor Abbott2014-10-152-4/+4
| | | | | | | Our new IR won't have ir_texture objects. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Rename "length" to "components" in emit_mcs_fetch().Kenneth Graunke2014-10-151-6/+6
| | | | | | | This is slightly clearer. Based on a patch by Connor Abbott. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Make brw_texture_offset() not use ir_texture.Connor Abbott2014-10-154-12/+15
| | | | | | | | Our new IR won't have ir_texture objects. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: don't use ir->offset in emit_texture_gen5.Connor Abbott2014-10-153-5/+8
| | | | | | | | v2 (Ken): Refactor the Gen7 code separately; rebase. Signed-off-by: Connor Abbott <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Move texel offset handling to visit(ir_texture *).Kenneth Graunke2014-10-153-11/+29
| | | | | | | | | | | | This moves the handling of non-constant texel offset subexpression trees to the place where we visit other such subtrees. It also removes some uses of ir->offset in emit_texture_gen7, which will be useful when we write the backend for our new upcoming IR. Based on a patch by Connor Abbott. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Drop ir->op != ir_txf condition in offset checking.Kenneth Graunke2014-10-152-4/+3
| | | | | | | | | brw_lower_unnormalized_offset sets ir->offset to NULL if it applies the texelFetchOffset workarounds, so there's no need to special case it here---there won't be an offset for ir_txf. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Restore a lost comment about TXF offset bugs.Kenneth Graunke2014-10-151-0/+5
| | | | | | | | | | | Eric's original code to work around TXF offset bugs contained a comment explaining the problem, which was lost when Chris generalized it to an IR transformation (in commit 598ca510b8a118c3c7e18b5d031a2b116120e0a6). This commit adds the original comment to the newer code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Allow CSE on Gen4-5 unary math.Kenneth Graunke2014-10-151-1/+1
Due to the implicit move-from-GRF, unary math looks a lot like the Gen6+ math instruction: it's a single instruction (SEND) with a GRF source. The difference is that it also implicitly clobbers a message register.

The only visible effect is that CSE will remove the MRF-clobbering from later math operations. This should be fine; compute_to_mrf and remove_redundant_mrf_writes don't look at the values populated by implied writes, so they can't rely on those values being present. Less interference may actually help those passes make more progress.

Binary math is still problematic, since it involves a separate MOV instruction to load the second operand. We continue disallowing CSE for binary math operations.

total instructions in shared programs: 3340303 -> 3340100 (-0.01%)
instructions in affected programs: 26927 -> 26724 (-0.75%)

Nothing hurt, gained, or lost. ~6% reduction on a few shaders.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
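The rule in this message can be sketched as a small predicate. This is only an illustration under stated assumptions, not the code from the patch: `struct math_inst`, its fields, and `math_allows_cse()` are hypothetical names introduced here, and the sketch assumes that Gen6+ math instructions were already eligible for CSE.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of a math instruction.
 * The real pass inspects the driver's own instruction type. */
struct math_inst {
    int gen;          /* hardware generation, e.g. 4, 5, 6, 7 */
    int num_sources;  /* 1 for unary math, 2 for binary math */
};

/* Returns true if the instruction may take part in CSE under the
 * rule described above. */
bool math_allows_cse(const struct math_inst *inst)
{
    /* Assumption: Gen6+ math is an ordinary instruction with GRF
     * sources and is always a candidate. */
    if (inst->gen >= 6)
        return true;

    /* Gen4-5: unary math is a single SEND with a GRF source, so it is
     * allowed. Binary math needs a separate MOV to load the second
     * operand, so it stays excluded. */
    return inst->num_sources == 1;
}
```

With this predicate, a Gen5 unary operation is accepted while a Gen5 binary operation is still rejected, which matches the split the message describes.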
* i965/fs: Use the correct regs_written on unspill instructionsJason Ekstrand2014-10-141-0/+1
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nouveau: 3d textures are unsupported, limit 3d levels to 1Ilia Mirkin2014-10-141-0/+3
| | | | | | | | | | Ideally there would be a swrast fallback, but the driver isn't ready for that. This should avoid crashes if someone tries to use 3d textures though. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Cc: [email protected]
* i965: Use unsynchronized maps for the program cache on LLC platforms.Kenneth Graunke2014-10-131-7/+28
| | | | | | | | | | | | | | | | | | | | There's no reason to stall on pwrite - the CPU always appends to the buffer and never modifies existing contents, and the GPU never writes it. Further, the CPU always appends new data before submitting a batch that requires it. This code predates the unsynchronized mapping feature, so we simply didn't have the option when it was written. Ideally, we would do this for non-LLC platforms too, but unsynchronized mapping support only exists for LLC systems. Saves a bunch of stall avoidance copies when uploading shaders. v2: Rebase on changes to previous patch. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> [v1]
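The argument above rests on the cache being append-only: bytes that were already written are never modified, so a write can never race with a reader of older data. A minimal sketch of that invariant follows. `struct program_cache`, `CACHE_SIZE`, and `cache_append()` are hypothetical names for illustration; the real driver works on a buffer object, not a plain array.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 4096  /* arbitrary size chosen for this sketch */

/* Hypothetical stand-in for the program cache buffer. */
struct program_cache {
    unsigned char data[CACHE_SIZE];
    size_t next_offset;  /* first unused byte; everything below it is immutable */
};

/* Appends a blob and returns its offset, or (size_t)-1 if it does not fit.
 * Only bytes at or above next_offset are written, so contents that a
 * previously submitted batch may still be reading are never touched. */
size_t cache_append(struct program_cache *cache, const void *blob, size_t size)
{
    if (size > CACHE_SIZE - cache->next_offset)
        return (size_t)-1;

    size_t offset = cache->next_offset;
    memcpy(cache->data + offset, blob, size);
    cache->next_offset = offset + size;
    return offset;
}
```

Because each append only touches the unused tail, the writer has no need to wait for earlier readers, which is the property that makes an unsynchronized mapping acceptable here.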
* i965: Issue performance warnings when copying the program cache BO.Kenneth Graunke2014-10-131-0/+3
| | | | | | | | | | | | | We don't really want unnecessary buffer copying, so it'd be nice to know when it's happening. v2: Drop stall warnings when doing a read-only CPU mapping of the cache BO. The GPU also uses it in a read-only fashion, so there won't be any stalls, even though the buffer is busy. (Thanks to Chris Wilson for catching this mistake.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> [v1]
* i965: Issue performance warnings on MapBufferRange stalls.Kenneth Graunke2014-10-131-3/+4
| | | | | | | | This is easy: we just need to use brw_map_bo instead of mapping it directly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Fix register write checks.Kenneth Graunke2014-10-101-0/+2
| | | | | | | | | | | | | When mapping the buffer a second time, we need to use the new pointer, not the one from the previous mapping. Otherwise, we will most likely crash. Apparently, we've just been getting lucky and getting the same bo->virtual pointer in both cases. libdrm probably has a hand in that. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Cc: [email protected]
* i965: Skip uploading border color when unnecessary.Kenneth Graunke2014-10-091-2/+20
| | | | | | | | | | | | | | The border color is only needed when using the GL_CLAMP_TO_BORDER or (deprecated) GL_CLAMP wrap modes; all others ignore it, including the common GL_CLAMP_TO_EDGE and GL_REPEAT wrap modes. In those cases, we can skip uploading it entirely, saving a bit of space in the batchbuffer. Instead, we just point it at the start of the batch (offset 0); we have to program something, and that address is safe to read. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
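The check described in this message could look roughly like the following. It is a sketch under assumptions: `enum wrap_mode` and both function names are hypothetical stand-ins (the driver works with GL enums and its own sampler state), and the sketch checks all three coordinates regardless of texture target.

```c
#include <stdbool.h>

/* Hypothetical wrap-mode enum for this sketch; the driver uses GL enums
 * such as GL_CLAMP_TO_BORDER and GL_REPEAT. */
enum wrap_mode {
    WRAP_REPEAT,
    WRAP_MIRRORED_REPEAT,
    WRAP_CLAMP_TO_EDGE,
    WRAP_CLAMP,            /* deprecated GL_CLAMP */
    WRAP_CLAMP_TO_BORDER
};

/* Only the two border-sampling modes ever read the border color. */
bool wrap_mode_needs_border_color(enum wrap_mode mode)
{
    return mode == WRAP_CLAMP || mode == WRAP_CLAMP_TO_BORDER;
}

/* A sampler needs the border color if any coordinate uses such a mode. */
bool sampler_needs_border_color(enum wrap_mode s, enum wrap_mode t,
                                enum wrap_mode r)
{
    return wrap_mode_needs_border_color(s) ||
           wrap_mode_needs_border_color(t) ||
           wrap_mode_needs_border_color(r);
}
```

When the predicate is false, the upload can be skipped and the pointer set to offset 0, as the message explains.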
* i965: Use BDW_MOCS_PTE for renderbuffers.Kenneth Graunke2014-10-091-1/+1
Write-back caching cannot be used for buffers being scanned out by the display engine; surfaces used for scan-out must be write-through or uncached.

I originally chose WT for render targets because it works in all cases. However, we really want to use write-back caching where possible, as it is more efficient. Most renderbuffers are not used for scanout - off-screen FBOs certainly are fine, and non-pageflipped backbuffers should be fine as well. So in most cases WB will work. However, we don't know what will be used for scan-out, so we instead simply use the PTE value specified by the kernel, as it knows these things.

This matches our MOCS choice on Haswell.

Fixes performance regressions since commit ee4484be3dc827cf15bcf109f5 in a microbenchmark (spotted by Eero Tamminen). Improves performance in GLBenchmark 2.7/EgyptHD by 7.44362% +/- 0.496939% (n=55) on a Broadwell GT2. Improves performance in a bunch of other microbenchmarks by ~15% or so.

Signed-off-by: Kenneth Graunke <[email protected]>
Reported-by: Eero Tamminen <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Cc: [email protected]
* i965: Add a BRW_MOCS_PTE #define.Kenneth Graunke2014-10-091-3/+7
| | | | | | | | | | | | | | Like BDW_MOCS_WB and BDW_MOCS_WT, this specifies that we want to use all three caches (L3, LLC, and eLLC where available), but leaves the LLC caching mode up to the kernel's page table entry. This allows the kernel to pick WB/WT/UC based on whether it's using a buffer for scanout. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Cc: [email protected]
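The difference between a fixed caching mode and deferring to the page table entry can be modelled as below. This is a toy model, not driver or kernel code: every identifier is hypothetical, and the policy in `kernel_pte_mode()` (write-through for scanout, write-back otherwise) is an assumption made for illustration; the message only says the kernel picks WB/WT/UC based on scanout use.

```c
#include <stdbool.h>

/* Hypothetical LLC caching modes. */
enum llc_mode { LLC_UC, LLC_WT, LLC_WB };

/* Hypothetical MOCS choices made by the driver. */
enum mocs_choice { MOCS_WT, MOCS_WB, MOCS_PTE };

/* Assumed kernel policy: a scanout-safe mode for scanout buffers,
 * write-back for everything else. */
enum llc_mode kernel_pte_mode(bool is_scanout)
{
    return is_scanout ? LLC_WT : LLC_WB;
}

/* The mode that actually applies to a buffer. */
enum llc_mode effective_llc_mode(enum mocs_choice mocs, bool is_scanout)
{
    switch (mocs) {
    case MOCS_WT:  return LLC_WT;
    case MOCS_WB:  return LLC_WB;
    case MOCS_PTE: return kernel_pte_mode(is_scanout);
    }
    return LLC_UC;  /* conservative fallback for unknown values */
}

/* Write-back is the one mode that must not be used for scanout. */
bool safe_for_scanout(enum llc_mode mode)
{
    return mode != LLC_WB;
}
```

In this model the PTE choice yields write-back for off-screen buffers and stays scanout-safe for displayed ones, while a fixed WB choice would be unsafe whenever the buffer is scanned out.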
* i965/compaction: Disable compaction on SNB temporarily.Matt Turner2014-10-031-0/+6
| | | | Will investigate after XDC.
* Revert "i965: Emit ELSE/ENDIF JIP with type D on Gen 7."Matt Turner2014-10-031-2/+2
| | | | | | | | This reverts commit 54e30dbf4db437748509d1319c3f6e4185f76c69. Will investigate after XDC. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84557
* i965/fs: Remove dead generate_rep_fb_write prototype.Matt Turner2014-10-031-1/+0
| | | | Added in commit f9dc7aab.
* i965/fs: Use the correct base_mrf for spilling pairs in SIMD8Jason Ekstrand2014-10-021-3/+4
Before, we were hard-coding the base_mrf based on the dispatch width rather than the number of registers spilled at a time. This caused us to emit instructions with a base_mrf of 14 and an mlen of 3, so we used the magical nonexistent m16 register. This fixes the problem.

Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
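The arithmetic behind the bug can be checked with a short sketch. The names are hypothetical, and two things are assumed rather than taken from the patch: that the message registers are m0 through m15 (implied by m16 being nonexistent), and that a spill message is one header register plus the registers being spilled (consistent with an mlen of 3 for a pair).

```c
#include <stdbool.h>

#define NUM_MRFS 16  /* assumption: m0..m15 exist, m16 does not */

/* Assumed message layout: one header register plus the spilled data. */
int spill_mlen(int regs_spilled)
{
    return 1 + regs_spilled;
}

/* A message occupies m[base_mrf] .. m[base_mrf + mlen - 1]. */
bool message_fits(int base_mrf, int mlen)
{
    return base_mrf + mlen <= NUM_MRFS;
}

/* Highest base that keeps the whole message inside the MRF file,
 * derived from the number of registers spilled at a time. */
int spill_base_mrf(int regs_spilled)
{
    return NUM_MRFS - spill_mlen(regs_spilled);
}
```

Under these assumptions, spilling one register fits at base 14, but spilling a pair from base 14 would reach m16; deriving the base from the register count gives 13 for a pair instead.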
* i965/fs: Add a MAX_GRF_SIZE define and use it in various placesJason Ekstrand2014-10-024-6/+9
| | | | | | | | | | | | Previously, we had a MAX_SAMPLER_MESSAGE_SIZE which we used instead. However, some FB write messages can validly be longer than this so we need something different. Since MAX_SAMPLER_MESSAGE_SIZE is validly useful on its own, we leave it alone and add a new MAX_GRF_SIZE that's big enough for FB writes. Signed-off-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84539 Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the actual register width in brw_reg_from_fs_regJason Ekstrand2014-10-021-0/+13
| | | | | | | | This fixes a bug where 1-wide operations don't properly translate down to 1-wide instructions. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>