path: root/src/mesa
Commit message | Author | Age | Files | Lines
* mesa: Avoid flagging _NEW_VIEWPORT on redundant viewport updates. | Kenneth Graunke | 2014-10-01 | 1 | -0/+6
  Cuts the number of i965 color calculator viewport uploads by 100x (11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Brian Paul <brianp@vmware.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
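The idea behind the commit above, skipping the dirty flag when the new viewport equals the current one, can be sketched as follows. This is a minimal illustration under assumed names: `struct context`, `struct viewport`, `NEW_VIEWPORT_FLAG`, and `set_viewport()` are hypothetical stand-ins, not Mesa's actual structures or entry points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dirty bit; stands in for a flag such as _NEW_VIEWPORT. */
#define NEW_VIEWPORT_FLAG (1u << 0)

/* Hypothetical, simplified viewport state. */
struct viewport {
   float x, y, width, height;
};

/* Hypothetical context: current viewport plus accumulated dirty bits. */
struct context {
   struct viewport vp;
   uint32_t new_state;
};

/* Updates the viewport and returns true only if the state changed.
 * A redundant update leaves the dirty bits untouched, so the driver
 * has no reason to re-upload anything. */
static bool
set_viewport(struct context *ctx, float x, float y, float w, float h)
{
   assert(ctx != NULL);

   if (ctx->vp.x == x && ctx->vp.y == y &&
       ctx->vp.width == w && ctx->vp.height == h)
      return false;

   ctx->vp = (struct viewport){ x, y, w, h };
   ctx->new_state |= NEW_VIEWPORT_FLAG;
   return true;
}
```

In a workload that sets the same viewport before every draw, only the first call in this sketch would mark the state dirty.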
* i965: Drop CACHE_NEW_VS_PROG from the gen7_sf_state atom. | Kenneth Graunke | 2014-10-01 | 1 | -1/+1
  I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG, which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT. It's not needed here anyway - only SBE needs it. Just a copy and paste mistake.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Drop brwBindProgram driver hook. | Kenneth Graunke | 2014-10-01 | 1 | -20/+0
  This function flagged BRW_NEW_*_PROGRAM.
  When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa calls the BindProgram driver hook, which flagged BRW_NEW_*_PROGRAM. However, brw_upload_state also checks for that changing, sets the same flags, and also updates brw->fragment_program and so on. So, this looks to be entirely redundant.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Add missing /* BRW_NEW_FRAGMENT_PROGRAM */ comments. | Kenneth Graunke | 2014-10-01 | 3 | -6/+7
  I had to dig a bit to figure out why this was necessary.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Use "1ull" instead of "1" in BRW_NEW_* defines. | Kenneth Graunke | 2014-10-01 | 1 | -32/+32
  Now that the bitfield is a uint64_t, we should use 1ull. Currently, we only have 32 entries, so 1 works fine, but it's not future-proof.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Use ~0ull when flagging all BRW_NEW_* dirty flags. | Kenneth Graunke | 2014-10-01 | 3 | -4/+4
  ~0 is 0xFFFFFFFF, which only covers the first 32 bits. We need all 64.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
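The two 64-bit dirty-flag commits above come down to the width of the integer constant. The sketch below is illustrative only: `EXAMPLE_NEW_LOW`, `EXAMPLE_NEW_HIGH`, and both helper functions are hypothetical names, and it assumes a platform where `unsigned int` is 32 bits wide. It uses `~0u` for the narrow case because an unsigned 32-bit value is zero-extended when widened, which makes the missing upper bits easy to see.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags. Bit 40 is chosen only because it lies beyond bit 31;
 * writing (1 << 40) with a plain int constant would be undefined where
 * int is 32 bits wide, so the ull suffix is required. */
#define EXAMPLE_NEW_LOW   (1ull << 3)
#define EXAMPLE_NEW_HIGH  (1ull << 40)

/* Marks every one of the 64 possible flags as dirty. */
static uint64_t
flag_all_dirty(void)
{
   return ~0ull;
}

/* Narrow variant: ~0u is a 32-bit unsigned value (assumption: 32-bit
 * unsigned int), so only the low 32 bits are set after widening. */
static uint64_t
flag_all_dirty_narrow(void)
{
   return ~0u;
}
```

With the narrow variant, a flag such as `EXAMPLE_NEW_HIGH` would never be marked dirty by a "flag everything" operation.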
* i965: Fix INTEL_DEBUG=state to work with 64-bit dirty bits. | Kenneth Graunke | 2014-10-01 | 1 | -16/+7
  This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits beyond 1 << 31. We missed doing this when widening the driver flags from uint32_t to uint64_t.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Delete CACHE_NEW_BLORP_CONST_COLOR_PROG. | Kenneth Graunke | 2014-10-01 | 2 | -3/+0
  Unused since krh rewrote fast clears to use meta.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Fix typo in comment | Chris Forbes | 2014-10-01 | 1 | -1/+1
  Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Fix spelling of GEN7_SAMPLER_EWA_ANISOTROPIC_ALGORITHM | Chris Forbes | 2014-10-01 | 2 | -2/+2
  Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
* i965/fs: Fix the build | Jason Ekstrand | 2014-09-30 | 1 | -1/+1
* i965/fs: Fix an uninitialized value warnings | Jason Ekstrand | 2014-09-30 | 1 | -3/+4
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Emit compressed BFI2 instructions on Gen > 7. | Matt Turner | 2014-09-30 | 1 | -1/+1
  IVB had a restriction that prevented us from emitting compressed three-source instructions, and although that was lifted on Haswell, Haswell had a new restriction that said BFI instructions specifically couldn't be compressed.
* i965/fs: Allow SIMD16 borrow/carry/64-bit multiply on Gen > 7. | Matt Turner | 2014-09-30 | 1 | -3/+3
  These checks were intended for Gen 7 only. None of these restrictions apply to Gen 8.
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Set MUL source type to W/UW in 64-bit mul macro on Gen8. | Matt Turner | 2014-09-30 | 1 | -1/+22
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Optimize sqrt+inv into rsq. | Matt Turner | 2014-09-30 | 1 | -0/+11
  Transform

     sqrt a, b
     rcp  c, a

  into

     sqrt a, b
     rsq  c, b

  The improvement here is that we've broken a dependency between these instructions.
  Leads to 330 fewer INV instructions and 330 more RSQ.
  Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/vec4: Optimize sqrt+inv into rsq. | Matt Turner | 2014-09-30 | 1 | -0/+11
  Transform

     sqrt a, b
     rcp  c, a

  into

     sqrt a, b
     rsq  c, b

  In most cases the sqrt's result is still used, so the improvement here is that we've broken a dependency between these instructions.
  Leads to 80 fewer INV instructions and 80 more RSQ.
  Occasionally the sqrt's result is no longer used, leading to:
  instructions in affected programs: 5005 -> 4949 (-1.12%)
  Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
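The sqrt+rcp to rsq rewrite described in the two commits above can be sketched on a toy instruction list. Everything here is an assumption made for illustration: `enum opcode`, `struct inst`, and `opt_sqrt_rcp_to_rsq()` are hypothetical, the sketch only looks at the immediately preceding instruction, and it is not the i965 backend's actual pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; not the i965 backend's opcode enum. */
enum opcode { OP_SQRT, OP_RCP, OP_RSQ, OP_OTHER };

/* Hypothetical single-source instruction on virtual registers. */
struct inst {
   enum opcode op;
   int dst;   /* register written */
   int src;   /* register read */
};

/* Rewrites "rcp c, a" into "rsq c, b" when the previous instruction is
 * "sqrt a, b". The rcp then no longer depends on the sqrt result.
 * Returns the number of instructions rewritten. */
static int
opt_sqrt_rcp_to_rsq(struct inst *insts, size_t count)
{
   int progress = 0;

   assert(insts != NULL || count == 0);

   for (size_t i = 1; i < count; i++) {
      struct inst *prev = &insts[i - 1];
      struct inst *cur = &insts[i];

      /* Skip "sqrt a, a": the original value of b is gone by then. */
      if (cur->op == OP_RCP && prev->op == OP_SQRT &&
          cur->src == prev->dst && prev->dst != prev->src) {
         cur->op = OP_RSQ;
         cur->src = prev->src;
         progress++;
      }
   }
   return progress;
}
```

The sqrt itself is left in place; a separate dead-code pass would be the one to remove it when its result is no longer read.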
* i965/vec4: Call opt_algebraic after opt_cse. | Matt Turner | 2014-09-30 | 1 | -1/+1
  The next patch adds an algebraic optimization for the pattern

     sqrt a, b
     rcp  c, a

  and turns it into

     sqrt a, b
     rsq  c, b

  but many vertex shaders do

     a = sqrt(b);
     var1 /= a;
     var2 /= a;

  which generates

     sqrt a, b
     rcp  c, a
     rcp  d, a

  If we apply the algebraic optimization before CSE, we'll end up with

     sqrt a, b
     rsq  c, b
     rcp  d, a

  Applying CSE combines the RCP instructions, preventing this from happening.
  No shader-db changes.
  Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Extend predicated break pass to predicate WHILE. | Matt Turner | 2014-09-30 | 1 | -0/+36
  Helps a handful of programs in Serious Sam 3 that use do-while loops.
  instructions in affected programs: 16114 -> 16075 (-0.24%)
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Don't make a name for a vector splitting temporary | Ian Romanick | 2014-09-30 | 1 | -3/+8
  If the name is just going to get dropped, don't bother making it. If the name is made, release it sooner (rather than later).
  No change in Valgrind massif results for a trimmed apitrace of dota2.
  Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* glsl: Add context-level controls for whether temporaries have real names | Ian Romanick | 2014-09-30 | 3 | -0/+22
  No change in Valgrind massif results for a trimmed apitrace of dota2.
  v2: Minor rebase on _mesa_init_constants changes.
  Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* glsl: Make ir_variable::num_state_slots and ir_variable::state_slots private | Ian Romanick | 2014-09-30 | 5 | -23/+23
  Also move num_state_slots inside ir_variable_data for better packing.
  The payoff for this will come in a few more patches.
  No change in Valgrind massif results for a trimmed apitrace of dota2.
  Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
  Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
* i965/brw_reg: Make the accumulator register take an explicit width. | Jason Ekstrand | 2014-09-30 | 3 | -10/+15
  The big pile of patches I just pushed regresses about 25 piglit tests on SNB. This fixes the regressions.
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
* st/mesa: remove unneded PIPE_TEXTURE_CUBE check in st_texture_create() | Brian Paul | 2014-09-30 | 1 | -1/+1
  Earlier in the function we assert layers==6 for PIPE_TEXTURE_CUBE so there's no reason to special-case the pt.array_size = layers assignment.
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* mesa: Drop the always-software-primitive-restart paths. | Eric Anholt | 2014-09-30 | 5 | -58/+8
  The core sw primitive restart code is still around, because i965 uses it in some cases, but there are no drivers that want it on all the time.
  Reviewed-by: Rob Clark <robdclark@gmail.com>
* gallium: Drop software-only primitive restart support. | Eric Anholt | 2014-09-30 | 1 | -3/+2
  The drivers not flagging primitive restart support are r300 swtcl, svga, nv30, and vc4.
  The point of primitive restart is to slightly reduce draw call overhead for apps by batching multiple draws. If we do an extra pass to read the index buffer and split back into multiple draws, we've entirely missed the point. This is particularly bad for drivers that otherwise have hardware IB reads, where the readback is probably uncached.
  Reviewed-by: Rob Clark <robdclark@gmail.com>
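To make the cost described above concrete, here is a rough sketch of the extra pass a software fallback has to perform: reading every index on the CPU and cutting the draw at each restart index. The names `struct draw_range` and `split_on_restart()` are hypothetical and do not correspond to Mesa's or Gallium's actual helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one sub-draw: a run of indices. */
struct draw_range {
   size_t start;
   size_t count;
};

/* Scans the whole index buffer and records one range per run of indices
 * between restart markers. Empty runs are skipped. Returns the number of
 * ranges written, never more than max_ranges. */
static size_t
split_on_restart(const uint32_t *indices, size_t n, uint32_t restart_index,
                 struct draw_range *out, size_t max_ranges)
{
   size_t num = 0;
   size_t start = 0;

   assert(indices != NULL || n == 0);

   for (size_t i = 0; i <= n; i++) {
      /* i == n closes the final run; the index is only read when i < n. */
      if (i == n || indices[i] == restart_index) {
         if (i > start && num < max_ranges) {
            out[num].start = start;
            out[num].count = i - start;
            num++;
         }
         start = i + 1;
      }
   }
   return num;
}
```

Each returned range would become its own draw call, which is exactly the overhead primitive restart was meant to avoid.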
* i965/fs: Properly calculate the number of instructions in calculate_register_pressure | Jason Ekstrand | 2014-09-30 | 1 | -1/+3
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Use the GRF for FB writes on gen >= 7 | Jason Ekstrand | 2014-09-30 | 6 | -71/+142
  On gen 7, the MRF was removed and we gained the ability to do send instructions directly from the GRF. This commit enables that functionality for FB writes.
  v2: Make handling of components more sane.
  i965/fs: Force a high register for the final FB write
  v2: Renamed the array for the range mappings and added a comment
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Handle COMPR4 in LOAD_PAYLOAD | Jason Ekstrand | 2014-09-30 | 2 | -1/+36
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Constant propagate into LOAD_PAYLOAD | Jason Ekstrand | 2014-09-30 | 1 | -0/+1
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Add split_virtual_grfs and compute_to_mrf after lower_load_payload | Jason Ekstrand | 2014-09-30 | 1 | -0/+2
  If we are going to use LOAD_PAYLOAD operations to fill MRF registers, then we will need this.
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Add a an optional source to the FS_OPCODE_FB_WRITE instruction | Jason Ekstrand | 2014-09-30 | 4 | -29/+28
  Previously, we were using the base_mrf parameter of fs_inst to store the MRF location. In preparation for doing FB writes from the GRF, we now also allow you to set inst->base_mrf to -1 and provide a source register.
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructions | Jason Ekstrand | 2014-09-30 | 4 | -16/+24
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Use the GRF for UNTYPED_ATOMIC instructions | Jason Ekstrand | 2014-09-30 | 6 | -25/+36
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Add a function for getting a component of a 8 or 16-wide register | Jason Ekstrand | 2014-09-30 | 1 | -0/+10
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Use the instruction execution size directly for texture generation | Jason Ekstrand | 2014-09-30 | 1 | -3/+10
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Use exec_size instead of force_uncompressed in dump_instruction | Jason Ekstrand | 2014-09-30 | 1 | -6/+7
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Use instruction execution sizes instead of heuristics | Jason Ekstrand | 2014-09-30 | 3 | -23/+10
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Use instruction execution sizes to set compression state | Jason Ekstrand | 2014-09-30 | 1 | -6/+19
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Remove unneeded uses of force_uncompressed | Jason Ekstrand | 2014-09-30 | 3 | -25/+9
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Derive force_uncompressed from instruction exec_size | Jason Ekstrand | 2014-09-30 | 1 | -0/+3
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor* | Jason Ekstrand | 2014-09-30 | 3 | -37/+43
  Now that we have execution sizes, we can use that instead of the dispatch width. This way it also works for 8-wide instructions in SIMD16.
  i965/fs: Make effective_width a variable instead of a function
  i965/fs: Preserve effective width in constant propagation
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Better guess the width of LOAD_PAYLOAD | Jason Ekstrand | 2014-09-30 | 1 | -2/+9
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Add an exec_size field to fs_inst | Jason Ekstrand | 2014-09-30 | 5 | -32/+126
  This will, eventually, allow us to manage execution sizes of instructions in a much more natural way from the fs_visitor level.
  i965/fs: Explicitly set instruction execute size a couple of places
  i965/blorp: Explicitly set instruction execute sizes
  Since blorp is all 16-wide and isn't, in general, very careful about register width, we'll just set it all explicitly.
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Determine partial writes based on the destination width | Jason Ekstrand | 2014-09-30 | 2 | -5/+3
  Now that we track both halves of a 16-wide vgrf, we no longer need to worry about force_sechalf or force_uncompressed. The only real issue is if the destination is too small.
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Fix a bug in register coalesce | Jason Ekstrand | 2014-09-30 | 1 | -0/+17
  This commit fixes a bug in register coalesce that happens when one register is moved to another the proper number of times but the channels are re-arranged. When this happens, the previous code would happily coalesce the registers regardless of the fact that the channel mappings were wrong.
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Rework GEN5 texturing code to use fs_reg and offset() | Jason Ekstrand | 2014-09-30 | 1 | -39/+38
  Now that offset() can properly handle MRF registers, we can use an MRF fs_reg and let offset() handle incrementing it correctly for different dispatch widths. While this doesn't have any noticeable effect currently, it does ensure that the destination register is 16-wide which will be necessary later when we start detecting execution sizes based on source and destination registers.
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode | Jason Ekstrand | 2014-09-30 | 9 | -157/+371
  This is actually the squash of a bunch of different changes. Individual commit titles follow:
  i965/fs: Always 2-align registers SIMD16 for gen <= 5
  i965/fs: Use the register width when applying offsets
  This reworks both byte_offset() and offset() to be more intelligent. The byte_offset() function now supports offsets bigger than 32. The offset() function uses the byte_offset() function together with the register width and the type size to offset the register by the correct amount.
  i965/fs: Change regs_read to be in hardware registers
  i965/fs: Change regs_written to be actual hardware registers
  i965/fs: Properly handle register widths in LOAD_PAYLOAD
  The LOAD_PAYLOAD instruction is a bit special because it collects a bunch of registers (with possibly different widths) into a single payload block. Once the payload is constructed, it's treated as a single block of data and most of the information such as register widths doesn't matter anymore. In particular, the offset of any particular source register is the accumulation of the sizes of the previous source registers.
  i965/fs: Properly set writemasks in LOAD_PAYLOAD
  i965/fs: Handle register widths in demote_pull_constants
  i965/fs: Get rid of implicit register doubling in the allocator
  i965/fs: Reserve enough registers for PLN instructions
  i965/fs: Make sources and destinations interfere in 16-wide
  i965/fs: Properly handle register widths in CSE
  i965/fs: Properly handle register widths in register_coalesce
  i965/fs: Properly handle widths in copy propagation
  i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD
  i965/fs: Properly handle register widths and odd register sizes in spilling
  i965/fs: Don't waste a register on texture lookups for gen >= 7
  Previously, we were wasting a register in SIMD16 mode because we could only allocate registers in pairs. Now that we can allocate and address odd-sized registers, let's get rid of this special-case.
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
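The LOAD_PAYLOAD rule stated in the commit above, that a source's offset within the payload is the accumulated size of the sources before it, can be written as a short prefix sum. This is a sketch under assumptions: `payload_source_offset()` is a hypothetical helper, and sizes are expressed as a count of hardware registers per source (for example 1 for an 8-wide source and 2 for a 16-wide one).

```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset, in hardware registers, of source `idx` within a
 * payload whose sources occupy src_regs[0..num_srcs-1] registers each.
 * The offset is the sum of the sizes of all earlier sources. */
static unsigned
payload_source_offset(const unsigned *src_regs, size_t num_srcs, size_t idx)
{
   unsigned offset = 0;

   assert(src_regs != NULL);
   assert(idx < num_srcs);

   for (size_t i = 0; i < idx; i++)
      offset += src_regs[i];

   return offset;
}
```

For a payload made of sources sized 1, 2, 2 and 1 registers, the sources start at offsets 0, 1, 3 and 5 respectively, so mixed widths pack without gaps.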
* i965/fs: Handle printing of registers better. | Jason Ekstrand | 2014-09-30 | 1 | -2/+6
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Explicitly set widths on gen5 math instruction destinations. | Jason Ekstrand | 2014-09-30 | 1 | -1/+1
  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>