summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: emit DRAW_PREAMBLE only if it changesMarek Olšák2014-12-103-8/+17
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove setting of VGT_DISPATCH_DRAW_INDEXMarek Olšák2014-12-101-3/+0
| | | | | | | It's used only if VGT_SHADER_STAGES_EN.DISPATCH_DRAW_EN is 1, which we don't set. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: emit GS_OUT_PRIM_TYPE only if it changesMarek Olšák2014-12-103-1/+6
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: emit primitive restart only if it changesMarek Olšák2014-12-103-5/+22
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: emit base vertex and start instance only if they changeMarek Olšák2014-12-103-3/+38
| | | | | | v2: added a helper function for invalidation of the sh constants Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: emit clip registers only if VS, GS, or rasterizer is changedMarek Olšák2014-12-105-32/+39
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: get info about VS outputs from tgsi_shader_infoMarek Olšák2014-12-103-35/+34
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: move all shader-related functions to a new file si_state_shaders.cMarek Olšák2014-12-106-785/+810
| | | | | | This huge amount of code deserves its own file. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: generate derived and draw-related registers directly in the CSMarek Olšák2014-12-103-75/+76
| | | | | | The big function is split into 3 smaller functions. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: si_conv_pipe_prim shouldn't failMarek Olšák2014-12-101-11/+3
| | | | | | An assertion should suffice. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove useless variable si_context::pm4_dirty_cdwordsMarek Olšák2014-12-103-11/+1
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove unused draw packet functionsMarek Olšák2014-12-102-87/+0
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: emit draw packets directly into the CSMarek Olšák2014-12-103-68/+95
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: add emit util functions for SH registersMarek Olšák2014-12-102-1/+18
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* tgsi: add tgsi_shader_info::writes_clipvertexMarek Olšák2014-12-102-0/+4
| | | | Reviewed-by: Brian Paul <[email protected]>
* tgsi: add clip and cull distance writemasks into tgsi_shader_infoMarek Olšák2014-12-102-0/+6
| | | | Reviewed-by: Brian Paul <[email protected]>
* tgsi: add tgsi_shader_info::writes_psizeMarek Olšák2014-12-102-0/+4
| | | | Reviewed-by: Brian Paul <[email protected]>
* cso: put cso_release_all into cso_destroy_contextMarek Olšák2014-12-109-38/+9
| | | | Reviewed-by: Brian Paul <[email protected]>
* i965: Generate vs code using scalar backend for BDW+Kristian Høgsberg2014-12-106-15/+77
| | | | | | | | | | | | | | | | With everything in place, we can now use the scalar backend compiler for vertex shaders on BDW+. We make scalar vertex shaders the default on BDW+ but add a new vec4vs debug option to force the vec4 backend. No piglit regressions. Performance impact is minimal, I see a ~1.5 improvement on the T-Rex GLBenchmark case, but in general it's in the noise. Some of our internal synthetic, vs bounded benchmarks show great improvement, 20%-40% in some cases, but real-world cases are mostly unaffected. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Clean up fs_visitor::run and rename to run_fsKristian Høgsberg2014-12-102-19/+15
| | | | | | | | | Now that fs_visitor::run is back to being only fragment shader compilation, we can clean up a few stage == MESA_SHADER_FRAGMENT conditions and rename it to run_fs. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add fs_visitor::run_vs() to generate scalar vertex shader codeKristian Høgsberg2014-12-103-13/+436
| | | | | | | | | This patch uses the previous refactoring to add a new run_vs() method that generates vertex shader code using the scalar visitor and optimizer. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename brw_vec4_prog_data/key to brw_bue_prog_data/keyKristian Høgsberg2014-12-1013-42/+42
| | | | | | | | | | | These structs aren't vec4 specific, they are shared by shader stages operating on Vertex URB Entries (VUEs). VUEs are the data structures in the URB that hold vertex data between the pipeline geometry stages. Using vue in the name instead of vec4 makes a lot more sense, especially when we add scalar vertex shader support. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Prepare for using the ATTR register file in the fs backendKristian Høgsberg2014-12-104-6/+23
| | | | | | | | The scalar vertex shader will use the ATTR register file for vertex attributes. This patch adds support for the ATTR file to fs_visitor. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Consolidate code to get struct brw_sampler_prog_key_dataKristian Høgsberg2014-12-101-21/+16
| | | | | | | | This chunk of code is repeated in a few places, and we're going to add a MESA_SHADER_VERTEX case to it soon. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add new SIMD8 VS prog data flagKristian Høgsberg2014-12-106-7/+21
| | | | | | | | | | This flag signals that we have a SIMD8 VS shader so we can set up the corresponding state accordingly. This boils down to setting the BDW+ SIMD8 enable bit in 3DSTATE_VS and making UBO and pull constant buffers use dword pitch. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add SIMD8 URB write low-level IR instructionKristian Høgsberg2014-12-106-1/+51
| | | | | | | | | This is all we need from the generator for SIMD8 vertex shaders. This opcode is just the send instruction, all the hard work will happen in the visitor using LOAD_PAYLOAD. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove shader program argument and member from fs_generatorKristian Høgsberg2014-12-104-6/+3
| | | | | | | | Now that the caller passes in the shader debug name, we don't need this anymore. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Set shader name for generator from call siteKristian Høgsberg2014-12-104-24/+35
| | | | | | | | fs_generator no longer knows what stage it's generating code for, so we have to set the debug name of the shader from the call site. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Generalize fs_generator furtherKristian Høgsberg2014-12-104-16/+12
| | | | | | | | This removes all stage specific data from the generator, and lets us create a generator for any stage. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Don't copy propagate constants from sources with saturateKristian Høgsberg2014-12-101-0/+2
| | | | | | | | We don't propagate the saturate bit and some instructions can't saturate at all. If the source has saturate set, just skip propagation. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Replace 'noann' debug flag with 'ann'.Matt Turner2014-12-103-3/+3
| | | | | | Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Disable unlit-centroid workaround on Gen < 6.Matt Turner2014-12-101-3/+0
| | | | | | | | | | | | | | | | | | | | | Back to the original commit (8313f444) adding the workaround, we were enabling it on gens <= 7, even though gens <= 5 can't do multisampling. I cannot find documentation that says that Sandybridge needs this workaround but in practice disabling it causes these piglit tests to fail: EXT_framebuffer_multisample/interpolation {2,4} centroid-deriv{,-disabled} On Ironlake: total instructions in shared programs: 4358478 -> 4349671 (-0.20%) instructions in affected programs: 117680 -> 108873 (-7.48%) A bunch of shaders in TF2, Portal 2, and L4D2 are cut by 25~30%. Cc: "10.4" <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* hgl: traverse add-on entriesAdrien Destugues2014-12-101-2/+2
| | | | * Allow using symlinks to add-ons when developing.
* gallium/target: Haiku softpipeAlexander von Gluck IV2014-12-101-1/+1
| | | | * Use print macro to fix warning on 64-bit systems
* gallium/aux: Avoid redefining MAXAlexander von Gluck IV2014-12-101-0/+2
| | | | * Can be redefined on some platforms through u_debug.h
* clover: Use switch when creating kernel arguments.Jan Vesely2014-12-101-25/+19
| | | | | | | | | This way we get a warning if an enum value is not handled. v2: codestyle Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* r600g: only init GS_VERT_ITEMSIZE on r600Dave Airlie2014-12-101-5/+2
| | | | | | | | | | | | On evergreen there are 4 regs, on r600/700 there is only one. Don't initialise regs and trash someone elses state. Not sure this fixes anything, but hey one less stupid. Reviewed-By: Glenn Kennard <[email protected]> Cc: "10.3 10.4" [email protected] Signed-off-by: Dave Airlie <[email protected]>
* vc4: Do QPU scheduling across uniform loads.Eric Anholt2014-12-091-28/+60
| | | | | | | | This means another pass of reordering the uniform data store, but it lets us pair up a lot more instructions. total instructions in shared programs: 44639 -> 43176 (-3.28%) instructions in affected programs: 36938 -> 35475 (-3.96%)
* vc4: Populate the delay field better, and schedule high delay first.Eric Anholt2014-12-091-1/+49
| | | | | | | This is a standard scheduling heuristic, and clearly helps. total instructions in shared programs: 46418 -> 44467 (-4.20%) instructions in affected programs: 42531 -> 40580 (-4.59%)
* vc4: Skip raddr dependencies for 32-bit immediate loads.Eric Anholt2014-12-091-2/+5
| | | | These don't have raddr fields.
* vc4: Mark VPM read setup as impacting VPM reads, not writes.Eric Anholt2014-12-091-1/+7
| | | | | Fixes assertion failures if we adjust scheduling priorities to emphasize VPM reads more.
* vc4: Refuse to merge instructions involving 32-bit immediate loads.Eric Anholt2014-12-091-0/+5
| | | | | An immediate load overwrites the mul and add operations, so you can't merge with them.
* clover: Fix build after llvm r223802Aaron Watry2014-12-091-0/+4
| | | | | Signed-off-by: Aaron Watry <awatry at gmail.com> Reviewed-by: Tom Stellard <[email protected]>
* freedreno/a4xx: frag-coord / face fixesRob Clark2014-12-091-6/+19
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: fix rendering to layer != 0Rob Clark2014-12-091-1/+4
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: temp hack for FLAT varyingsRob Clark2014-12-091-0/+19
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: lower TXP as neededRob Clark2014-12-093-3/+19
| | | | | | On a3xx, lower TXP for 3D textures, on a4xx lower all TXP. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: XA gpu hang at startupRob Clark2014-12-092-1/+9
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: texture fixesRob Clark2014-12-096-7/+54
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: cleanup slice alignment/setupRob Clark2014-12-091-36/+14
| | | | | | | | Collapse things back into a setup_slices() which takes the desired alignment as a param. This gets things ready for a4xx which has some slightly different requirements. Signed-off-by: Rob Clark <[email protected]>