path: root/src/mesa/drivers/dri/i965/brw_clip.c
Commit message (Author; Age; Files; Lines)
* i965: Move VUE map computation to once at VS compile time. (Eric Anholt; 2012-02-21; 1 file; -1/+1)
| | | | | | | | | | With this and the previous patch, 640x480 nexuiz is running 0.169118% +/- 0.0863696% faster (n=121). On a VS state change microbenchmark, performance is increased 8.28645% +/- 0.460478% (n=52). v2: Fix CACHE_NEW_VS comment. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make the userclip flag for the VUE map come from VS prog data. (Eric Anholt; 2012-02-21; 1 file; -3/+3)
| | | | | | | | This reduces recomputation of state based on non-clipping-related transform changes, and is a step toward removing VUE map recomputation. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move program compile to emit() time. (Eric Anholt; 2011-10-29; 1 file; -2/+3)
| | | | | | | Only 4 other prepare() functions are left, which don't rely on this. Reviewed-by: Kenneth Graunke <[email protected]> Acked-by: Paul Berry <[email protected]>
* i965: Make brw_compute_vue_map's userclip dependency a boolean. (Paul Berry; 2011-10-06; 1 file; -1/+1)
| | | | | | | | | | | | | Previously, brw_compute_vue_map required an argument indicating the number of clip planes in use, but all it did with it was check if it was nonzero. This patch changes brw_compute_vue_map to take a boolean instead. This allows us to avoid some unnecessary recompilation of the Gen4/5 GS and SF threads. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
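The effect of keying on a boolean rather than a count can be sketched as follows. This is a minimal illustration only: `prog_key_example`, `make_key`, and `keys_equal` are hypothetical names, not the driver's actual key structures or functions.

```c
#include <stdbool.h>

/* Hypothetical program key: it records only whether user clipping is
 * active, not how many clip planes are enabled. */
struct prog_key_example {
   bool userclip_active;
};

/* Build a key from the current clip plane count (hypothetical helper). */
static struct prog_key_example
make_key(unsigned nr_clip_planes)
{
   struct prog_key_example key = { .userclip_active = nr_clip_planes != 0 };
   return key;
}

/* Two equal keys can share one compiled program, so no recompile. */
static bool
keys_equal(struct prog_key_example a, struct prog_key_example b)
{
   return a.userclip_active == b.userclip_active;
}
```

Under this sketch, going from one enabled clip plane to six leaves the key unchanged, while going from zero to one does not.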
* mesa: Create _mesa_bitcount_64() to replace i965's brw_count_bits() (Paul Berry; 2011-10-06; 1 file; -1/+1)
| | | | | | | | | | | | | | | | The i965 driver already had a function to count bits in a 64-bit uint (brw_count_bits()), but it was buggy (it only counted the bottom 32 bits) and it was clumsy (it had a strange and broken fallback for non-GCC-like compilers, which fortunately was never used). Since Mesa already has a _mesa_bitcount() function, it seems better to just create a _mesa_bitcount_64() function rather than special-case this in the i965 driver. This patch creates the new _mesa_bitcount_64() function and rewrites all of the old brw_count_bits() calls to refer to it. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
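The class of bug described in this message (counting only the bottom 32 bits of a 64-bit value) can be reproduced with a short sketch. The functions below are hypothetical stand-ins written for illustration; they are not the actual bodies of `_mesa_bitcount_64()` or `brw_count_bits()`.

```c
#include <stdint.h>

/* Hypothetical portable 64-bit population count. */
static unsigned
bitcount_64_example(uint64_t n)
{
   unsigned count = 0;
   while (n) {
      n &= n - 1;   /* clear the lowest set bit */
      count++;
   }
   return count;
}

/* Illustrates the failure mode: truncating to 32 bits before counting
 * silently drops every bit at position 32 and above. */
static unsigned
bitcount_low32_only_example(uint64_t n)
{
   return bitcount_64_example((uint32_t) n);
}
```

A value with only bit 40 set is counted as one bit by the first function and as zero bits by the truncating variant.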
* i965: Remove two_side_color from brw_compute_vue_map(). (Paul Berry; 2011-09-06; 1 file; -3/+1)
| | | | | | | | | | | Since we now lay out the VUE the same way regardless of whether two-sided color is enabled, brw_compute_vue_map() no longer needs to know whether two-sided color is enabled. This allows the two-sided color flag to be removed from the clip, GS, and VS keys, so that fewer GPU programs need to be recompiled when turning two-sided color on and off. Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Remove no-longer-needed variables. (Paul Berry; 2011-09-06; 1 file; -23/+0)
| | | | | | | | The variables offset[], idx_to_attr[], nr_bytes, nr_attrs, and header_regs were all serving purposes which are now served by the VUE map. Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Change computation of nr_regs to use VUE map. (Paul Berry; 2011-09-06; 1 file; -5/+5)
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Move header_regs into brw_clip_compile. (Paul Berry; 2011-09-06; 1 file; -5/+4)
| | | | | | This makes header_regs available for computing VUE offsets within clip code. Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Move hpos_offset and ndc_offset into local functions. (Paul Berry; 2011-09-06; 1 file; -2/+0)
| | | | | | | | | The offsets within the VUE of HPOS and NDC are needed only in a few auxiliary clipping functions. This patch moves computation of those offsets into the functions that need them, and does the computation using the VUE map. Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: rename header_position_offset to the more correct ndc_offset. (Paul Berry; 2011-09-06; 1 file; -1/+1)
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Add VUE map computation to clip stage for Gen4-5. (Paul Berry; 2011-09-06; 1 file; -0/+3)
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: Fix Android build by removing relative includes (Chad Versace; 2011-08-30; 1 file; -1/+1)
| | | | | | | | Replace each occurrence of #include "../glsl/*.h" with #include "glsl/*.h" Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Use state streaming on programs, and state base address on gen5+. (Eric Anholt; 2011-06-18; 1 file; -14/+10)
| | | | | | | | | | There will be a little bit of thrashing of the program cache BO as the cache warms up, but once the application is in steady state, this reduces relocations on gen5 and later. On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6% +/- 1.3% (n=6). No statistically significant performance difference on nexuiz (n=5).
* i965: Get a ralloc context into brw_compile. (Kenneth Graunke; 2011-05-17; 1 file; -1/+7)
| | | | | | | | | | | | This would be so much easier if we were using C++; we could simply use constructors and destructors. Instead, we have to update all the callers. While we're at it, ralloc various brw_wm_compile fields rather than explicitly calloc/free'ing them. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove dead entrypoints to state cache, rename the one that's left. (Eric Anholt; 2011-04-29; 1 file; -9/+6)
| | | | | | | | | | As we expanded the usage of the state cache, it grew extra functionality. However, with the recent state streaming rework, we're back to the state cache being used only for shader kernels, which is the piece of GPU state that's actually expensive to compute again from scratch, since it involves compiling. Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Annotate debug printout checks with unlikely(). (Eric Anholt; 2010-11-03; 1 file; -2/+2)
| | | | | | | This provides the optimizer with hints about code hotness, which we're quite certain about for debug printouts (or, rather, while we developers often hit the checks for debug printouts, we don't care about performance while doing so).
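A branch hint of this kind is commonly built on GCC's `__builtin_expect`. The sketch below assumes a GCC-compatible compiler and uses hypothetical names (`unlikely_example`, `debug_flags_example`, `DEBUG_CLIP_EXAMPLE`, `maybe_dump_example`); it is not the driver's actual macro or flag set.

```c
#include <stdio.h>

/* Assumption: __builtin_expect is available on GCC-compatible compilers.
 * Elsewhere the macro degrades to a plain condition. */
#if defined(__GNUC__)
#define unlikely_example(x) __builtin_expect(!!(x), 0)
#else
#define unlikely_example(x) (x)
#endif

#define DEBUG_CLIP_EXAMPLE 0x1u             /* hypothetical debug flag bit */

static unsigned debug_flags_example = 0;    /* hypothetical global flags */

/* Returns 1 if the debug dump ran, 0 on the common (hinted) fast path. */
static int
maybe_dump_example(const char *what)
{
   if (unlikely_example(debug_flags_example & DEBUG_CLIP_EXAMPLE)) {
      printf("clip: %s\n", what);
      return 1;
   }
   return 0;
}
```

The hint changes only how the compiler lays out the code; the result of the condition is the same either way.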
* Drop GLcontext typedef and use struct gl_context instead (Kristian Høgsberg; 2010-10-13; 1 file; -1/+1)
* i965: Reduce repeated calculation of the attribute-offset-in-VUE. (Eric Anholt; 2010-07-19; 1 file; -9/+11)
| | | | | | This cleans up some chipset dependency sprinkled around, and fixes a potential overflow of the attribute offset array for many vertex results.
* i965: Clarify the nr_regs calculation in brw_clip.c (Eric Anholt; 2010-07-19; 1 file; -3/+8)
* intel: Change dri_bo_* to drm_intel_bo* to consistently use new API. (Eric Anholt; 2010-06-08; 1 file; -2/+2)
| | | | | The slightly less mechanical change of converting the emit_reloc calls will follow.
* i965: Dump out the correct shared function for SEND on Ironlake. (Eric Anholt; 2010-05-14; 1 file; -1/+2)
* i965: Support INTEL_DEBUG=clip to dump the clip program. (Eric Anholt; 2010-05-14; 1 file; -1/+7)
* intel: Clean up chipset name and gen num for Ironlake (Zhenyu Wang; 2010-04-21; 1 file; -3/+3)
| | | | | | | | | Rename old IGDNG to Ironlake, and set 'gen' number for Ironlake as 5, so tracking the features with generation num instead of special is_ironlake flag. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Zhenyu Wang <[email protected]>
* i965: Allow for variable-sized auxdata in the state cache. (Eric Anholt; 2010-01-19; 1 file; -7/+8)
| | | | | | Everything has been constant-sized until now, but constant buffer handling changes will make us want some additional variable sized array.
* intel: Replace IS_IGDNG checks with intel->is_ironlake or needs_ff_sync. (Eric Anholt; 2009-12-22; 1 file; -5/+6)
| | | | Saves ~480 bytes of code.
* Merge branch 'outputswritten64' (Ian Romanick; 2009-11-17; 1 file; -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | | Add a GLbitfield64 type and several macros to operate on 64-bit fields. The OutputsWritten field of gl_program is changed to use that type. This results in a fair amount of fallout in drivers that use programs. No changes are strictly necessary at this point as all bits used are below the 32-bit boundary. Fairly soon several bits will be added for clip distances written by a vertex shader. This will cause several bits used for varyings to be pushed above the 32-bit boundary. This will affect any drivers that support GLSL. At this point, only the i965 driver has been modified to support this eventuality. I did this as a "squash" merge. There were several places through the outputswritten64 branch where things were broken. I foresee this causing difficulties later for bisecting. The history is still available in the branch. Conflicts: src/mesa/drivers/dri/i965/brw_wm.h
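Why bits above the 32-bit boundary need a 64-bit type and matching macros can be shown with a small sketch. The type and macro names below (`bitfield64_example`, `BIT64_EXAMPLE`) are hypothetical stand-ins for illustration, not Mesa's actual definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit bitfield type. */
typedef uint64_t bitfield64_example;

/* Widen the constant before shifting so bits 32..63 are reachable. */
#define BIT64_EXAMPLE(b) ((bitfield64_example)1 << (b))

/* Correct test: works for any slot in 0..63. */
static int
output_written_example(bitfield64_example outputs, unsigned slot)
{
   return (outputs & BIT64_EXAMPLE(slot)) != 0;
}

/* What code still using a 32-bit field would see: every output in a
 * slot at or above 32 looks as if it were never written. */
static int
output_written_32_example(bitfield64_example outputs, unsigned slot)
{
   uint32_t low = (uint32_t) outputs;
   return slot < 32 && (low & ((uint32_t)1 << slot)) != 0;
}
```

With outputs in slots 3 and 35 both written, the 64-bit test reports both, while the 32-bit variant reports only slot 3.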
* i965: fix EXT_provoking_vertex support (Roland Scheidegger; 2009-11-11; 1 file; -0/+1)
| | | | | | | | This didn't work for quad/quadstrips at all, and for all other primitive types it only worked when they were unclipped. Fix up the former in gs stage (could probably do without these changes and instead set QuadsFollowProvokingVertexConvention to false), and the rest in clip stage.
* i965: Don't clip everything if FRONT_AND_BACK culling while culling disabled. (Eric Anholt; 2009-07-20; 1 file; -1/+2)
| | | | | | Fixes everything-black with meta_clear_tris on quake4-mpdemo and doom3-demo. Bug #18844, 22077.
* i965: add support for new chipsets (Xiang, Haihao; 2009-07-13; 1 file; -4/+18)
| | | | | | | | | | 1. new PCI ids 2. fix some 3D commands on new chipset 3. fix send instruction on new chipset 4. new VUE vertex header 5. ff_sync message (added by Zou Nan Hai <[email protected]>) 6. the offset in JMPI is in unit of 64bits on new chipset 7. new cube map layout
* i965: Remove brw->attribs now that we can just always look in the GLcontext. (Eric Anholt; 2009-02-02; 1 file; -20/+20)
* mesa: added "main/" prefix to includes, remove some -I paths from Makefile.template (Brian Paul; 2008-09-18; 1 file; -3/+3)
* Revert "Revert "Merge branch 'drm-gem'"" (Dave Airlie; 2008-08-24; 1 file; -3/+1)
| | | | This reverts commit 7c81124d7c4a4d1da9f48cbf7e82ab1a3a970a7a.
* Revert "Merge branch 'drm-gem'" (Dave Airlie; 2008-08-24; 1 file; -1/+3)
| | | | | | | | This reverts commit 53675e5c05c0598b7ea206d5c27dbcae786a2c03. Conflicts: src/mesa/drivers/dri/i965/brw_wm_surface_state.c
* intel-gem: Update to new check_aperture API for classic mode. (Eric Anholt; 2008-08-08; 1 file; -3/+1)
| | | | | | To do this, I had to clean up some of 965 state upload stuff. We may end up over-emitting state in the aperture overflow case, but that should be rare, and I'd rather have the simplification of state management.
* i965: initial attempt at fixing the aperture overflow (Dave Airlie; 2008-04-18; 1 file; -2/+4)
| | | | | | | | | Makes state emission into a 2 phase, prepare sets things up and accounts the size of all referenced buffer objects. The emit stage then actually does the batchbuffer touching for emitting the objects. There is an assert in dri_emit_reloc if a reloc occurs for a buffer that hasn't been accounted yet.
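The two-phase scheme described in this message can be sketched roughly as follows. Everything here is a simplified, hypothetical model: `batch_example`, `prepare_buffer_example`, and `emit_reloc_example` are illustrative names, the aperture limit is an arbitrary number, and the unaccounted-buffer case returns 0 where the message says the real code asserts.

```c
#include <stddef.h>

#define MAX_BUFS_EXAMPLE 8

/* Hypothetical batch state: which buffers were accounted, and how big. */
struct batch_example {
   size_t aperture_limit;
   size_t accounted_bytes;
   const void *accounted[MAX_BUFS_EXAMPLE];
   int nr_accounted;
   int nr_relocs;
};

/* Phase 1 (prepare): account a referenced buffer against the aperture.
 * Returns 0 if it would not fit, so the caller can flush and retry. */
static int
prepare_buffer_example(struct batch_example *b, const void *bo, size_t size)
{
   if (b->nr_accounted == MAX_BUFS_EXAMPLE ||
       b->accounted_bytes + size > b->aperture_limit)
      return 0;
   b->accounted[b->nr_accounted++] = bo;
   b->accounted_bytes += size;
   return 1;
}

/* Phase 2 (emit): touch the batch only for buffers accounted in phase 1.
 * Returns 0 for an unaccounted buffer. */
static int
emit_reloc_example(struct batch_example *b, const void *bo)
{
   for (int i = 0; i < b->nr_accounted; i++) {
      if (b->accounted[i] == bo) {
         b->nr_relocs++;
         return 1;
      }
   }
   return 0;
}
```

Splitting the work this way means an overflow is detected before anything has been written to the batch, rather than partway through emission.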
* [965] Clean up whitespace and dead code from do_unfilled change. (Eric Anholt; 2008-03-26; 1 file; -11/+6)
* i965: new integrated graphics chipset support (Xiang, Haihao; 2008-01-29; 1 file; -1/+1)
* [965] Replace the state cache suballocator with direct dri_bufmgr use. (Eric Anholt; 2007-12-14; 1 file; -21/+14)
| | | | | | | | | | | | | | | | | | | | | | | The user-space suballocator that was used avoided relocation computations by using the general and surface state base registers and allocating those types of buffers out of pools built on top of single buffer objects. It also avoided calls into the buffer manager for these small state allocations, since only one buffer object was being used. However, the buffer allocation cost appears to be low, and with relocation caching, computing relocations for buffers is essentially free. Additionally, implementing the suballocator required a don't-fence-subdata flag to disable waiting on buffer maps so that writing new data didn't block on rendering using old data, and careful handling when mapping to update old data (which we need to do for unavoidable relocations with FBOs). More importantly, when the suballocator filled, it had no replacement algorithm and just threw out all of the contents and forced them to be recomputed, which is a significant cost. This is the first step, which just changes the buffer type, but doesn't yet improve the hash table to not result in full recompute on overflow. Because the buffers are all allocated out of the general buffer allocator, we can no longer use the general/surface state bases to avoid relocations, and they are set to 0 instead.
* Revert "[965] Add missing flagging of new stage programs for updating stage state." (Eric Anholt; 2007-12-05; 1 file; -19/+15)
| | | | | | | I had forgotten part of brw_state_cache.c that made this fix not relevant for master (last_addr comparison and flagging based on cache id). This reverts commit a4642f3d18bdaebaba31e5dee72fe5de9d890ffb.
* [965] Add missing flagging of new stage programs for updating stage state. (Eric Anholt; 2007-12-05; 1 file; -15/+19)
| | | | | | Otherwise, choosing a new program wouldn't necessarily update the state, and an old program could be executed, leading to various sorts of pretty pictures or hangs.
* i965: handle all unfilled mode in clip stage. fix bug #12453 (Xiang, Haihao; 2007-09-27; 1 file; -0/+4)
* i965: Avoid branch instructions while in single program flow mode. (Eric Anholt; 2007-01-06; 1 file; -0/+2)
| | | | | | | | | | | | There is an errata for Broadwater that threads don't have the instruction/loop mask stacks initialized on thread spawn. In single program flow mode, those stacks are not writable, so we can't initialize them. However, they do get read during ELSE and ENDIF instructions. So, instead, replace branch instructions in single program flow mode with predicated jumps (ADD to the ip register), avoiding use of the more complicated branch instructions that may fail. This is also a minor optimization as no ENDIF equivalent is necessary. Signed-off-by: Keith Packard <[email protected]>
* Add Intel i965G/Q DRI driver. (Eric Anholt; 2006-08-09; 1 file; -0/+264)
This driver comes from Tungsten Graphics, with a few further modifications by Intel.