aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_urb.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Make all atoms to track BRW_NEW_BLORP by defaultKenneth Graunke2016-04-231-1/+2
| | | | Reviewed-by: Topi Pohjolainen <[email protected]
* i965: remove trailing spaces in various filesIago Toral Quiroga2015-11-251-10/+10
| | | | Acked-by: Kenneth Graunke <[email protected]>
* i965: Optimize batchbuffer macros.Matt Turner2015-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously OUT_BATCH was just a macro around an inline function which does brw->batch.map[brw->batch.used++] = dword; When making consecutive calls to intel_batchbuffer_emit_dword() the compiler isn't able to recognize that we're writing consecutive memory locations or that it doesn't need to write batch.used back to memory each time. We can avoid both of these problems by making a local pointer to the next location in the batch in BEGIN_BATCH(). Cuts 18k from the .text size. text data bss dec hex filename 4946956 195152 26192 5168300 4edcac i965_dri.so before 4928956 195152 26192 5150300 4e965c i965_dri.so after This series (including commit c0433948) improves performance of Synmark OglBatch7 by 8.01389% +/- 0.63922% (n=83) on Ivybridge. Reviewed-by: Chris Wilson <[email protected]>
* i965: Add and use USED_BATCH macro.Matt Turner2015-07-151-2/+2
| | | | | | | The next patch will replace the .used field with an on-demand calculation of batchbuffer usage. Reviewed-by: Chris Wilson <[email protected]>
* i965/state: Don't use brw->state.dirty.brwJordan Justen2015-03-311-1/+1
| | | | | | | | | | | | | | | Now, we only use ctx->NewDriverState. I used this bash & sed command in the i965 directory: for file in *.[ch] *.[ch]pp; do sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file done Followed by manual changes to brw_state_upload.c. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move BRW_NEW_*_PROG_DATA flags to .brw (not .cache).Kenneth Graunke2014-12-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | I put the BRW_NEW_*_PROG_DATA flags at the beginning so that brw_state_cache.c can still continue using 1 << brw_cache_id. I also added a comment explaining the difference between BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time to remember it. Non-mechanical changes: - brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache. - brw_state_upload.c - INTEL_DEBUG=state changes. - brw_context.h - bit definition merging. v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention state-based recompiles, and nix the "proper subset" claim, as it's false. (Caught by Kristian Høgsberg). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Rename CACHE_NEW_*_PROG to BRW_NEW_*_PROG_DATA.Kenneth Graunke2014-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only ones that are left are legitimately related to the program cache. Yet, it seems a bit wasteful to have an entire bitfield for only 7 bits. State upload is one of the hottest paths in the driver. For each atom in the list, we call check_state() to see if it needs to be emitted. Currently, this involves comparing three separate bitfields (mesa, brw, and cache). Consolidating the brw and cache bitfields would save a small amount of CPU overhead per atom. Broadwell, for example, has 57 state atoms, so this small savings can add up. CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the offset into the program cache BO (prog_offset). Since most uses refer to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name. Removing "cache" completely is a bit painful, so I decided to do it in several patches for easier review, and to separate mechanical changes from manual ones. This one simply renames things, and was made via: $ for file in *.[ch]; do sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \ -e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file done Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw! The next patch will remedy this flaw. It will also fix the alphabetization issues. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Acked-by: Matt Turner <[email protected]>
* i965: Alphabetize brw_tracked_state flags and use a consistent style.Kenneth Graunke2014-11-291-2/+2
| | | | | | | | | | | | | | | | Most of the dirty flags were listed in some arbitrary order. Some used bonus parenthesis. Some put multiple flags on one line, others put one per line. Some used tabs instead of spaces...but only on some lines. This patch settles on one flag per line, in alphabetical order, using spaces instead of tabs, and sheds the unnecessary parentheses. Sorting was mostly done with vim's visual block feature and !sort, although I alphabetized short lists by hand; it was pretty manual. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* Revert 5 i965 patches: 8e27a4d2, 373143ed, c5bdf9be, 6f56e142, 88e3d404Jordan Justen2014-09-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | Reverts * "i965: Modify state upload to allow 2 different sets of state atoms." 8e27a4d2b3e4e74e9a77446bce49607433d86be3 * "i965: Modify dirty bit handling to support 2 pipelines." 373143ed9187c4d4ce1e3c486b5dd0880d18ec8b * "i965: Create a macro for checking a dirty bit." c5bdf9be1eca190417998d548fd140c1eca37a54 Conflicts: src/mesa/drivers/dri/i965/brw_context.h * "i965: Create a macro for setting all dirty bits." 6f56e1424d923fd80c84090fbf4506c9eaaffea1 Conflicts: src/mesa/drivers/dri/i965/brw_blorp.cpp src/mesa/drivers/dri/i965/brw_state_cache.c src/mesa/drivers/dri/i965/brw_state_upload.c * "i965: Create a macro for setting a dirty bit." 88e3d404dad009d8cff5124cf8acee7daeaceb64 Signed-off-by: Jordan Justen <[email protected]>
* i965: Create a macro for setting a dirty bit.Paul Berry2014-09-011-1/+1
| | | | | | | This will make it easier to extend dirty bit handling to support compute shaders. Reviewed-by: Jordan Justen <[email protected]>
* i965: Move the remaining driver debug over to stderr.Eric Anholt2014-02-221-9/+10
| | | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* s/Tungsten Graphics/VMware/José Fonseca2014-01-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change. This was the sed script I used: $ cat tg2vmw.sed # Run as: # # git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed # # Rename copyrights s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./ s/TUNGSTEN GRAPHICS/VMWARE/g # Rename emails s/[email protected]/[email protected]/ s/[email protected]/[email protected]/g s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/ s/jrfonseca\[email protected]/[email protected]/g s/keithw\[email protected]/[email protected]/g s/[email protected]/[email protected]/g s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/ s/[email protected]/[email protected]/ # Remove dead links s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g # C string src/gallium/state_trackers/vega/api_misc.c s/"Tungsten Graphics, Inc"/"VMware, Inc"/ Reviewed-by: Brian Paul <[email protected]>
* i965: Drop trailing whitespace from the rest of the driver.Kenneth Graunke2013-12-051-13/+13
| | | | | | | Performed via: $ for file in *; do sed -i 's/ *//g'; done Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Move intel_context::gen and gt fields to brw_context.Kenneth Graunke2013-07-091-2/+1
| | | | | | | | | | Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::is_<platform> flags to brw_context.Kenneth Graunke2013-07-091-1/+1
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::batch to brw_context.Kenneth Graunke2013-07-091-3/+3
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965/vs: split brw_vs_prog_data into generic and VS-specific parts.Paul Berry2013-04-111-1/+1
| | | | | | | | | | | This will allow the generic parts to be re-used for geometry shaders. Reviewed-by: Jordan Justen <[email protected]> v2: Put urb_read_length and urb_entry_size in the generic struct. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Rename INTEL_DEBUG=fall to INTEL_DEBUG=perf.Eric Anholt2012-08-121-1/+1
| | | | | | | | | I want to introduce some more debug output for performance surprises that includes fallbacks, but aren't necessarily software rasterization. Leave INTEL_DEBUG=fall in place for those that have used that flag before. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen4: Move URB fence recalculate to emit() time.Eric Anholt2011-10-291-1/+1
| | | | | | | This is used by the unit state, which is at emit() time. Reviewed-by: Kenneth Graunke <[email protected]> Acked-by: Paul Berry <[email protected]>
* intel: Convert from GLboolean to 'bool' from stdbool.h.Kenneth Graunke2011-10-181-1/+1
| | | | | | | | | | | | | | | | | I initially produced the patch using this bash command: for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i 's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i 's/GL_FALSE/false/g' $file; done Then I manually added #include <stdbool.h> to fix compilation errors, and converted a few functions back to GLboolean that were used in core Mesa's function pointer table to avoid "incompatible pointer" warnings. Finally, I cleaned up some whitespace issues introduced by the change. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chad Versace <[email protected]> Acked-by: Paul Berry <[email protected]>
* i965: Handle URB_FENCE erratum for BroadwaterChris Wilson2011-03-041-0/+8
| | | | | | | | | | There is a silicon bug which causes unpredictable behaviour if the URB_FENCE command should cross a cache-line boundary. Pad before the command to avoid such occurrences. As this command only applies to gen4/5, do the fixup unconditionally as the specs do not actually state for which chip it was fixed (and the cost is negligible)... Signed-off-by: Chris Wilson <[email protected]>
* intel: Annotate debug printout checks with unlikely().Eric Anholt2010-11-031-2/+2
| | | | | | | This provides the optimizer with hints about code hotness, which we're quite certain about for debug printouts (or, rather, while we developers often hit the checks for debug printouts, we don't care about performance while doing so).
* intel: Clean up chipset name and gen num for IronlakeZhenyu Wang2010-04-211-1/+1
| | | | | | | | | Rename old IGDNG to Ironlake, and set 'gen' number for Ironlake as 5, so tracking the features with generation num instead of special is_ironlake flag. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Zhenyu Wang <[email protected]>
* Replace the _mesa_*printf() wrappers with the plain libc versionsKristian Høgsberg2010-02-191-3/+3
|
* intel: Replace IS_G4X() across the driver with context structure usage.Eric Anholt2009-12-221-4/+5
| | | | Saves ~2KB of code.
* intel: Replace IS_IGDNG checks with intel->is_ironlake or needs_ff_sync.Eric Anholt2009-12-221-1/+2
| | | | Saves ~480 bytes of code.
* i965: add support for new chipsetsXiang, Haihao2009-07-131-1/+11
| | | | | | | | | | 1. new PCI ids 2. fix some 3D commands on new chipset 3. fix send instruction on new chipset 4. new VUE vertex header 5. ff_sync message (added by Zou Nan Hai <[email protected]>) 6. the offset in JMPI is in unit of 64bits on new chipset 7. new cube map layout
* i965: Increase G4X default VS URB allocation to actually allow 32 threads.Eric Anholt2009-06-301-3/+14
| | | | | This improves the performance of my GLSL demo by 30%. It also fixes the VS deadlock that ut2004 had, for reasons I can't explain. Bug #21330.
* i965: Fix up clip min_nr_entries, preferred_nr_entries, and max_threads.Eric Anholt2008-11-121-1/+1
| | | | | | | | | The clip thread could potentially deadlock when processing tristrips since being moved back to dual-thread mode, as the two threads could each have 4 VUEs referenced and not be able to allocate another one since SF processing wasn't able to continue (needing 5 entries before it freed 2). In constrained URB mode, similar deadlock could even have occurred with polygons (so we cut back max_threads if we can't handle it any primitive type).
* i965: Add a big comment explaining my understanding of URB management.Eric Anholt2008-11-121-1/+38
| | | | | It shouldn't offer anything new over what's in the docs (except for G4X notes), but here it's all in one place.
* i965: Fix copy'n'paste issue that made brw->urb.constrained useless.Eric Anholt2008-11-021-3/+7
| | | | Also, add a comment explaining what brw->urb.constrained tries to do.
* Revert "Revert "Merge branch 'drm-gem'""Dave Airlie2008-08-241-14/+1
| | | | This reverts commit 7c81124d7c4a4d1da9f48cbf7e82ab1a3a970a7a.
* Revert "Merge branch 'drm-gem'"Dave Airlie2008-08-241-1/+14
| | | | | | | | This reverts commit 53675e5c05c0598b7ea206d5c27dbcae786a2c03. Conflicts: src/mesa/drivers/dri/i965/brw_wm_surface_state.c
* intel-gem: Update to new check_aperture API for classic mode.Eric Anholt2008-08-081-2/+1
| | | | | | To do this, I had to clean up some of 965 state upload stuff. We may end up over-emitting state in the aperture overflow case, but that should be rare, and I'd rather have the simplification of state management.
* 965: cleanups to state emission from aperture checking and state ordering.Eric Anholt2008-08-081-12/+0
|
* [i965] This is to fix random crash in some maps of Ut2004 demo.Zou Nan hai2008-04-221-1/+1
| | | | | | | e.g. bridge of fate. If vs output is big, driver may fall back to use 8 urb entries for vs, unfortunally, for some unknown reason, if vs is working at 4x2 mode, 8 entries is not enough, may lead to gpu hang.
* i965: initial attempt at fixing the aperture overflowDave Airlie2008-04-181-2/+3
| | | | | | | | | Makes state emission into a 2 phase, prepare sets things up and accounts the size of all referenced buffer objects. The emit stage then actually does the batchbuffer touching for emitting the objects. There is an assert in dri_emit_reloc if a reloc occurs for a buffer that hasn't been accounted yet.
* i965: remove unused hal hooksDave Airlie2008-02-141-15/+0
| | | | These don't appear to have ever been used.
* i965: new integrated graphics chipset supportXiang, Haihao2008-01-291-4/+4
|
* i965: The given urb layout(maximal size of urb entries and theXiang, Haihao2007-03-251-1/+1
| | | | values for nr of entries) should meet the requirement.
* VS nr of urb entries is constrained to be one of a fixed set of values,Keith Whitwell2006-09-141-2/+2
| | | | specifically {8,16,32}.
* Potential fix for doom3 lockups. Seems that there is a conflictKeith Whitwell2006-09-121-1/+1
| | | | | | | | | | between the vertex cache, the vertex shader and the clipping stages, all of which are competitors for URB entries assigned to the VS unit. This change reduces the maximum number of clip and VS threads by enough to ensure that they cannot consume all the available URB entries, and then reduces the number somewhat more up to an arbitary amount I discovered by trial and error. Unfortunately trial and error solutions don't inspire total confidence...
* Add Intel i965G/Q DRI driver.Eric Anholt2006-08-091-0/+215
This driver comes from Tungsten Graphics, with a few further modifications by Intel.