aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/intel_batchbuffer.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Implement a CS stall workaround on Broadwell.Kenneth Graunke2014-02-201-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the latest documentation, any PIPE_CONTROL with the "Command Streamer Stall" bit set must also have another bit set, with five different options: - Render Target Cache Flush - Depth Cache Flush - Stall at Pixel Scoreboard - Post-Sync Operation - Depth Stall I chose "Stall at Pixel Scoreboard" since we've used it effectively in the past, but the choice is fairly arbitrary. Implementing this in the PIPE_CONTROL emit helpers ensures that the workaround will always take effect when it ought to. Apparently, this workaround may be necessary on older hardware as well; for now I've only added it to Broadwell as it's absolutely necessary there. Subsequent patches could add it to older platforms, provided someone tests it there. v2: Only flag "Stall at Pixel Scoreboard" when none of the other bits are set (suggested by Ian Romanick). v3: Prefix the function with "gen8" (requested by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (v2) Reviewed-by: Eric Anholt <[email protected]>
* i965: Implement a brw_load_register_mem helper function.Kenneth Graunke2014-02-071-0/+25
| | | | | | | | | | | | This saves some boilerplate and hides the OUT_RELOC/OUT_RELOC64 distinction. Placing the function in intel_batchbuffer.c is rather arbitrary; there wasn't really an obvious place for it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Bump generation assertions on workaround flushes.Kenneth Graunke2014-01-311-2/+2
| | | | | | | | I haven't investigated whether these are necessary on Broadwell or not, but for paranoia's sake, we may as well continue doing them for now. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Update PIPE_CONTROL packet lengths for Broadwell.Kenneth Graunke2014-01-201-2/+20
| | | | | | | | | On Broadwell, PIPE_CONTROL needs an extra DWord to accomodate the 48-bit addressing. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Create a helper function for emitting PIPE_CONTROL writes.Kenneth Graunke2014-01-201-38/+50
| | | | | | | | | | | | | | | | | | | There are a lot of places that use PIPE_CONTROL to write a value to a buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT). Creating a single function to do this seems convenient. As part of this refactor, we now set the PPGTT/GTT selection bit correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms. This is correct for Sandybridge, but actually part of the address on Ivybridge and later! Broadwell will also increase the length of these packets by 1; with the refactoring, we should have to adjust that in substantially fewer places, giving us confidence that we've hit them all. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Use full-length PIPE_CONTROL packets for workaround writes.Kenneth Graunke2014-01-201-6/+9
| | | | | | | | | | | | | | | I believe that PIPE_CONTROL uses the length field to decide whether to do 32-bit or 64-bit writes. A length of 4 would do a 32-bit write, while a length of 5 would do a 64-bit write. (I haven't verified this, though.) For workaround writes, we don't care what value gets written, or how much data. We're only writing something because hardware bugs mandate that do so. So using a 64-bit write should be fine. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Emit full-length PIPE_CONTROLs for (non-write) flushes.Kenneth Graunke2014-01-201-2/+3
| | | | | | | | | | | | | | | | The PIPE_CONTROL packet actually has 5 DWords on Gen6+: 1. Header 2. Flags 3. Address 4. Immediate Data: Lower DWord 5. Immediate Data: Upper DWord We just never emitted the last one. While it appears to work, it's probably safer to emit the entire thing. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Create a helper function for emitting PIPE_CONTROL flushes.Kenneth Graunke2014-01-201-67/+58
| | | | | | | | | | | | | | | These days, we need to emit PIPE_CONTROL flushes all over the place. Being able to do that via a single function call seems convenient. Broadwell will also increase the length of these packets by 1; with the refactoring, we should have to do this in substantially fewer places. v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by Eric Anholt). Drop unlikely() from BLT_RING check. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Introduce an OUT_RELOC64 macro.Kenneth Graunke2014-01-201-0/+24
| | | | | | | | | Broadwell uses 48-bit addresses. The first DWord is the low 32 bits, and the second DWord is the high 16 bits. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Use the new drm_intel_bo offset64 field.Kenneth Graunke2014-01-201-3/+3
| | | | | | | | | | | | | | | | | | | libdrm 2.4.52 introduces a new 'uint64_t offset64' field, intended to replace the old 'unsigned long offset' field. To preserve ABI, libdrm continues to store the presumed offset in both locations. On Broadwell, a 64-bit kernel may place BOs at "high" (> 4G) addresses. However, with a 32-bit userspace, the 'unsigned long offset' field will only be 32-bit, which is not large enough to hold this value. We need to use a proper uint64_t (like the kernel does). Technically, a lot of this code doesn't affect Broadwell, so we could leave it using the old field. But it makes sense to just switch to the new, properly typed field. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Delete intel_batchbuffer_emit_reloc_fenced.Kenneth Graunke2014-01-201-25/+0
| | | | | | | | Nothing in i965 uses it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Remove CACHED_BATCH support altogether.Kenneth Graunke2014-01-171-42/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | Using an unoptimized variant of glamor spending 50% of its CPU time in brw_draw_prims() (and hitting the cache *very* frequently): N Min Max Median Avg Stddev x 200 29200 40500 34900 34750 958.43256 + 200 31000 40300 34700 34622 916.35941 No difference proven at 95.0% confidence Similarly, no difference on GLB2.7: N Min Max Median Avg Stddev x 63 64.1 71.36 70.69 70.113175 1.6782026 + 63 63.6 71.18 70.75 70.223651 1.6044186 No difference proven at 95.0% confidence v2: Rebase on master (by anholt) v3: Add a missing BEGIN_BATCH(3) to aa_line_parameters -- CACHED_BATCH didn't have the asserts about batchbuffer usage that ADVANCE_BATCH does, so we started assertion failing. Signed-off-by: Kenneth Graunke <[email protected]> Signed-off-by: Eric Anholt <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* s/Tungsten Graphics/VMware/José Fonseca2014-01-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change. This was the sed script I used: $ cat tg2vmw.sed # Run as: # # git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed # # Rename copyrights s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./ s/TUNGSTEN GRAPHICS/VMWARE/g # Rename emails s/[email protected]/[email protected]/ s/[email protected]/[email protected]/g s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/ s/jrfonseca\[email protected]/[email protected]/g s/keithw\[email protected]/[email protected]/g s/[email protected]/[email protected]/g s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/ s/[email protected]/[email protected]/ # Remove dead links s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g # C string src/gallium/state_trackers/vega/api_misc.c s/"Tungsten Graphics, Inc"/"VMware, Inc"/ Reviewed-by: Brian Paul <[email protected]>
* i965: Drop trailing whitespace from the rest of the driver.Kenneth Graunke2013-12-051-5/+5
| | | | | | | Performed via: $ for file in *; do sed -i 's/ *//g'; done Signed-off-by: Kenneth Graunke <[email protected]>
* Revert "i965: Move brw_emit_query_begin() to the render ring prelude."Kenneth Graunke2013-12-031-7/+1
| | | | | | | | | | | | This reverts commit a4bf7f6b6e612626c4e4fc21507ac213a7ba4b00. It breaks occlusion queries on Gen4-5. Doing this right will likely require larger changes, which should be done at a future date. Some Piglit tests still passed due to other bugs; fixing those revealed this problem. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Take "bookend" OA snapshots at the start/end of each batch.Kenneth Graunke2013-11-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unfortunately, our hardware only has one set of aggregating performance counters shared between all 3D programs, and their values are not saved or restored by hardware contexts. Also, at least on Sandybridge and Ivybridge, the counters lose their values if the GPU goes to sleep. To work around both of these problems, we have to snapshot the performance counters at the beginning and end of each batch, similar to how we handle query objects on platforms that don't support hardware contexts. I call these "bookend" snapshots. Since there can be multiple performance monitors active at a time, we store the bookend snapshots in a global BO, shared by all monitors. For monitors that span multiple batches, acquiring results involves adding up three segments: BeginPerfMonitor --> End of Batch 1 ("head") Start of Batch 2 --> End of Batch 2 ... ("middle") Start of Batch N-1 --> End of Batch N-1 Start of Batch N --> EndPerfMonitor ("tail") Monitors that refer to bookend BO snapshots are considered "unresolved". We delay resolving them (and adding up deltas to obtain the results) as long as possible to avoid blocking on mapping monitor->oa_bo. We can also run out of space in the bookend BO, at which point we have to resolve all unresolved monitors. Then we can throw away the snapshots and begin writing at the beginning of the buffer. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Start and stop OA counters as necessary.Kenneth Graunke2013-11-211-0/+7
| | | | | | | | | | | | | We need to start OA at the beginning of each batch where monitors are active. OACONTROL isn't part of the hardware context, so to avoid leaving counters enabled for other applications, we turn them off at the end of the batch too. We also need to start them at BeginPerfMonitor time (unless they've already been started). We stop them when the monitor last ends as well. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Periodically dump the list of monitors if INTEL_DEBUG=perfmon.Kenneth Graunke2013-11-211-0/+3
| | | | | | | | It's useful to see the state of all outstanding monitors; the start of a new batch seems like a reasonable time to print them out. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Move brw_emit_query_begin() to the render ring prelude.Kenneth Graunke2013-11-211-0/+8
| | | | | | | | | | | | | | | | | | | Without hardware contexts, the pipeline statistics registers are free-running and include data from every 3D application running. In order to find out the contributions of one particular context, we need to take a snapshot at the start and end of each batch. Previously, we emitted the PIPE_CONTROL necessary to capture PS_DEPTH_COUNT when drawing primitives. Special tracking ensured it happened only on the first draw of the batch, rather than on every draw. Moving this to brw_new_batch increases symmetry, since the final snapshot has always been in brw_finish_batch, which is just a few lines below. It should be basically equivalent. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Introduce a "render ring prelude" hook.Kenneth Graunke2013-11-211-0/+5
| | | | | | | | | | The new intel_batchbuffer_emit_render_ring_prelude() hook will be called when switching from BLT or UNKNOWN_RING to RENDER_RING. This provides a place to emit state that should go at the start of each render ring batch, with minimal overhead. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Introduce an UNKNOWN_RING state.Kenneth Graunke2013-11-211-0/+8
| | | | | | | | | | | | | | | | | | | When we first create a batch buffer, it's empty. We don't actually know what ring it will be targeted at until the first BEGIN_BATCH or BEGIN_BATCH_BLT macro. Previously, one could determine the state of the batch by checking brw->batch.ring (blit vs. render) and brw->batch.used != 0 (known vs. unknown). This should be functionally equivalent, but the tri-state enum is a bit clearer. v2: Catch three explicit require_space callers (thanks to Carl and Eric). v3: Split the boolean -> enum change from the UNKNOWN_RING change. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Convert brw->batch.is_blit to a BLT_RING/RENDER_RING enum.Kenneth Graunke2013-11-211-8/+7
| | | | | | | | Passing BLT_RING or RENDER_RING to batchbuffer functions is a lot more obvious than passing true or false. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/gen7: Emit workaround flush when changing GS enable state.Paul Berry2013-11-181-0/+30
| | | | | | | | | | v2: Don't go to extra work to avoid extraneous flushes. (Previous experiments in the kernel have suggested that flushing the pipeline when it is already empty is extremely cheap). Cc: "10.0" <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Rework brw_new_batch to actually start a new batch.Kenneth Graunke2013-11-151-4/+5
| | | | | | | | | | | | | | | Previously, brw_new_batch was called just after execbuf, but before intel_batchbuffer_reset. Essentially, it prepared for the creation of a new batch, that wasn't yet available, and which it didn't create. This was a bit awkward. This patch makes brw_new_batch call intel_batchbuffer_reset as the very first operation. This means that brw_new_batch actually creates a new batchbuffer, and thus has it available. It brings the creation of the new batchbuffer and BRW_NEW_BATCH flagging together into one place. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Move cache_used_by_gpu flag setting to brw_finish_batch.Kenneth Graunke2013-11-151-6/+6
| | | | | | | It really makes more sense here. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Un-virtualize brw_new_batch().Kenneth Graunke2013-10-171-1/+42
| | | | | | | | Since the i915/i965 split, there's only one implementation of this virtual function. We may as well just call it directly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Un-virtualize brw_finish_batch().Kenneth Graunke2013-10-171-2/+22
| | | | | | | | Since the i915/i965 split, there's only one implementation of this virtual function. We may as well just call it directly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Move need_workaround_flush = true to intel_batchbuffer_init.Kenneth Graunke2013-10-131-0/+2
| | | | | | | | intel_batchbuffer_init() sets up initial batchbuffer state; it seems like a reasonable place to initialize this flag. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Add missing state reset at the end of blorp.Eric Anholt2013-08-301-5/+5
| | | | | | | | | | | | | These are things that happen to be occurring because of the batch flush at the start of the blorp op (which exists to prevent batch space or aperture space overflow), but the intention was for this sequence of state resets at the end of blorp to be everything necessary for the next draw call. Found when debugging the next commit, by comparing brw_new_batch() and intel_batchbuffer_reset() to brw_blorp_exec(). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Dump more information about batch buffer usage.Kenneth Graunke2013-08-161-3/+10
| | | | | | | | | | | | | | | | | | | | Previously, INTEL_DEBUG=bat would dump messages like: intel_mipmap_tree.c:1643: Batchbuffer flush with 456b used This only reported the space used for command packets, and didn't report any information on the space used for indirect state. Now it dumps: intel_context.c:366: Batchbuffer flush with 6128b (pkt) + 4288b (state) = 10416b (31.8%) This conveniently shows the breakdown of space used for packets vs. state, as well as the percentage of batchbuffer space. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Cite the Ivybridge PRM for VS PIPE_CONTROL workarounds.Kenneth Graunke2013-07-151-2/+2
| | | | Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Move intel_context::gen and gt fields to brw_context.Kenneth Graunke2013-07-091-11/+6
| | | | | | | | | | Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::has_llc to brw_context.Kenneth Graunke2013-07-091-4/+3
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::intelScreen to brw_context.Kenneth Graunke2013-07-091-3/+2
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::no_batch_wrap to brw_context.Kenneth Graunke2013-07-091-2/+1
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context's framerate throttling fields to brw_context.Kenneth Graunke2013-07-091-3/+3
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::batch to brw_context.Kenneth Graunke2013-07-091-62/+53
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::hw_ctx to brw_context.Kenneth Graunke2013-07-091-2/+2
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::bufmgr to brw_context.Kenneth Graunke2013-07-091-2/+2
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::vtbl to brw_context.Kenneth Graunke2013-07-091-3/+3
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Pass brw_context to functions rather than intel_context.Kenneth Graunke2013-07-091-36/+54
| | | | | | | | | | | | | | This makes brw_context available in every function that used intel_context. This makes it possible to start migrating fields from intel_context to brw_context. Surprisingly, this actually removes some code, as functions that use OUT_BATCH don't need to declare "intel"; they just use "brw." Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Replace #include "intel_context.h" with brw_context.h.Kenneth Graunke2013-07-091-1/+0
| | | | | | | | | | | brw_context.h includes intel_context.h, but additionally makes the brw_context structure available. Switching this allows us to start using brw_context in more places. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Replace maxBatchSize variable with BATCH_SZ define.Kenneth Graunke2013-07-031-2/+2
| | | | | | | | maxBatchSize was only ever initialized to BATCH_SZ, and a few places used BATCH_SZ directly anyway. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Move annotate_aub out of the vtable.Kenneth Graunke2013-07-031-2/+2
| | | | | | | | brw_annotate_aub() is the only implementation of this function, so it makes sense to just call it directly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Move debug_batch hook out of the vtable.Kenneth Graunke2013-07-031-2/+2
| | | | | | | | brw_debug_batch() is the only implementation of this function, so it makes sense to just call it directly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Drop code checking for gen <= 3.Eric Anholt2013-06-281-5/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move the remaining intel code to the i965 directory.Eric Anholt2013-06-261-1/+560
| | | | | | | | | Now that i915's forked off, they don't need to live in a shared directory. Acked-by: Kenneth Graunke <[email protected]> Acked-by: Chad Versace <[email protected]> Acked-by: Adam Jackson <[email protected]> (and I hear second hand that idr is OK with it, too)
* [965] Convert the driver to dri_bufmgr interface and enable TTM.Eric Anholt2007-12-071-236/+1
| | | | | | | | | | | | | This is currently believed to work but be a significant performance loss. Performance recovery should be soon to follow. The dri_bo_fake_disable_backing_store() call was added to allow backing store disable like bufmgr_fake.c did, which is a significant performance win (though it's missing the no-fence-subdata part). This commit is a squash merge of the 965-ttm branch, which had some history I wanted to avoid pulling due to noisiness and brokenness at many points for git-bisecting.
* [965] Convert DBG macro to use FILE_DEBUG_FLAG like i915.Eric Anholt2007-11-191-0/+1
|
* [965] Replace various alignment code with a shared ALIGN() macro.Eric Anholt2007-10-041-2/+2
| | | | | | | | In the process, fix some alignment issues: - Scratch space allocation was aligned into units of 1KB, while the allocation wanted units of bytes, so we never allocated enough space for scratch. - GRF register count was programmed as ALIGN(val - 1, 16) / 16 instead of ALIGN(val, 16) / 16 - 1, which overcounted for val != 16n+1.