mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	i965/state: Don't use brw->state.dirty.brw	Jordan Justen	2015-03-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now, we only use ctx->NewDriverState. I used this bash & sed command in the i965 directory: for file in .[ch] .[ch]pp; do sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file done Followed by manual changes to brw_state_upload.c. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Defer the throttle until we submit new commands	Chris Wilson	2015-03-18	1	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, we throttle before the user begins preparing commands for the next frame when we acquire the draw/read buffers. However, construction of the command buffer can itself take significant time relative to the frame time. If we move the throttle from the buffer acquire to the command submit phase we can allow the user to improve concurrency between the CPU and GPU (i.e. reduce the amount of time we waste inside the throttle). v2: Whitespace + delay throttling until after the next submission for greater parallelism Signed-off-by: Chris Wilson <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Kenneth Graunke <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Kristian Høgsberg <[email protected]> Cc: Chad Versace <[email protected]> Cc: Ian Romanick <[email protected]> Reviewed-by: Chad Versace <[email protected]> [v1]
*	i965: Throttle to the previous frame	Chris Wilson	2015-03-18	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to facilitate the concurrency offered by triple buffering and to offset the latency induced by swapping via an external process, which may incur extra rendering itself, only throttle to the previous frame and not the last. The second issue that mostly affects swap benchmarks, but also can incur jitter in the throttling, is that the throttle bo is closer to the next SwapBuffers rather than immediately after the previous SwapBuffers. Throttling to the previous frame doubles the maximum possible latency at the benefit of improving throughput and reducing jitter. v2: Rename "first_post_swapbuffer" batches array to a plain throttle_batch[] as the pluralisation was contorting the name and not making it clear as to whether it was the first batch or first_post_swap batch. Not least of which was that not all throttle points are SwapBuffers. Signed-off-by: Chris Wilson <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Kenneth Graunke <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Kristian Høgsberg <[email protected]> Cc: Chad Versace <[email protected]> Cc: Ian Romanick <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	i965: Rename some PIPE_CONTROL flags	Ben Widawsky	2015-03-02	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm not really sure of the origins of the existing flag names. Modern docs have some slightly different names. Having the correct names makes it easier to determine if existing PIPE_CONTROL flag settings are correct, as well as making adding new PIPE_CONTROLs easier. This originally came up while I was trying to implement workarounds and spotted some things called, "flush" which should have been called "invalidate." Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Remove hand-rolled memcpy implementation.	Matt Turner	2015-03-02	1	-1/+1
\| \| \| \|	Reviewed-by: Eric Anholt <[email protected]>
*	i965: Do Sandybridge workaround flushes before each primitive.	Kenneth Graunke	2015-02-17	1	-12/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sandybridge requires the post-sync non-zero workaround in a ton of places, and if you ever miss one, the GPU usually hangs. Currently, we try to track exactly when a workaround flush is necessary (via the brw->batch.need_workaround_flush flag). This is tricky to get right, and we've botched it several times in the past. This patch unconditionally performs the post-sync non-zero flush at the start of each primitive's state upload (including BLORP). We drop the needs_workaround_flush flag, and drop all the other callers, as the flush has already been performed. We have no data to indicate that simply flushing all the time will hurt performance, and it has the potential to help stability. v2: Add post-sync workaround to initial GPU state upload to be extra cautious (suggested by Chad Versace). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	i965: Fix up too-wide comment	Kristian Høgsberg	2015-01-16	1	-4/+3
\| \| \| \|	Signed-off-by: Kristian Høgsberg <[email protected]>
*	i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.	Kenneth Graunke	2015-01-04	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \|	According to the documentation, we need to do a CS stall on every fourth PIPE_CONTROL command to avoid GPU hangs. The kernel does a CS stall between batches, so we only need to count the PIPE_CONTROLs in our batches. v2: Get the generation check right (caught by Chris Wilson), combine the ++ with the check (suggested by Daniel Vetter). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Daniel Vetter <[email protected]>
*	i965/skl: Emit depth stall workaround for gen9 as well	Damien Lespiau	2014-12-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The docs say that we shouldn't need this workaround for gen8+, but just removing it, causes gpu hangs. We'll revisit this, but for now, just extend the workaround to gen9. Signed-off-by: Damien Lespiau <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/skl: Implement workaround for VF Invalidate issue	Jordan Justen	2014-11-03	1	-0/+9
\| \| \| \| \| \| \|	Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
*	Revert 5 i965 patches: 8e27a4d2, 373143ed, c5bdf9be, 6f56e142, 88e3d404	Jordan Justen	2014-09-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reverts * "i965: Modify state upload to allow 2 different sets of state atoms." 8e27a4d2b3e4e74e9a77446bce49607433d86be3 * "i965: Modify dirty bit handling to support 2 pipelines." 373143ed9187c4d4ce1e3c486b5dd0880d18ec8b * "i965: Create a macro for checking a dirty bit." c5bdf9be1eca190417998d548fd140c1eca37a54 Conflicts: src/mesa/drivers/dri/i965/brw_context.h * "i965: Create a macro for setting all dirty bits." 6f56e1424d923fd80c84090fbf4506c9eaaffea1 Conflicts: src/mesa/drivers/dri/i965/brw_blorp.cpp src/mesa/drivers/dri/i965/brw_state_cache.c src/mesa/drivers/dri/i965/brw_state_upload.c * "i965: Create a macro for setting a dirty bit." 88e3d404dad009d8cff5124cf8acee7daeaceb64 Signed-off-by: Jordan Justen <[email protected]>
*	i965: Create a macro for setting a dirty bit.	Paul Berry	2014-09-01	1	-2/+2
\| \| \| \| \| \| \|	This will make it easier to extend dirty bit handling to support compute shaders. Reviewed-by: Jordan Justen <[email protected]>
*	i965: Drop the memcmp for finding duplicated CURBE uploads.	Eric Anholt	2014-07-02	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	At this point, the extra copy of the data and memcmp are as expensive as just re-uploading. Note: now that we'll always upload, and brw_constant_buffer watches BRW_NEW_BATCH anyway, we don't need to explicitly unref the old curbe_bo at batch reset time. No significant performance difference on glamor copywinwin10 (n=55), despite that test having a 98% hit rate on the cache. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Reuse intel_upload.c for gen4/5 constant buffers.	Eric Anholt	2014-07-02	1	-1/+0
\| \| \| \| \| \|	No performance difference on glamor with copywinwin10 (n=40) on my gm45. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Make batch dumping go to stderr, too.	Eric Anholt	2014-05-02	1	-0/+1
\| \| \| \| \| \|	All our other debug goes there. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Drop some more dead code from the old CACHED_BATCH feature.	Eric Anholt	2014-03-18	1	-27/+0
\| \| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Fix render-to-texture in non-FinishRenderTexture cases.	Eric Anholt	2014-03-06	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've had several problems now with FinishRenderTexture not getting called enough, and we're ready to just give up on it ever doing what we need. In particular, an upcoming Steam title had rendering bugs that could be fixed by always_flush_cache=true. Instead of hoping Mesa core can figure out when we need to flush our caches, just track what BOs we've rendered to in a set, and when we render from a BO in that set, emit a flush and clear the set. There's some overhead to keeping this set, but most of that is just hashing the pointer -- it turns out our set never even gets very large, because cache flushes are so common (even on cairo-gl). No statistically significant performance difference in cairo-gl (n=100), despite spending ~.5% CPU in these set operations. v1: (Original patch by Eric Anholt.) v2: (Changes by Ken Graunke.) - Rebase forward from May 7th 2013 -> March 4th 2014. - Drop the FinishRenderTexture hook entirely; after rebasing the patch, the hook was just an empty function. - Move the brw_render_cache_set_clear() call from intel_batchbuffer_emit_flush() to brw_emit_pipe_control_flush(). In theory, this could catch more cases where we've flushed. - Consider stencil as a possible texturing source. v3: (changes by anholt): - Move set_clear() back to emit_mi_flush() -- it means we can drop more forced flushes from the code. In the previous location, it wouldn't have been called when we wanted pre-gen6. - Move the set clear from batch init to reset -- it should be empty at the start of every batch, since the kernel handled any inter-batch flush for us. v4: Drop the debug code in set.c that I accidentally committed. Signed-off-by: Eric Anholt <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Tested-by: Dylan Baker <[email protected]> [v2]
*	i965: Only emit VS state pipe control workaround on IVB and BYT.	Kenneth Graunke	2014-02-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to the BSpec's 3D workarounds page, this is unnecessary on shipping Haswell hardware, and was never necessary on Broadwell. It unfortunately doesn't say anything about Baytrail. The workaround database confirms those results for Ivybridge, Haswell, and Broadwell. Baytrail is less clear - one page says it's necessary, while the other says it isn't. For now, be conservative and leave it enabled. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Implement a CS stall workaround on Broadwell.	Kenneth Graunke	2014-02-20	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to the latest documentation, any PIPE_CONTROL with the "Command Streamer Stall" bit set must also have another bit set, with five different options: - Render Target Cache Flush - Depth Cache Flush - Stall at Pixel Scoreboard - Post-Sync Operation - Depth Stall I chose "Stall at Pixel Scoreboard" since we've used it effectively in the past, but the choice is fairly arbitrary. Implementing this in the PIPE_CONTROL emit helpers ensures that the workaround will always take effect when it ought to. Apparently, this workaround may be necessary on older hardware as well; for now I've only added it to Broadwell as it's absolutely necessary there. Subsequent patches could add it to older platforms, provided someone tests it there. v2: Only flag "Stall at Pixel Scoreboard" when none of the other bits are set (suggested by Ian Romanick). v3: Prefix the function with "gen8" (requested by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (v2) Reviewed-by: Eric Anholt <[email protected]>
*	i965: Implement a brw_load_register_mem helper function.	Kenneth Graunke	2014-02-07	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \|	This saves some boilerplate and hides the OUT_RELOC/OUT_RELOC64 distinction. Placing the function in intel_batchbuffer.c is rather arbitrary; there wasn't really an obvious place for it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
*	i965: Bump generation assertions on workaround flushes.	Kenneth Graunke	2014-01-31	1	-2/+2
\| \| \| \| \| \| \| \|	I haven't investigated whether these are necessary on Broadwell or not, but for paranoia's sake, we may as well continue doing them for now. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	i965: Update PIPE_CONTROL packet lengths for Broadwell.	Kenneth Graunke	2014-01-20	1	-2/+20
\| \| \| \| \| \| \| \| \|	On Broadwell, PIPE_CONTROL needs an extra DWord to accomodate the 48-bit addressing. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Create a helper function for emitting PIPE_CONTROL writes.	Kenneth Graunke	2014-01-20	1	-38/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are a lot of places that use PIPE_CONTROL to write a value to a buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT). Creating a single function to do this seems convenient. As part of this refactor, we now set the PPGTT/GTT selection bit correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms. This is correct for Sandybridge, but actually part of the address on Ivybridge and later! Broadwell will also increase the length of these packets by 1; with the refactoring, we should have to adjust that in substantially fewer places, giving us confidence that we've hit them all. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Use full-length PIPE_CONTROL packets for workaround writes.	Kenneth Graunke	2014-01-20	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I believe that PIPE_CONTROL uses the length field to decide whether to do 32-bit or 64-bit writes. A length of 4 would do a 32-bit write, while a length of 5 would do a 64-bit write. (I haven't verified this, though.) For workaround writes, we don't care what value gets written, or how much data. We're only writing something because hardware bugs mandate that do so. So using a 64-bit write should be fine. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Emit full-length PIPE_CONTROLs for (non-write) flushes.	Kenneth Graunke	2014-01-20	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PIPE_CONTROL packet actually has 5 DWords on Gen6+: 1. Header 2. Flags 3. Address 4. Immediate Data: Lower DWord 5. Immediate Data: Upper DWord We just never emitted the last one. While it appears to work, it's probably safer to emit the entire thing. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Create a helper function for emitting PIPE_CONTROL flushes.	Kenneth Graunke	2014-01-20	1	-67/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These days, we need to emit PIPE_CONTROL flushes all over the place. Being able to do that via a single function call seems convenient. Broadwell will also increase the length of these packets by 1; with the refactoring, we should have to do this in substantially fewer places. v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by Eric Anholt). Drop unlikely() from BLT_RING check. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Introduce an OUT_RELOC64 macro.	Kenneth Graunke	2014-01-20	1	-0/+24
\| \| \| \| \| \| \| \| \|	Broadwell uses 48-bit addresses. The first DWord is the low 32 bits, and the second DWord is the high 16 bits. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Use the new drm_intel_bo offset64 field.	Kenneth Graunke	2014-01-20	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	libdrm 2.4.52 introduces a new 'uint64_t offset64' field, intended to replace the old 'unsigned long offset' field. To preserve ABI, libdrm continues to store the presumed offset in both locations. On Broadwell, a 64-bit kernel may place BOs at "high" (> 4G) addresses. However, with a 32-bit userspace, the 'unsigned long offset' field will only be 32-bit, which is not large enough to hold this value. We need to use a proper uint64_t (like the kernel does). Technically, a lot of this code doesn't affect Broadwell, so we could leave it using the old field. But it makes sense to just switch to the new, properly typed field. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Delete intel_batchbuffer_emit_reloc_fenced.	Kenneth Graunke	2014-01-20	1	-25/+0
\| \| \| \| \| \| \| \|	Nothing in i965 uses it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Remove CACHED_BATCH support altogether.	Kenneth Graunke	2014-01-17	1	-42/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using an unoptimized variant of glamor spending 50% of its CPU time in brw_draw_prims() (and hitting the cache very frequently): N Min Max Median Avg Stddev x 200 29200 40500 34900 34750 958.43256 + 200 31000 40300 34700 34622 916.35941 No difference proven at 95.0% confidence Similarly, no difference on GLB2.7: N Min Max Median Avg Stddev x 63 64.1 71.36 70.69 70.113175 1.6782026 + 63 63.6 71.18 70.75 70.223651 1.6044186 No difference proven at 95.0% confidence v2: Rebase on master (by anholt) v3: Add a missing BEGIN_BATCH(3) to aa_line_parameters -- CACHED_BATCH didn't have the asserts about batchbuffer usage that ADVANCE_BATCH does, so we started assertion failing. Signed-off-by: Kenneth Graunke <[email protected]> Signed-off-by: Eric Anholt <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	s/Tungsten Graphics/VMware/	José Fonseca	2014-01-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change. This was the sed script I used: $ cat tg2vmw.sed # Run as: # # git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 \| xargs -0 sed -i -f tg2vmw.sed # # Rename copyrights s/Tungsten Gra$ph\\|hp$ics,\? [iI]nc\.\?$, Cedar Park$\?$, Austin$\?$, \(Texas\\|TX$\)\?\.\?/VMware, Inc./g /Copyright/s/Tungsten Graphics$,\? [iI]nc\.$\?$, Cedar Park$\?$, Austin$\?$, \(Texas\\|TX$\)\?\.\?/VMware, Inc./ s/TUNGSTEN GRAPHICS/VMWARE/g # Rename emails s/[email protected]/[email protected]/ s/[email protected]/[email protected]/g s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/ s/jrfonseca\[email protected]/[email protected]/g s/keithw\[email protected]/[email protected]/g s/[email protected]/[email protected]/g s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/ s/[email protected]/[email protected]/ # Remove dead links s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g # C string src/gallium/state_trackers/vega/api_misc.c s/"Tungsten Graphics, Inc"/"VMware, Inc"/ Reviewed-by: Brian Paul <[email protected]>
*	i965: Drop trailing whitespace from the rest of the driver.	Kenneth Graunke	2013-12-05	1	-5/+5
\| \| \| \| \| \| \|	Performed via: $ for file in ; do sed -i 's/ //g'; done Signed-off-by: Kenneth Graunke <[email protected]>
*	Revert "i965: Move brw_emit_query_begin() to the render ring prelude."	Kenneth Graunke	2013-12-03	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit a4bf7f6b6e612626c4e4fc21507ac213a7ba4b00. It breaks occlusion queries on Gen4-5. Doing this right will likely require larger changes, which should be done at a future date. Some Piglit tests still passed due to other bugs; fixing those revealed this problem. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Take "bookend" OA snapshots at the start/end of each batch.	Kenneth Graunke	2013-11-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unfortunately, our hardware only has one set of aggregating performance counters shared between all 3D programs, and their values are not saved or restored by hardware contexts. Also, at least on Sandybridge and Ivybridge, the counters lose their values if the GPU goes to sleep. To work around both of these problems, we have to snapshot the performance counters at the beginning and end of each batch, similar to how we handle query objects on platforms that don't support hardware contexts. I call these "bookend" snapshots. Since there can be multiple performance monitors active at a time, we store the bookend snapshots in a global BO, shared by all monitors. For monitors that span multiple batches, acquiring results involves adding up three segments: BeginPerfMonitor --> End of Batch 1 ("head") Start of Batch 2 --> End of Batch 2 ... ("middle") Start of Batch N-1 --> End of Batch N-1 Start of Batch N --> EndPerfMonitor ("tail") Monitors that refer to bookend BO snapshots are considered "unresolved". We delay resolving them (and adding up deltas to obtain the results) as long as possible to avoid blocking on mapping monitor->oa_bo. We can also run out of space in the bookend BO, at which point we have to resolve all unresolved monitors. Then we can throw away the snapshots and begin writing at the beginning of the buffer. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Start and stop OA counters as necessary.	Kenneth Graunke	2013-11-21	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	We need to start OA at the beginning of each batch where monitors are active. OACONTROL isn't part of the hardware context, so to avoid leaving counters enabled for other applications, we turn them off at the end of the batch too. We also need to start them at BeginPerfMonitor time (unless they've already been started). We stop them when the monitor last ends as well. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Periodically dump the list of monitors if INTEL_DEBUG=perfmon.	Kenneth Graunke	2013-11-21	1	-0/+3
\| \| \| \| \| \| \| \|	It's useful to see the state of all outstanding monitors; the start of a new batch seems like a reasonable time to print them out. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Move brw_emit_query_begin() to the render ring prelude.	Kenneth Graunke	2013-11-21	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Without hardware contexts, the pipeline statistics registers are free-running and include data from every 3D application running. In order to find out the contributions of one particular context, we need to take a snapshot at the start and end of each batch. Previously, we emitted the PIPE_CONTROL necessary to capture PS_DEPTH_COUNT when drawing primitives. Special tracking ensured it happened only on the first draw of the batch, rather than on every draw. Moving this to brw_new_batch increases symmetry, since the final snapshot has always been in brw_finish_batch, which is just a few lines below. It should be basically equivalent. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Introduce a "render ring prelude" hook.	Kenneth Graunke	2013-11-21	1	-0/+5
\| \| \| \| \| \| \| \| \| \|	The new intel_batchbuffer_emit_render_ring_prelude() hook will be called when switching from BLT or UNKNOWN_RING to RENDER_RING. This provides a place to emit state that should go at the start of each render ring batch, with minimal overhead. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Introduce an UNKNOWN_RING state.	Kenneth Graunke	2013-11-21	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we first create a batch buffer, it's empty. We don't actually know what ring it will be targeted at until the first BEGIN_BATCH or BEGIN_BATCH_BLT macro. Previously, one could determine the state of the batch by checking brw->batch.ring (blit vs. render) and brw->batch.used != 0 (known vs. unknown). This should be functionally equivalent, but the tri-state enum is a bit clearer. v2: Catch three explicit require_space callers (thanks to Carl and Eric). v3: Split the boolean -> enum change from the UNKNOWN_RING change. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Convert brw->batch.is_blit to a BLT_RING/RENDER_RING enum.	Kenneth Graunke	2013-11-21	1	-8/+7
\| \| \| \| \| \| \| \|	Passing BLT_RING or RENDER_RING to batchbuffer functions is a lot more obvious than passing true or false. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/gen7: Emit workaround flush when changing GS enable state.	Paul Berry	2013-11-18	1	-0/+30
\| \| \| \| \| \| \| \| \| \|	v2: Don't go to extra work to avoid extraneous flushes. (Previous experiments in the kernel have suggested that flushing the pipeline when it is already empty is extremely cheap). Cc: "10.0" <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Rework brw_new_batch to actually start a new batch.	Kenneth Graunke	2013-11-15	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, brw_new_batch was called just after execbuf, but before intel_batchbuffer_reset. Essentially, it prepared for the creation of a new batch, that wasn't yet available, and which it didn't create. This was a bit awkward. This patch makes brw_new_batch call intel_batchbuffer_reset as the very first operation. This means that brw_new_batch actually creates a new batchbuffer, and thus has it available. It brings the creation of the new batchbuffer and BRW_NEW_BATCH flagging together into one place. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Move cache_used_by_gpu flag setting to brw_finish_batch.	Kenneth Graunke	2013-11-15	1	-6/+6
\| \| \| \| \| \| \|	It really makes more sense here. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Un-virtualize brw_new_batch().	Kenneth Graunke	2013-10-17	1	-1/+42
\| \| \| \| \| \| \| \|	Since the i915/i965 split, there's only one implementation of this virtual function. We may as well just call it directly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Un-virtualize brw_finish_batch().	Kenneth Graunke	2013-10-17	1	-2/+22
\| \| \| \| \| \| \| \|	Since the i915/i965 split, there's only one implementation of this virtual function. We may as well just call it directly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Move need_workaround_flush = true to intel_batchbuffer_init.	Kenneth Graunke	2013-10-13	1	-0/+2
\| \| \| \| \| \| \| \|	intel_batchbuffer_init() sets up initial batchbuffer state; it seems like a reasonable place to initialize this flag. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Add missing state reset at the end of blorp.	Eric Anholt	2013-08-30	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	These are things that happen to be occurring because of the batch flush at the start of the blorp op (which exists to prevent batch space or aperture space overflow), but the intention was for this sequence of state resets at the end of blorp to be everything necessary for the next draw call. Found when debugging the next commit, by comparing brw_new_batch() and intel_batchbuffer_reset() to brw_blorp_exec(). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
*	i965: Dump more information about batch buffer usage.	Kenneth Graunke	2013-08-16	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, INTEL_DEBUG=bat would dump messages like: intel_mipmap_tree.c:1643: Batchbuffer flush with 456b used This only reported the space used for command packets, and didn't report any information on the space used for indirect state. Now it dumps: intel_context.c:366: Batchbuffer flush with 6128b (pkt) + 4288b (state) = 10416b (31.8%) This conveniently shows the breakdown of space used for packets vs. state, as well as the percentage of batchbuffer space. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
*	i965: Cite the Ivybridge PRM for VS PIPE_CONTROL workarounds.	Kenneth Graunke	2013-07-15	1	-2/+2
\| \| \| \|	Signed-off-by: Kenneth Graunke <[email protected]>
*	i965: Move intel_context::gen and gt fields to brw_context.	Kenneth Graunke	2013-07-09	1	-11/+6
\| \| \| \| \| \| \| \| \| \|	Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>