| Commit message | Author | Age | Files | Lines |
| |
grep -lr 'sub license' | while read f; do \
sed --in-place -e 's/sub license/sublicense/' $f ;\
done
grep -lr 'NON-INFRINGEMENT' | while read f; do \
sed --in-place -e 's/NON-INFRINGEMENT/NONINFRINGEMENT/' $f ;\
done
As noted by Matt, both of these changes match the MIT license text found
at http://opensource.org/licenses/MIT.
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Matt Turner <[email protected]>
| |
Why was that ever a thing?
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Matt Turner <[email protected]>
| |
This patch implements the binding table enable command, which is also
used to allocate a binding table pool into which hardware-generated
binding table entries are flushed. Each binding table offset in the
binding table pool is unique per shader stage enabled within a batch.
Also insert the required brw_tracked_state objects to enable
hw-generated binding tables in the normal render path.
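As a rough sketch of the enable path (not the actual implementation;
the pool size, alignment, and packet emission are simplified and
illustrative only):
static void
gen7_enable_hw_binding_tables(struct brw_context *brw)
{
   if (!brw->use_resource_streamer)
      return;
   if (!brw->hw_bt_pool.bo) {
      /* Pool that hardware-generated binding table entries are
       * flushed into; size and alignment here are illustrative. */
      brw->hw_bt_pool.bo = drm_intel_bo_alloc(brw->bufmgr, "hw_bt_pool",
                                              32768, 4096);
      brw->hw_bt_pool.next_offset = 0;   /* v2: start from zero */
   }
   /* ...emit 3DSTATE_BINDING_TABLE_POOL_ALLOC with the enable bit and
    * the pool address, then the state cache invalidate (see v5)... */
}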
v2: - Use MOCS in binding table pool alloc for GEN8
- Fix spurious offset when allocating binding table pool entry
and start from zero instead.
v3: - Include GEN8 fix for spurious offset above.
v4: - Fix the wrong packet length in enable/disable hw-binding table
for GEN8 (Ville).
- Don't invoke the HW-binding table disable command when we don't
have the resource streamer (Chris).
v5: - Reorder the state cache invalidate flush so it happens in-between
enabling hw-generated binding tables and the previous sw-binding
table GPU state (Chris).
v6: - Do the same fix in v5 for gen7_disable_hw_binding_tables().
- Adhere to coding guidelines and make comments more informative.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Abdiel Janulgue <[email protected]>
| |
First check whether the hardware and kernel support the resource
streamer. If they do, tell the kernel to set the resource streamer
enable bit on MI_BATCHBUFFER_START by specifying the
I915_EXEC_RESOURCE_STREAMER execbuffer flag.
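A minimal sketch of that check (the getparam ioctl is standard libdrm
usage; the function name here is illustrative):
static bool
intel_detect_resource_streamer(int fd)
{
   int value = 0;
   struct drm_i915_getparam gp = {
      .param = I915_PARAM_HAS_RESOURCE_STREAMER,
      .value = &value,
   };
   /* Old kernels reject the param outright; new ones report 0 or 1. */
   if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
      return false;
   return value > 0;   /* v4: always inspect the value */
}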
v2: - Use new I915_PARAM_HAS_RESOURCE_STREAMER ioctl to check if kernel
supports RS (Ken).
- Add brw_device_info::has_resource_streamer and toggle it for
Haswell, Broadwell, Cherryview, Skylake, and Broxton (Ken).
v3: - Update I915_PARAM_HAS_RESOURCE_STREAMER to match updated kernel.
v4: - Always inspect the getparam.value (Chris Wilson).
v5: - Fold redundant devinfo->has_resource_streamer check in context create
into init screen.
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Abdiel Janulgue <[email protected]>
| |
Previously, OUT_BATCH was just a macro around an inline function which
does
brw->batch.map[brw->batch.used++] = dword;
When making consecutive calls to intel_batchbuffer_emit_dword(), the
compiler isn't able to recognize that we're writing consecutive memory
locations, or that it doesn't need to write batch.used back to memory
each time.
We can avoid both of these problems by making a local pointer to the
next location in the batch in BEGIN_BATCH().
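Schematically (a sketch; the real macros live in intel_batchbuffer.h
and differ in detail):
/* Before: brw->batch.map[brw->batch.used++] = (d);  on every OUT_BATCH.
 *
 * After: BEGIN_BATCH caches a local write cursor, consecutive OUT_BATCH
 * calls become consecutive stores through it, and ADVANCE_BATCH writes
 * the count back to memory once. */
#define BEGIN_BATCH(n) \
   uint32_t *__map = brw->batch.map + brw->batch.used
#define OUT_BATCH(d)    (*__map++ = (d))
#define ADVANCE_BATCH() \
   (brw->batch.used = __map - brw->batch.map)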
Cuts 18k from the .text size.
text data bss dec hex filename
4946956 195152 26192 5168300 4edcac i965_dri.so before
4928956 195152 26192 5150300 4e965c i965_dri.so after
This series (including commit c0433948) improves performance of Synmark
OglBatch7 by 8.01389% +/- 0.63922% (n=83) on Ivybridge.
Reviewed-by: Chris Wilson <[email protected]>
| |
The next patch will replace the .used field with an on-demand
calculation of batchbuffer usage.
Reviewed-by: Chris Wilson <[email protected]>
| |
So that everything writing to the batch between BEGIN_BATCH() and
ADVANCE_BATCH() goes through OUT_BATCH.
Reviewed-by: Chris Wilson <[email protected]>
| |
This patch can cause an infinite recursion if the previous patch, titled
"i965: Track finished batch state", isn't present (backporters take
notice).
v2: Sent out the wrong patch originally. This patch switches the order of
flushes, doing the generic flush before the CC_STATE, and the required
workaround flush afterwards.
v3: Only perform workaround for render ring
Add text to the BATCH_RESERVE comments
v4 (By Ken): Rebase; update citation to mention PRM and Wa name; combine two
blocks.
http://otc-mesa-ci.jf.intel.com/job/bwidawsk/171/
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
With the exception of gen8, the sole user of the workaround BO is the
PIPE_CONTROL emission code. Move it out of the purview of the
batchbuffer and into the pipe control code.
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Martin Peres <[email protected]>
| |
Start trimming the fat from intel_batchbuffer.c, first by moving the set
of routines for emitting PIPE_CONTROLs (along with the lore concerning
hardware workarounds) to a separate brw_pipe_control.c.
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Adds brw_load_register_mem64 which is similar to brw_load_register_mem
except that it queues two GEN7_MI_LOAD_REGISTER_MEM commands in order
to load both halves of a 64-bit register. The function is implemented
by splitting the 32-bit version into an internal helper function which
takes a size.
This will later be used to set the 64-bit predicate source registers.
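A sketch of that split (signatures approximate; pre-Broadwell form,
where each half is a 3-DWord command):
static void
load_register_mem(struct brw_context *brw, uint32_t reg,
                  drm_intel_bo *bo, uint32_t read_domains,
                  uint32_t write_domain, uint32_t offset,
                  int size_in_dwords)
{
   /* One MI_LOAD_REGISTER_MEM per 32-bit half. */
   for (int i = 0; i < size_in_dwords; i++) {
      BEGIN_BATCH(3);
      OUT_BATCH(GEN7_MI_LOAD_REGISTER_MEM | (3 - 2));
      OUT_BATCH(reg + i * 4);
      OUT_RELOC(bo, read_domains, write_domain, offset + i * 4);
      ADVANCE_BATCH();
   }
}
void
brw_load_register_mem64(struct brw_context *brw, uint32_t reg,
                        drm_intel_bo *bo, uint32_t read_domains,
                        uint32_t write_domain, uint32_t offset)
{
   load_register_mem(brw, reg, bo, read_domains, write_domain, offset, 2);
}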
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Now, we only use ctx->NewDriverState.
I used this bash & sed command in the i965 directory:
for file in *.[ch] *.[ch]pp; do
sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file
done
Followed by manual changes to brw_state_upload.c.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Currently, we throttle before the user begins preparing commands for the
next frame when we acquire the draw/read buffers. However, construction
of the command buffer can itself take significant time relative to the
frame time. If we move the throttle from the buffer acquire to the
command submit phase we can allow the user to improve concurrency
between the CPU and GPU (i.e. reduce the amount of time we waste inside
the throttle).
v2: Whitespace + delay throttling until after the next submission for
greater parallelism
Signed-off-by: Chris Wilson <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Cc: Ben Widawsky <[email protected]>
Cc: Kristian Høgsberg <[email protected]>
Cc: Chad Versace <[email protected]>
Cc: Ian Romanick <[email protected]>
Reviewed-by: Chad Versace <[email protected]> [v1]
| |
In order to facilitate the concurrency offered by triple buffering and to
offset the latency induced by swapping via an external process (which
may incur extra rendering itself), only throttle to the previous frame
and not the last. The second issue, which mostly affects swap benchmarks
but can also cause jitter in the throttling, is that the throttle bo is
closer to the next SwapBuffers than to the point immediately after the
previous SwapBuffers. Throttling to the previous frame doubles the
maximum possible latency, at the benefit of improving throughput and
reducing jitter.
v2: Rename the "first_post_swapbuffer" batches array to a plain
throttle_batch[], as the pluralisation was contorting the name and not
making it clear whether it was the first batch or the first post-swap
batch; not least because not all throttle points are SwapBuffers.
Signed-off-by: Chris Wilson <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Cc: Ben Widawsky <[email protected]>
Cc: Kristian Høgsberg <[email protected]>
Cc: Chad Versace <[email protected]>
Cc: Ian Romanick <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
| |
I'm not really sure of the origins of the existing flag names, but modern
docs have some slightly different ones. Having the correct names makes it
easier to determine whether existing PIPE_CONTROL flag settings are
correct, and makes adding new PIPE_CONTROLs easier.
This originally came up while I was trying to implement workarounds and
spotted some things called "flush" which should have been called
"invalidate."
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Reviewed-by: Eric Anholt <[email protected]>
| |
Sandybridge requires the post-sync non-zero workaround in a ton of
places, and if you ever miss one, the GPU usually hangs.
Currently, we try to track exactly when a workaround flush is
necessary (via the brw->batch.need_workaround_flush flag). This is
tricky to get right, and we've botched it several times in the past.
This patch unconditionally performs the post-sync non-zero flush at the
start of each primitive's state upload (including BLORP). We drop the
need_workaround_flush flag, and drop all the other callers, as the
flush has already been performed.
We have no data to indicate that simply flushing all the time will
hurt performance, and it has the potential to help stability.
v2: Add post-sync workaround to initial GPU state upload to be extra
cautious (suggested by Chad Versace).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
| |
Signed-off-by: Kristian Høgsberg <[email protected]>
| |
According to the documentation, we need to do a CS stall on every fourth
PIPE_CONTROL command to avoid GPU hangs. The kernel does a CS stall
between batches, so we only need to count the PIPE_CONTROLs in our batches.
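The bookkeeping amounts to roughly this, inside the flush-emission path
after the generation check (a sketch; the counter field name is
illustrative):
if (flags & PIPE_CONTROL_CS_STALL) {
   brw->pipe_controls_since_last_cs_stall = 0;
} else if (++brw->pipe_controls_since_last_cs_stall == 4) {
   /* Every fourth PIPE_CONTROL without a CS stall gets one added. */
   brw->pipe_controls_since_last_cs_stall = 0;
   flags |= PIPE_CONTROL_CS_STALL;
}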
v2: Get the generation check right (caught by Chris Wilson),
combine the ++ with the check (suggested by Daniel Vetter).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
| |
The docs say that we shouldn't need this workaround for gen8+, but just
removing it causes GPU hangs. We'll revisit this; for now, just extend
the workaround to gen9.
Signed-off-by: Damien Lespiau <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
| |
Reverts
* "i965: Modify state upload to allow 2 different sets of state atoms."
8e27a4d2b3e4e74e9a77446bce49607433d86be3
* "i965: Modify dirty bit handling to support 2 pipelines."
373143ed9187c4d4ce1e3c486b5dd0880d18ec8b
* "i965: Create a macro for checking a dirty bit."
c5bdf9be1eca190417998d548fd140c1eca37a54
Conflicts:
src/mesa/drivers/dri/i965/brw_context.h
* "i965: Create a macro for setting all dirty bits."
6f56e1424d923fd80c84090fbf4506c9eaaffea1
Conflicts:
src/mesa/drivers/dri/i965/brw_blorp.cpp
src/mesa/drivers/dri/i965/brw_state_cache.c
src/mesa/drivers/dri/i965/brw_state_upload.c
* "i965: Create a macro for setting a dirty bit."
88e3d404dad009d8cff5124cf8acee7daeaceb64
Signed-off-by: Jordan Justen <[email protected]>
| |
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <[email protected]>
| |
At this point, the extra copy of the data and memcmp are as expensive as
just re-uploading.
Note: now that we'll always upload, and brw_constant_buffer watches
BRW_NEW_BATCH anyway, we don't need to explicitly unref the old curbe_bo
at batch reset time.
No significant performance difference on glamor copywinwin10 (n=55),
despite that test having a 98% hit rate on the cache.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
No performance difference on glamor with copywinwin10 (n=40) on my gm45.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
All our other debug goes there.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
We've had several problems now with FinishRenderTexture not getting called
enough, and we're ready to just give up on it ever doing what we need. In
particular, an upcoming Steam title had rendering bugs that could be fixed
by always_flush_cache=true.
Instead of hoping Mesa core can figure out when we need to flush our
caches, just track what BOs we've rendered to in a set, and when we render
from a BO in that set, emit a flush and clear the set.
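In outline (helper and set-API names here are hypothetical; the real
code uses Mesa's pointer-hashing set):
/* After drawing, remember which BO we rendered into. */
static void
render_cache_add(struct brw_context *brw, drm_intel_bo *bo)
{
   set_add(brw->render_cache, bo);        /* keyed on the BO pointer */
}
/* Before sampling from a BO, flush if we previously rendered to it. */
static void
render_cache_check_flush(struct brw_context *brw, drm_intel_bo *bo)
{
   if (set_contains(brw->render_cache, bo)) {
      emit_mi_flush(brw);                 /* flush the render caches */
      set_clear(brw->render_cache);       /* everything is clean again */
   }
}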
There's some overhead to keeping this set, but most of that is just
hashing the pointer -- it turns out our set never even gets very large,
because cache flushes are so common (even on cairo-gl).
No statistically significant performance difference in cairo-gl (n=100),
despite spending ~.5% CPU in these set operations.
v1: (Original patch by Eric Anholt.)
v2: (Changes by Ken Graunke.)
- Rebase forward from May 7th 2013 -> March 4th 2014.
- Drop the FinishRenderTexture hook entirely; after rebasing the
patch, the hook was just an empty function.
- Move the brw_render_cache_set_clear() call from
intel_batchbuffer_emit_flush() to brw_emit_pipe_control_flush().
In theory, this could catch more cases where we've flushed.
- Consider stencil as a possible texturing source.
v3: (changes by anholt):
- Move set_clear() back to emit_mi_flush() -- it means we can drop
more forced flushes from the code. In the previous location, it
wouldn't have been called when we wanted pre-gen6.
- Move the set clear from batch init to reset -- it should be empty at
the start of every batch, since the kernel handled any inter-batch
flush for us.
v4: Drop the debug code in set.c that I accidentally committed.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Tested-by: Dylan Baker <[email protected]> [v2]
| |
According to the BSpec's 3D workarounds page, this is unnecessary on
shipping Haswell hardware, and was never necessary on Broadwell. It
unfortunately doesn't say anything about Baytrail.
The workaround database confirms those results for Ivybridge, Haswell,
and Broadwell. Baytrail is less clear - one page says it's necessary,
while the other says it isn't. For now, be conservative and leave it
enabled.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
According to the latest documentation, any PIPE_CONTROL with the
"Command Streamer Stall" bit set must also have another bit set,
with five different options:
- Render Target Cache Flush
- Depth Cache Flush
- Stall at Pixel Scoreboard
- Post-Sync Operation
- Depth Stall
I chose "Stall at Pixel Scoreboard" since we've used it effectively
in the past, but the choice is fairly arbitrary.
Implementing this in the PIPE_CONTROL emit helpers ensures that the
workaround will always take effect when it ought to.
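The helper boils down to something like this (a sketch; flag names
follow the renamed PIPE_CONTROL bits elsewhere in this log, with
"Post-Sync Operation" covering the three write bits):
static void
gen8_add_cs_stall_workaround_bits(uint32_t *flags)
{
   uint32_t wa_bits = PIPE_CONTROL_RENDER_TARGET_FLUSH |
                      PIPE_CONTROL_DEPTH_CACHE_FLUSH |
                      PIPE_CONTROL_WRITE_IMMEDIATE |
                      PIPE_CONTROL_WRITE_DEPTH_COUNT |
                      PIPE_CONTROL_WRITE_TIMESTAMP |
                      PIPE_CONTROL_STALL_AT_SCOREBOARD |
                      PIPE_CONTROL_DEPTH_STALL;
   /* v2: only add "Stall at Pixel Scoreboard" when a CS stall is
    * requested and none of the qualifying bits are already set. */
   if ((*flags & PIPE_CONTROL_CS_STALL) && !(*flags & wa_bits))
      *flags |= PIPE_CONTROL_STALL_AT_SCOREBOARD;
}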
Apparently, this workaround may be necessary on older hardware as well;
for now I've only added it to Broadwell as it's absolutely necessary
there. Subsequent patches could add it to older platforms, provided
someone tests it there.
v2: Only flag "Stall at Pixel Scoreboard" when none of the other bits
are set (suggested by Ian Romanick).
v3: Prefix the function with "gen8" (requested by Eric).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]> (v2)
Reviewed-by: Eric Anholt <[email protected]>
| |
This saves some boilerplate and hides the OUT_RELOC/OUT_RELOC64
distinction.
Placing the function in intel_batchbuffer.c is rather arbitrary; there
wasn't really an obvious place for it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
| |
I haven't investigated whether these are necessary on Broadwell or not,
but for paranoia's sake, we may as well continue doing them for now.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
| |
On Broadwell, PIPE_CONTROL needs an extra DWord to accommodate the
48-bit addressing.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
| |
There are a lot of places that use PIPE_CONTROL to write a value to a
buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT).
Creating a single function to do this seems convenient.
As part of this refactor, we now set the PPGTT/GTT selection bit
correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms.
This is correct for Sandybridge, but actually part of the address on
Ivybridge and later!
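For pre-Broadwell, the function looks roughly like this (a sketch;
Broadwell's extra DWord is handled on a separate path):
void
brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
                            drm_intel_bo *bo, uint32_t offset,
                            uint32_t imm_lower, uint32_t imm_upper)
{
   /* On Sandybridge, DW2 bit 2 selects the global GTT; on Ivybridge+
    * that bit belongs to the address, so leave it clear there. */
   unsigned gen6_gtt = brw->gen < 7 ? PIPE_CONTROL_GLOBAL_GTT_WRITE : 0;
   BEGIN_BATCH(5);
   OUT_BATCH(_3DSTATE_PIPE_CONTROL | (5 - 2));
   OUT_BATCH(flags);
   OUT_RELOC(bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
             gen6_gtt | offset);
   OUT_BATCH(imm_lower);
   OUT_BATCH(imm_upper);
   ADVANCE_BATCH();
}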
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to adjust that in substantially fewer
places, giving us confidence that we've hit them all.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
| |
I believe that PIPE_CONTROL uses the length field to decide whether to
do 32-bit or 64-bit writes. A length of 4 would do a 32-bit write,
while a length of 5 would do a 64-bit write. (I haven't verified this,
though.)
For workaround writes, we don't care what value gets written, or how
much data. We're only writing something because hardware bugs mandate
that we do so. So using a 64-bit write should be fine.
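With the pipe-control write helper described earlier in this log, a
workaround write is then just (a sketch; the workaround BO's home moves
around elsewhere in this log):
/* Scratch write purely to satisfy the workaround; the value written
 * and its size don't matter. */
brw_emit_pipe_control_write(brw, PIPE_CONTROL_WRITE_IMMEDIATE,
                            brw->workaround_bo, 0, 0, 0);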
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
| |
The PIPE_CONTROL packet actually has 5 DWords on Gen6+:
1. Header
2. Flags
3. Address
4. Immediate Data: Lower DWord
5. Immediate Data: Upper DWord
We just never emitted the last one. While it appears to work, it's
probably safer to emit the entire thing.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
| |
These days, we need to emit PIPE_CONTROL flushes all over the place.
Being able to do that via a single function call seems convenient.
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to do this in substantially fewer places.
v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by
Eric Anholt). Drop unlikely() from BLT_RING check.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Broadwell uses 48-bit addresses. The first DWord is the low 32 bits,
and the second DWord is the high 16 bits.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
| |
libdrm 2.4.52 introduces a new 'uint64_t offset64' field, intended to
replace the old 'unsigned long offset' field. To preserve ABI, libdrm
continues to store the presumed offset in both locations.
On Broadwell, a 64-bit kernel may place BOs at "high" (> 4G) addresses.
However, with a 32-bit userspace, the 'unsigned long offset' field will
only be 32-bit, which is not large enough to hold this value. We need
to use a proper uint64_t (like the kernel does).
Technically, a lot of this code doesn't affect Broadwell, so we could
leave it using the old field. But it makes sense to just switch to the
new, properly typed field.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
| |
Nothing in i965 uses it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
| |
Using an unoptimized variant of glamor spending 50% of its CPU time in
brw_draw_prims() (and hitting the cache *very* frequently):
N Min Max Median Avg Stddev
x 200 29200 40500 34900 34750 958.43256
+ 200 31000 40300 34700 34622 916.35941
No difference proven at 95.0% confidence
Similarly, no difference on GLB2.7:
N Min Max Median Avg Stddev
x 63 64.1 71.36 70.69 70.113175 1.6782026
+ 63 63.6 71.18 70.75 70.223651 1.6044186
No difference proven at 95.0% confidence
v2: Rebase on master (by anholt)
v3: Add a missing BEGIN_BATCH(3) to aa_line_parameters -- CACHED_BATCH
didn't have the asserts about batchbuffer usage that ADVANCE_BATCH
does, so we started hitting assertion failures.
Signed-off-by: Kenneth Graunke <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the
old copyright name is creating unnecessary confusion, hence this change.
This was the sed script I used:
$ cat tg2vmw.sed
# Run as:
#
# git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed
#
# Rename copyrights
s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g
/Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./
s/TUNGSTEN GRAPHICS/VMWARE/g
# Rename emails
s/[email protected]/[email protected]/
s/[email protected]/[email protected]/g
s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/
s/jrfonseca\[email protected]/[email protected]/g
s/keithw\[email protected]/[email protected]/g
s/[email protected]/[email protected]/g
s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/
s/[email protected]/[email protected]/
# Remove dead links
s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g
# C string src/gallium/state_trackers/vega/api_misc.c
s/"Tungsten Graphics, Inc"/"VMware, Inc"/
Reviewed-by: Brian Paul <[email protected]>
| |
Performed via:
$ for file in *; do sed -i 's/ *$//g' $file; done
Signed-off-by: Kenneth Graunke <[email protected]>
| |
This reverts commit a4bf7f6b6e612626c4e4fc21507ac213a7ba4b00.
It breaks occlusion queries on Gen4-5. Doing this right will likely
require larger changes, which should be done at a future date.
Some Piglit tests still passed due to other bugs; fixing those revealed
this problem.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Unfortunately, our hardware only has one set of aggregating performance
counters shared between all 3D programs, and their values are not saved
or restored by hardware contexts. Also, at least on Sandybridge and
Ivybridge, the counters lose their values if the GPU goes to sleep.
To work around both of these problems, we have to snapshot the
performance counters at the beginning and end of each batch, similar to
how we handle query objects on platforms that don't support hardware
contexts. I call these "bookend" snapshots.
Since there can be multiple performance monitors active at a time, we
store the bookend snapshots in a global BO, shared by all monitors.
For monitors that span multiple batches, acquiring results involves
adding up three segments:
BeginPerfMonitor --> End of Batch 1 ("head")
Start of Batch 2 --> End of Batch 2
... ("middle")
Start of Batch N-1 --> End of Batch N-1
Start of Batch N --> EndPerfMonitor ("tail")
Monitors that refer to bookend BO snapshots are considered "unresolved".
We delay resolving them (and adding up deltas to obtain the results) as
long as possible to avoid blocking on mapping monitor->oa_bo.
We can also run out of space in the bookend BO, at which point we have
to resolve all unresolved monitors. Then we can throw away the
snapshots and begin writing at the beginning of the buffer.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
We need to start OA at the beginning of each batch where monitors are
active. OACONTROL isn't part of the hardware context, so to avoid
leaving counters enabled for other applications, we turn them off at the
end of the batch too.
We also need to start them at BeginPerfMonitor time (unless they've
already been started), and stop them when the last monitor ends.
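Roughly (a sketch; register and bit names as defined in the driver,
with counter_format standing in for whichever OA format is selected):
/* Start the OA counters via MI_LOAD_REGISTER_IMM. */
BEGIN_BATCH(3);
OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
OUT_BATCH(OACONTROL);
OUT_BATCH(counter_format << OACONTROL_COUNTER_SELECT_SHIFT |
          OACONTROL_ENABLE_COUNTERS);
ADVANCE_BATCH();
/* Stopping them at the end of the batch writes 0 to OACONTROL. */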
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
It's useful to see the state of all outstanding monitors; the start
of a new batch seems like a reasonable time to print them out.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Without hardware contexts, the pipeline statistics registers are
free-running and include data from every 3D application running.
In order to find out the contributions of one particular context, we
need to take a snapshot at the start and end of each batch.
Previously, we emitted the PIPE_CONTROL necessary to capture
PS_DEPTH_COUNT when drawing primitives. Special tracking ensured it
happened only on the first draw of the batch, rather than on every draw.
Moving this to brw_new_batch increases symmetry, since the final
snapshot has always been in brw_finish_batch, which is just a few lines
below. It should be basically equivalent.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
The new intel_batchbuffer_emit_render_ring_prelude() hook will be called
when switching from BLT or UNKNOWN_RING to RENDER_RING. This provides a
place to emit state that should go at the start of each render ring
batch, with minimal overhead.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
When we first create a batch buffer, it's empty. We don't actually
know what ring it will be targeted at until the first BEGIN_BATCH or
BEGIN_BATCH_BLT macro.
Previously, one could determine the state of the batch by checking
brw->batch.ring (blit vs. render) and brw->batch.used != 0 (known vs.
unknown).
This should be functionally equivalent, but the tri-state enum is a bit
clearer.
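That is, roughly:
enum brw_gpu_ring {
   UNKNOWN_RING,   /* batch is empty; target ring not yet known */
   RENDER_RING,
   BLT_RING,
};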
v2: Catch three explicit require_space callers (thanks to Carl and Eric).
v3: Split the boolean -> enum change from the UNKNOWN_RING change.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>