mesa.git

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	i965/fs: Split pull parameter decision making from mechanical demoting.	Kenneth Graunke	2014-03-18	2	-33/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	move_uniform_array_access_to_pull_constants() and setup_pull_constants() both have two parts: 1. Decide which UNIFORM registers to demote to pull constants, and assign locations. 2. Mechanically rewrite the instruction stream to pull the uniform value into a temporary VGRF and use that, eliminating the UNIFORM file access. In order to support pull constants in SIMD16 mode, we will need to make decisions exactly once, but rewrite both instruction streams. Separating these two tasks will make this easier. This patch introduces a new helper, demote_pull_constants(), which takes care of rewriting the instruction stream, in both cases. For the moment, a single invocation of demote_pull_constants can't safely handle both reladdr and non-reladdr tasks, since the two callers still use different names for uniforms due to remove_dead_constants() remapping of things. So, we get an ugly boolean parameter saying which to do. This will go away. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
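The two-part split described in this message (decide pull constant locations once, then mechanically rewrite each instruction stream) can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the i965 code: the `operand` struct and the `decide_pull_locations()` and `demote_operands()` helpers are hypothetical names, and a real backend would emit a load into a temporary VGRF rather than just retagging the operand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified operand; the real register type carries far more state. */
enum reg_file { FILE_UNIFORM, FILE_PULL_TEMP };

struct operand {
    enum reg_file file;
    int index;   /* uniform number, or pull constant slot after demotion */
};

/* Part 1: decide which uniforms become pull constants and assign slots.
 * pull_loc[i] receives the slot for uniform i, or -1 if it is not demoted.
 * Returns the number of pull slots assigned. */
int decide_pull_locations(const bool *demote, int num_uniforms, int *pull_loc)
{
    int next_slot = 0;
    for (int i = 0; i < num_uniforms; i++)
        pull_loc[i] = demote[i] ? next_slot++ : -1;
    return next_slot;
}

/* Part 2: mechanically rewrite one operand stream using the saved decisions.
 * Returns how many operands were rewritten. */
int demote_operands(struct operand *ops, int count, const int *pull_loc)
{
    int rewritten = 0;
    for (int i = 0; i < count; i++) {
        if (ops[i].file != FILE_UNIFORM)
            continue;
        assert(ops[i].index >= 0);
        int loc = pull_loc[ops[i].index];
        if (loc < 0)
            continue;            /* not demoted: leave the UNIFORM access alone */
        ops[i].file = FILE_PULL_TEMP;
        ops[i].index = loc;
        rewritten++;
    }
    return rewritten;
}
```

Because the decisions live in `pull_loc` rather than inside the rewrite loop, the same array can be passed to `demote_operands()` once per stream, which is the property the message says SIMD16 support will need.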
*	i965/fs: Record pull constant locations for all array elements.	Kenneth Graunke	2014-03-18	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	When demoting a variably indexed uniform array to pull constants, we only recorded the location for the base of the array (element 0). Recording locations for all array elements is a trivial amount of code and will make subsequent refactoring easier. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Save push constant location information.	Kenneth Graunke	2014-03-18	3	-2/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, both move_uniform_array_access_to_pull_constants() and setup_pull_constants() maintained stack-local arrays with this information. Storing this information will allow it to be used from multiple functions, allowing us to split and move code around. We'll also eventually want to pass pull constant location information to the SIMD16 compile. Saving this information will help us do that. Unfortunately, the two functions cannot share the contents of the array just yet. remove_dead_constants() renumbers all the UNIFORM registers to be contiguous starting at zero, so the two functions talk about uniforms using different names. We can't even remap them, since move_uniform_array_access_to_pull_constants() deletes UNIFORM registers that are only accessed with reladdr, so remove_dead_constants can't even see them. This situation will improve in the next few patches. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Delete dead code to fail compiles with SIMD16 pull parameters.	Kenneth Graunke	2014-03-18	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \|	The SIMD8 compile will determine whether pull parameters are necessary. If so, it will set prog_data->nr_pull_params to a value greater than 0. brw_wm_fs_emit checks if nr_pull_params > 0 and skips the SIMD16 compile altogether. So, this code should never occur. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Invalidate live intervals when demoting uniforms to pull params.	Kenneth Graunke	2014-03-14	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Normally, nothing uses live intervals at this point, so this isn't necessary. However, dump_instructions() calculates them and uses them to show register pressure. So, calling dump_instructions() in this area of the code would segfault due to the arrays being the wrong size. This is not a candidate for stable branches because it only serves to fix internal debugging code that you manually have to invoke by altering the source code or using gdb. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/fs: Print "+reladdr" on variably-indexed uniform arrays.	Kenneth Graunke	2014-03-14	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Previously, dump_instruction() would print output such as: { 2} 3: mov vgrf1:F, u0:F { 3} 4: mov vgrf7:F, u0:F { 4} 5: mov vgrf8:F, u0:F which looked like either a scalar access or perhaps a constant-indexed access of element 0, when it was really a variable index. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Fix register types in dump_instructions(), again.	Kenneth Graunke	2014-03-14	4	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \|	In commit e57d77280efcbfd6579a88f071426653287ef833, I fixed this for destinations in the Vec4 backend, and sources in the scalar backend. But not both types in both backends. To prevent this mess from continuing, make the reg_encoding table static, so only the disassembler can use it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/fs: Fix register comparisons in saturate propagation.	Kenneth Graunke	2014-03-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	opt_saturate_propagation_local compares scan_inst->dst.reg/reg_offset with inst->src[0].reg/reg_offset, and ensures that scan_inst->dst.file is GRF. But nothing ensured that inst->src[0].file was GRF. In the following program, this resulted in u1:F matching vgrf1:UW, and a saturate being incorrectly propagated from instruction 8 to instruction 1. { 1} 0: add vgrf0:UW, hw_reg1+8:UW, hw_reg0:V { 1} 1: add vgrf1:UW, hw_reg1+10:UW, hw_reg0:V { 1} 2: linterp vgrf6:F, hw_reg2:F, hw_reg3:F, hw_reg0:F { 2} 3: linterp vgrf27:F, hw_reg2:F, hw_reg3:F, hw_reg0+16:F { 4} 4: mov vgrf10+0.0:F, vgrf6:F { 3} 5: mov vgrf10+1.0:F, vgrf27:F { 6} 6: tex vgrf8+0.0:F, vgrf10+0.0:F { 5} 7: mov vgrf32:F, u1:F { 5} 8: mov.sat vgrf12:F, u1:F From shader-db: total instructions in shared programs: 1841932 -> 1841957 (0.00%) instructions in affected programs: 5823 -> 5848 (0.43%) I inspected two of the 25 hurt shaders, and concluded that they were both hitting this bug, and not legitimately optimized. This fixes bugs in Left 4 Dead 2 and Team Fortress 2, possibly among others. The optimization pass didn't exist in 10.0, so this is only a candidate for 10.1. Cc: "10.1" <[email protected]> Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
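The failure mode in this message (u1 matching vgrf1 because only the destination's register file was checked) can be reproduced with a small C sketch. The `reg` struct and both comparison helpers below are hypothetical simplifications written for illustration; they are not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified register description. */
enum reg_file { FILE_GRF, FILE_UNIFORM, FILE_HW_REG };

struct reg {
    enum reg_file file;
    int reg;
    int reg_offset;
};

/* Mirrors the bug: only the destination's file is checked, so a UNIFORM
 * source with the same number as a GRF destination is treated as a match. */
bool regs_match_unchecked(const struct reg *dst, const struct reg *src)
{
    return dst->file == FILE_GRF &&
           dst->reg == src->reg &&
           dst->reg_offset == src->reg_offset;
}

/* Mirrors the fix: both sides must be in the GRF file before the
 * register numbers are compared. */
bool regs_match_checked(const struct reg *dst, const struct reg *src)
{
    return dst->file == FILE_GRF &&
           src->file == FILE_GRF &&
           dst->reg == src->reg &&
           dst->reg_offset == src->reg_offset;
}
```

With a destination of vgrf1 and a source of u1, the unchecked version reports a match and the checked version does not, which is the difference between propagating the saturate to the wrong instruction and leaving it alone.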
*	i965: Add support for GL_ARB_buffer_storage.	Eric Anholt	2014-03-14	2	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \|	It turns out we can allow COHERENT storage/mappings all the time, regardless of LLC vs non-LLC. It just means never using temporary mappings to avoid GPU stalls, and on non-LLC we have to use the GTT instead of CPU mappings. If we were to use CPU maps on non-LLC (which might be useful if apps end up using buffer_storage on PBO reads, to avoid WC read slowness), those would be PERSISTENT but not COHERENT, but doing that would require us driving the clflushes from userspace somehow. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Always use CPU mappings for BOs on LLC platforms.	Eric Anholt	2014-03-14	1	-1/+1
\| \| \| \| \| \| \| \|	It looks like there's no big difference for write-only workloads, but using a CPU map means that if they happen to read without having set the MAP_READ_BIT, they get 100x the performance for those reads. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Drop the system-memory temporary allocations for flush explicit.	Eric Anholt	2014-03-14	2	-52/+58
\| \| \| \| \| \| \|	While in expected usage patterns nobody will ever hit this path, doubling our bandwidth used seems like a waste, and it cost us extra code too. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Switch mapping modes for non-explicit-flush blit-temporary maps.	Eric Anholt	2014-03-14	1	-3/+3
\| \| \| \| \| \| \| \| \|	On LLC, it should always be better to use a cached mapping than the GTT. On non-LLC, it seems pretty silly to try to optimize read performance for the INVALIDATE_RANGE_BIT case. This will make the buffer_storage logic easier. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Fix build warning of unused variable	Anuj Phogat	2014-03-14	1	-2/+0
\| \| \| \| \|	Signed-off-by: Anuj Phogat <[email protected]> Tested-by: Kenneth Graunke <[email protected]>
*	Add the EGL_MESA_configless_context extension	Neil Roberts	2014-03-12	2	-12/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This extension provides a way for an application to render to multiple surfaces with different buffer formats without having to use multiple contexts. An EGLContext can be created without an EGLConfig by passing EGL_NO_CONFIG_MESA. In that case there are no restrictions on the surfaces that can be used with the context apart from that they must be using the same EGLDisplay. _mesa_initialize_context can now take a NULL gl_config which will mark the context as ‘configless’. It will memset the visual to zero in that case. Previously the i965 and i915 drivers were explicitly creating a zeroed visual whenever 0 is passed for the EGLConfig. Mesa needs to be aware that the context is configless because it affects the initial value to use for glDrawBuffer. The first time the context is bound it will set the initial value for configless contexts depending on whether the framebuffer used is double-buffered. Reviewed-by: Kristian Høgsberg <[email protected]>
*	meta: Always restore the framebuffers and current renderbuffer.	Eric Anholt	2014-03-11	3	-21/+17
\| \| \| \| \| \| \| \|	The few paths that were playing with framebuffers and renderbuffer were saving and restoring them. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Drop intel_check_front_buffer_rendering().	Eric Anholt	2014-03-11	6	-27/+0
\| \| \| \| \| \| \| \| \|	This was being applied in a subset of the places that intel_prepare_render() was called, to set the same flag that intel_prepare_render() was setting. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Drop broken front_buffer_reading/drawing optimization.	Eric Anholt	2014-03-11	5	-42/+44
\| \| \| \| \| \| \| \| \| \| \| \|	The flag wasn't getting updated correctly when the ctx->DrawBuffer or ctx->ReadBuffer changed. It usually ended up working out because most apps only have one window system framebuffer, or if they have more than one and they have any front read/drawing, they will have called glReadBuffer()/glDrawBuffer() on it when they get started on the new buffer. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	intel: When checking for updating front buffer reading, use the right fb.	Eric Anholt	2014-03-11	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	It's the ctx->ReadBuffer that gets read from, not the ctx->DrawBuffer. So, if you happened to have a ctx->ReadBuffer that was the winsys buffer, and it had previously been intel_prepare_render()ed but not invalidated since then, and you called glReadBuffer() to switch to front buffer instead of back buffer reading on the winsys fbo while your drawbuffer was a user FBO, you'd never get the front buffer's miptree fetched, and segfault. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	automake: allow only shared builds	Emil Velikov	2014-03-11	2	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Static and shared builds were possible in the good old days of static makefiles. Currently the build system does not distinguish nor does anything special when one requests a static build. Print a warning message for the packager that static builds are not supported and continue building shared libs. Currently only Debian and derivatives use static build, and they use it for building a Xlib powered libGL. This patch will only change the warning message they are seeing but the binaries produced will be identical. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jon TURNEY <[email protected]>
*	automake: create compat symlinks only for linux systems	Emil Velikov	2014-03-11	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The primary users of these are linux developers, although it can be extended for BSD and others if needed. Fixes make install for Cygwin and OpenBSD at least. v2: - Wrap vdpau targets as well. v3: - Fold HAVE_COMPAT_SYMLINKS conditional within installlinks.mk Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63269 Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jon TURNEY <[email protected]> (v1) Reviewed-by: Christian König <[email protected]>
*	configure: use LIB_EXT rather than hardcoded .so	Emil Velikov	2014-03-11	1	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some platforms use a different library extension - dll, dylib, a. Honor that when we are creating the required links. Rename LIB_EXTENSION to LIB_EXT while we're here. With libglapi linking aside, building classic drivers on non-linux platforms should be possible now. v2: Resolve conflicts. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jon TURNEY <[email protected]>
*	automake: do not use symbols names for static glapi.la	Emil Velikov	2014-03-11	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	In the cases where one links against the static glapi.la there is no need to create temporary variables only to explicitly link against it. Instead use SHARED_GLAPI_LIB to explicitly indicate when one is building and linking with the shared glapi provider. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jon TURNEY <[email protected]>
*	automake: use install-lib-links.mk across all classic mesa	Emil Velikov	2014-03-11	2	-13/+2
\| \| \| \| \| \| \|	Use the handy script and minimise the boilerplate in the makefiles. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jon TURNEY <[email protected]>
*	automake: silence folder creation	Emil Velikov	2014-03-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	There is little gain in printing whenever a folder is created. v2: - Use $(AM_V_at) over @ to have control in verbose builds. Suggested by Erik Faye-Lund. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jon TURNEY <[email protected]>
*	automake: use MKDIR_P when possible	Emil Velikov	2014-03-11	1	-1/+1
\| \| \| \| \| \| \|	Use the automake predefined macro over hardcoding mkdir -p everywhere. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jon TURNEY <[email protected]>
*	meta: use non-ARB shader/program create/delete functions	Brian Paul	2014-03-10	2	-30/+30
\| \| \| \| \| \| \|	The non-ARB versions take GLuint ids, not GLhandleARB. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	mesa: rename MESA_FORMAT_X8Z24_UNORM -> MESA_FORMAT_X8_UINT_Z24_UNORM	Brian Paul	2014-03-10	2	-2/+2
\| \| \| \| \| \| \|	To follow the example of MESA_FORMAT_Z24_UNORM_X8_UINT. Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/vec4: Don't fix-up scalar uniforms for 3 src instructions.	Matt Turner	2014-03-10	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	Removes unnecessary MOV instructions in L4D2, TF2, Dota2, and many other Steam games. total instructions in shared programs: 1668126 -> 1657509 (-0.64%) instructions in affected programs: 242235 -> 231618 (-4.38%) Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Disassemble 3 src instructions' rep_ctrl field.	Matt Turner	2014-03-10	2	-6/+24
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Disassemble 3-src operands widths' correctly.	Matt Turner	2014-03-10	4	-38/+38
\| \| \| \| \| \| \|	<4,1,1> isn't a real thing. We meant <4,4,1>, i.e., each component of the whole register. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Move binding table update packets to binding table setup time.	Eric Anholt	2014-03-10	7	-39/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This keeps us from needing to reemit all the other stage state just because a surface changed. Improves unoptimized glamor x11perf -f8text by 1.10201% +/- 0.489869% (n=296). [v1] v2: - Drop binding table packets from Gen8 unit state as well. - Pass _3DSTATE_BINDING_TABLE_POINTERS_XS to brw_upload_binding_table, cutting even more code. v3: Don't forget to drop them from 3DSTATE_GS (botched refactor in v2). Signed-off-by: Eric Anholt <[email protected]> [v1] Reviewed-by: Kenneth Graunke <[email protected]> [v1] Signed-off-by: Kenneth Graunke <[email protected]> [v2, v3] Reviewed-by: Eric Anholt <[email protected]> [v3]
*	i965: Reorganize the code in brw_upload_binding_tables.	Kenneth Graunke	2014-03-10	1	-17/+18
\| \| \| \| \| \| \| \|	This makes both the empty and non-empty binding table paths exit through the bottom of the function, which gives us a place to share code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	meta: Support GenerateMipmaps on 1DArray textures.	Kenneth Graunke	2014-03-07	1	-9/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I don't know how many people care about this case, but it's easy enough to do, so we may as well. The tricky part is that for some reason Mesa stores the number of array slices in Height, not Depth. I thought the easiest way to handle that here was to make Height = 1 (the actual height), and srcDepth = srcImage->Height. This requires some munging when calling _mesa_prepare_mipmap_level, so I created a wrapper that sorts it out for us. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
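The dimension munging described here can be sketched in C. The `dims` struct and the helper names below are hypothetical; the only behaviour taken from the message is that a 1D array stores its slice count in Height, that the actual height is 1, and that the slice count is carried as the source depth. `minify_dim()` is a stand-in for the minify() macro and assumes the usual "shift right, clamp to 1" definition.

```c
#include <assert.h>

/* Hypothetical container for the logical size of one mip level. */
struct dims {
    int width;
    int height;
    int depth;   /* for array textures: number of slices */
};

/* Stand-in for minify(): halve `levels` times, never dropping below 1. */
int minify_dim(int d, int levels)
{
    assert(d >= 1 && levels >= 0);
    int m = d >> levels;
    return m > 1 ? m : 1;
}

/* For a 1D array, the stored Height holds the slice count, so the logical
 * height is 1 and the slice count moves into depth. */
struct dims dims_for_1d_array(int stored_width, int stored_height)
{
    struct dims d = { stored_width, 1, stored_height };
    return d;
}

/* Next mip level of an array texture: width and height shrink,
 * the number of slices does not. */
struct dims next_level_array_dims(struct dims src)
{
    struct dims dst = {
        minify_dim(src.width, 1),
        minify_dim(src.height, 1),
        src.depth
    };
    return dst;
}
```

For a 64-texel 1D array with 6 slices, this yields a source of 64x1x6 and a next level of 32x1x6.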
*	meta: Use srcWidth/Height/Depth rather than srcImage->Width and such.	Kenneth Graunke	2014-03-07	1	-3/+3
\| \| \| \| \| \| \| \| \|	This is equivalent for now, and will differ once we add 1DArray support. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	meta: Support GenerateMipmaps on 2DArray textures.	Kenneth Graunke	2014-03-07	1	-35/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is largely a matter of looping over the number of slices/layers, and not minifying depth (presumably that code exists for the unfinished 3D texture support). Normally, I would have made the loop over array slices the outermost loop. I suspect that would make it trickier to support 3D textures someday, though, so I didn't. The advantage is that we would only have one BufferData call per slice, rather than one per miplevel and slice. However, a GenerateMipmaps microbenchmark indicates that either way is basically just as fast. So I'm not sure it's worth bothering. Improves performance in a GenerateMipmaps microbenchmark by nearly 5x. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	meta: Add a 'layer' argument to bind_fbo_image().	Kenneth Graunke	2014-03-07	1	-9/+11
\| \| \| \| \| \| \| \| \| \|	For array textures and 3D textures, this represents the layer to use. Just pass 0 for now. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	meta: Refactor code for binding a texture image to the FBO.	Kenneth Graunke	2014-03-07	1	-46/+35
\| \| \| \| \| \| \| \| \| \| \|	Almost the exact same code appeared twice, and it needs to expand to handle additional texture targets. Refactor it to tidy up the code and avoid duplicating more work in the future. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	meta: Use minify() in GenerateMipmaps code.	Kenneth Graunke	2014-03-07	1	-3/+3
\| \| \| \| \| \| \| \| \|	This is what the macro is for. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	meta: Drop redundant FBO creation code in GenerateMipmaps.	Kenneth Graunke	2014-03-07	1	-4/+1
\| \| \| \| \| \| \| \| \| \|	fallback_required() already creates the FBO in order to check whether we can render to the format. So it's guaranteed to exist. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	meta: Replace GLboolean with bool in fallback_required().	Kenneth Graunke	2014-03-07	1	-7/+7
\| \| \| \| \| \| \| \| \|	This doesn't interact with the GL API, so we shouldn't use GL types. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	meta: Make _mesa_meta_check_generate_mipmap_fallback static.	Kenneth Graunke	2014-03-07	2	-8/+4
\| \| \| \| \| \| \| \| \| \|	This was only ever used in one place; there's no reason for it to be non-static. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	meta: Split GenerateMipmap() into its own file.	Kenneth Graunke	2014-03-07	3	-337/+376
\| \| \| \| \| \| \| \| \| \|	Putting the implementation of each GL function in its own file makes it much easier not to get lost. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	meta: De-static setup_texture_coords().	Kenneth Graunke	2014-03-07	2	-23/+34
\| \| \| \| \| \| \| \| \|	This will be used in multiple files soon. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Fix render-to-texture in non-FinishRenderTexture cases.	Eric Anholt	2014-03-06	7	-27/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've had several problems now with FinishRenderTexture not getting called enough, and we're ready to just give up on it ever doing what we need. In particular, an upcoming Steam title had rendering bugs that could be fixed by always_flush_cache=true. Instead of hoping Mesa core can figure out when we need to flush our caches, just track what BOs we've rendered to in a set, and when we render from a BO in that set, emit a flush and clear the set. There's some overhead to keeping this set, but most of that is just hashing the pointer -- it turns out our set never even gets very large, because cache flushes are so common (even on cairo-gl). No statistically significant performance difference in cairo-gl (n=100), despite spending ~.5% CPU in these set operations. v1: (Original patch by Eric Anholt.) v2: (Changes by Ken Graunke.) - Rebase forward from May 7th 2013 -> March 4th 2014. - Drop the FinishRenderTexture hook entirely; after rebasing the patch, the hook was just an empty function. - Move the brw_render_cache_set_clear() call from intel_batchbuffer_emit_flush() to brw_emit_pipe_control_flush(). In theory, this could catch more cases where we've flushed. - Consider stencil as a possible texturing source. v3: (changes by anholt): - Move set_clear() back to emit_mi_flush() -- it means we can drop more forced flushes from the code. In the previous location, it wouldn't have been called when we wanted pre-gen6. - Move the set clear from batch init to reset -- it should be empty at the start of every batch, since the kernel handled any inter-batch flush for us. v4: Drop the debug code in set.c that I accidentally committed. Signed-off-by: Eric Anholt <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Tested-by: Dylan Baker <[email protected]> [v2]
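The tracking scheme in this message (remember which BOs have been rendered to, and flush and clear when one of them is about to be read) can be sketched in C as below. This is an illustration only: the `render_cache` struct and its helpers are hypothetical names, a fixed-size linear array stands in for the pointer-hashed set the message describes, and a counter stands in for emitting a real cache flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED_BOS 64   /* arbitrary capacity chosen for this sketch */

struct render_cache {
    const void *bos[MAX_TRACKED_BOS];  /* BOs rendered to since the last flush */
    size_t count;
    unsigned flushes;                  /* stands in for emitting a flush */
};

bool render_cache_contains(const struct render_cache *rc, const void *bo)
{
    for (size_t i = 0; i < rc->count; i++)
        if (rc->bos[i] == bo)
            return true;
    return false;
}

/* A flush makes all earlier rendering visible, so the set can be emptied. */
void render_cache_flush(struct render_cache *rc)
{
    rc->flushes++;
    rc->count = 0;
}

/* Call when a BO is bound as a render target. */
void render_cache_note_render(struct render_cache *rc, const void *bo)
{
    if (render_cache_contains(rc, bo))
        return;
    if (rc->count == MAX_TRACKED_BOS)
        render_cache_flush(rc);        /* sketch-only fallback when full */
    rc->bos[rc->count++] = bo;
}

/* Call before a BO is used as a texturing source. */
void render_cache_before_sample(struct render_cache *rc, const void *bo)
{
    if (render_cache_contains(rc, bo))
        render_cache_flush(rc);
}
```

Sampling from a BO that was never rendered to costs only a lookup, while sampling from one in the set triggers a single flush that also clears every other tracked BO, which is consistent with the message's observation that the set stays small.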
*	i965: Fix predicated-send-based discards with MRT.	Eric Anholt	2014-03-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need the header setup to not be predicated on which pixels are undiscarded. I'm not sure originally if I had thought that the mask disable implied predicate disable, or if I had just misread the mask disable as predicate disable. Either way, I know I had spent more time thinking about this in the gen8 generator than the gen7 generator. Plus, it turns out that I had mis-implemented the "the GPU will use the predicate unless this header is present" comment, by skipping setting up the pixel mask when the header was present. Fixes GPU hangs in piglit glsl-fs-discard-mrt, Trine, Trine 2 and presumably MLL. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75207 Tested-by: Tapani Pälli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	mesa: remove remaining uses of _glthread_GetID()	Brian Paul	2014-03-05	2	-8/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It was really only used in the radeon driver for a debug printf. And evidently, libGL.so referenced it just to work around some sort of linker issue. This patch removes the two calls to the function and the function itself. Fixes undefined _glthread_GetID symbol in libGL reported by 'nm'. Though, the missing symbol doesn't cause any issues on my system but it does cause glxinfo to fail on one of our test systems. Reviewed-by: Jose Fonseca <[email protected]>
*	i965: Mark invariants in backend_visitor as constants	Topi Pohjolainen	2014-03-05	1	-6/+6
\| \| \| \| \|	Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
*	i965: Merge resolving of shader program source	Topi Pohjolainen	2014-03-05	11	-26/+23
\| \| \| \| \|	Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
*	i965: Merge initialisation of backend_visitor	Topi Pohjolainen	2014-03-05	4	-14/+23
\| \| \| \| \|	Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
*	i965/wm: Use resolved miptree consistently in surface setup	Topi Pohjolainen	2014-03-05	2	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \|	Most of the logic refers to the local variable 'mt' directly but a few cases use 'intelObj->mt' instead. These are the same for now but will be different once stencil miptree gets used. v2 (Ian): fixed also indentation in surrounding lines Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>