path: root/src/mesa/drivers
Commit message | Author | Age | Files | Lines
* i965: Change fragment input related bitfields to 64-bit. | Paul Berry | 2013-03-15 | 5 | -15/+16
  This patch updates the bitfields brw_context::wm.input_size_masks, tracker::size_masks, and brw_wm_prog_key::proj_attrib_mask, all of which are indexed by gl_frag_attrib, from 32-bit to 64-bit.
  This paves the way for supporting geometry shaders, and for merging the gl_frag_attrib and gl_vert_result enums. The combination of these two will require at least 55 bits in the bitfields.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Tested-by: Brian Paul <[email protected]>
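The widening described in this entry can be illustrated with a minimal C sketch. It is not the driver's code: the helper names `example_mask64_set`, `example_mask64_test`, and `example_truncate_to_32` are hypothetical, and the only fact taken from the commit message is that at least 55 bits are needed, which a 32-bit mask cannot hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not Mesa's actual macros. */

/* Return `mask` with bit `index` set. A 64-bit mask has room for indices 0..63. */
static inline uint64_t example_mask64_set(uint64_t mask, unsigned index)
{
   assert(index < 64);
   return mask | (UINT64_C(1) << index);
}

/* Report whether bit `index` is set in `mask`. */
static inline bool example_mask64_test(uint64_t mask, unsigned index)
{
   assert(index < 64);
   return (mask & (UINT64_C(1) << index)) != 0;
}

/* Show what a 32-bit field would keep: every bit at index >= 32 is lost. */
static inline uint32_t example_truncate_to_32(uint64_t mask)
{
   return (uint32_t)mask;
}
```

With a mask that has only bit 54 set, `example_truncate_to_32` returns 0, which is the kind of silent loss a 32-bit field would cause once the index space grows past 32 entries.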
* driconf: add a miscellaneous section and always_have_depth_buffer option | Brian Paul | 2013-03-15 | 1 | -0/+14
  This option is needed for some applications that neglect to request a depth buffer when choosing a visual/fbconfig. The Linux app Topogun is an example of this problem.
* driconf: reorder options, reformat comments, etc | Brian Paul | 2013-03-15 | 1 | -60/+74
  Move the options into the proper section (Debug, Quality, Performance, etc). Update comments and add some whitespace to improve readability.
* i965: Make INTEL_DEBUG=shader_time use the RAW surface format. | Kenneth Graunke | 2013-03-14 | 2 | -3/+3
  Untyped Atomic Operation messages are illegal for non-RAW formats. The IVB hardware proceeds happily (after all, who cares what the format of the surface is if you're doing untyped ops on it?), but later hardware apparently doesn't. The simulator for gen7 does complain, though.
  v2: Rebase against updates to previous patches. (by anholt)
  NOTE: This is a candidate for the 9.1 branch.
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Specialize SURFACE_STATE creation for shader time. | Kenneth Graunke | 2013-03-14 | 4 | -8/+45
  This is basically a copy and paste of gen7_create_constant_surface, but with the parameters filled in to offer a simpler interface. It will diverge shortly.
  I didn't bother adding it to the vtable for now since shader time is only exposed on Gen7+.
  v2: Replace tabs in the new code (by anholt)
      Add back dropped memset() and add a comment about HSW channel selects.
  NOTE: This is a candidate for the 9.1 branch.
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Fix INTEL_DEBUG=shader_time for Haswell. | Kenneth Graunke | 2013-03-14 | 2 | -4/+12
  Haswell's "Data Cache" data port is a single unit, but split into two SFIDs to allow for more message types without adding more bits in the message descriptor.
  Untyped Atomic Operations are now message 0010 in the second data cache data port, rather than 6 in the first.
  v2: Use the #defines from the previous commit. (by anholt)
  NOTE: This is a candidate for the 9.1 branch.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]> (v1)
* i965: Add definitions for gen7+ data cache messages. | Eric Anholt | 2013-03-14 | 1 | -0/+37
  We were sparsely using some of these message types, but I'll just fill them all in now. It will be used for fixing shader_time on HSW.
  v2: Add missing MEDIA_BLOCK_READ.
  NOTE: This is a candidate for the 9.1 branch.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Split shader_time entries into separate cachelines. | Eric Anholt | 2013-03-14 | 4 | -4/+13
  This avoids some snooping overhead between EUs processing separate shaders (so VS versus FS). Improves performance of a Minecraft trace with shader_time by 28.9% +/- 18.3% (n=7), and performance of my old GLSL demo by 93.7% +/- 0.8% (n=4).
  v2: Add a define for the stride with a comment explaining its units and why.
  Reviewed-by: Kenneth Graunke <[email protected]>
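A rough sketch of the idea in this entry, placing each counter entry on its own cacheline so that writers do not share lines, might look like the following. The stride value of 64 bytes and the names `EXAMPLE_STRIDE_BYTES` and `example_entry_offset` are assumptions for illustration; the commit message only says that a stride define was added, not what its value or units are.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cacheline size in bytes; hypothetical, not taken from the driver. */
#define EXAMPLE_STRIDE_BYTES 64

/* Byte offset of entry `index` when every entry starts on its own cacheline.
 * Packing entries back to back would let unrelated writers touch the same
 * line; spacing them by the stride avoids that sharing. */
static inline size_t example_entry_offset(unsigned index, unsigned num_entries)
{
   assert(index < num_entries);
   return (size_t)index * EXAMPLE_STRIDE_BYTES;
}
```

The cost of this layout is buffer size: the buffer grows to `num_entries * EXAMPLE_STRIDE_BYTES` bytes even if each entry only uses a few of them.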
* xmlpool/.gitignore: Remove 'Makefile' | Matt Turner | 2013-03-12 | 1 | -1/+0
  Handled by top level .gitignore.
  Reviewed-by: Eric Anholt <[email protected]>
* mesa: Replace MESA_VERSION with PACKAGE_VERSION. | Matt Turner | 2013-03-12 | 1 | -1/+1
  One fewer place to have to update.
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Fix typo in doxygen hyperlink | Chad Versace | 2013-03-11 | 1 | -1/+1
  s/brw_state_upload/brw_upload_state/
  Found because the link was broken.
  Signed-off-by: Chad Versace <[email protected]>
* i965/fs: Improve CSE performance by expiring some available expressions. | Eric Anholt | 2013-03-11 | 1 | -1/+19
  We're already walking the list, and we can easily know when something has no reason to be in the list any longer, so take a brief extra step to reduce our worst-case runtime (an oglconform test that emits the maximum instructions in a fragment program).
  I don't actually know what the worst-case runtime was, because it was too long and I got bored.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Improve live variables calculation performance. | Eric Anholt | 2013-03-11 | 2 | -26/+32
  We can execute way fewer instructions by doing our boolean manipulation on an "int" of bits at a time, while also reducing our working set size.
  Reduces compile time of L4D2's slowest shader from 4s to 1.1s (-72.4% +/- 0.2%, n=10)
  v2: Remove redundant masking (noted by Ken)
  Reviewed-by: Kenneth Graunke <[email protected]>
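The word-at-a-time approach mentioned in this entry can be sketched with the textbook liveness equation, livein = use | (liveout & ~def), applied to whole 32-bit words rather than one boolean per variable. This is a generic illustration, not the i965 compiler's code; `example_update_livein` and `example_words_for` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_BITS_PER_WORD 32

/* Number of 32-bit words needed to hold one bit per variable. */
static inline size_t example_words_for(size_t num_vars)
{
   return (num_vars + EXAMPLE_BITS_PER_WORD - 1) / EXAMPLE_BITS_PER_WORD;
}

/* Apply livein |= use | (liveout & ~def) for one block, 32 variables per
 * iteration. Returns true if any new bit was added to livein, which tells
 * the caller that another pass of the fixed-point loop is needed. */
static bool example_update_livein(uint32_t *livein, const uint32_t *liveout,
                                  const uint32_t *use, const uint32_t *def,
                                  size_t num_words)
{
   assert(livein && liveout && use && def);
   bool progress = false;
   for (size_t w = 0; w < num_words; w++) {
      uint32_t new_livein = use[w] | (liveout[w] & ~def[w]);
      if (new_livein & ~livein[w]) {
         livein[w] |= new_livein;
         progress = true;
      }
   }
   return progress;
}
```

Each loop iteration handles 32 variables with a handful of bitwise operations, and the sets occupy one bit per variable instead of one byte or more, which is where the smaller working set comes from.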
* i965/fs: Also do the gen4 SEND dependency workaround against other SENDs. | Eric Anholt | 2013-03-11 | 1 | -9/+15
  We were handling the dependency workaround for the first written reg of a send preceding the one we're fixing up, but didn't consider the other regs. Thus if you had two sampler calls that got allocated to the same set of regs, one might, rarely, overwrite the other. This was occurring in XBMC's GLSL shaders.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567
  NOTE: This is a candidate for the stable branches.
  Reviewed-by: Kenneth Graunke <[email protected]>
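The gap described in this entry, checking only the first written register instead of the whole written range, can be illustrated with a generic interval-overlap test. This is only a sketch of the comparison involved, not the workaround itself; `example_reg_ranges_overlap` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the register ranges [a_start, a_start + a_len) and
 * [b_start, b_start + b_len) share at least one register.
 * Comparing only a_start against b_start would miss partial overlaps. */
static bool example_reg_ranges_overlap(unsigned a_start, unsigned a_len,
                                       unsigned b_start, unsigned b_len)
{
   assert(a_len > 0 && b_len > 0);
   return a_start < b_start + b_len && b_start < a_start + a_len;
}
```

For example, a write to registers 10..13 and a later write to 12..15 have different first registers, so a first-register-only comparison reports no conflict, while the range test reports one.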
* i965/fs: Switch to using sampler LD messages for uniform pull constants. | Eric Anholt | 2013-03-11 | 4 | -52/+50
  When forcing the compiler to always generate pull constants instead of push constants (in order to have an easy to use testcase), improves performance of my old GLSL demo by 23.3553% +/- 1.42968% (n=7).
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix broken rendering in large shaders with UBO loads. | Eric Anholt | 2013-03-11 | 1 | -0/+2
  The lowering process creates a new vgrf on gen7 that should be represented in live interval analysis. As-is, it was getting a conflicting allocation with gl_FragDepth in the dolphin emulator, producing broken rendering.
  NOTE: This is a candidate for the 9.1 branch.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add a comment about an implementation detail. | Eric Anholt | 2013-03-11 | 1 | -0/+4
  I was going to fix the code above like the previous commit, but we already had that covered (otherwise all our uniform access would have been broken, unlike just pull constants).
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix register allocation for uniform pull constants in 16-wide. | Eric Anholt | 2013-03-11 | 1 | -23/+31
  We were allowing a compressed instruction to write a register that contained the last use of a uniform pull constant (either UBO load or push constant spillover), so it would get half its values smashed.
  Since we need to see the actual instruction to decide this, move the pre-gen6 pixel_x/y logic here, which should improve the performance of register allocation since virtual_grf_interferes() is called more than once per instruction.
  NOTE: This is a candidate for the stable branches.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Remove some unused debug flags. | Eric Anholt | 2013-03-11 | 2 | -8/+0
  I was looking at the list to see what might be interesting to document for application developers, and it turns out some are completely dead.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Link i965_dri.so with C++ linker. | Frank Henigman | 2013-03-08 | 1 | -0/+1
  Force C++ linking of i965_dri.so by adding a dummy C++ source file.
  Reviewed-by: Matt Turner <[email protected]>
* dri/nouveau: fix crash in nouveau_flush | Jan de Groot | 2013-03-07 | 1 | -1/+2
  https://bugs.freedesktop.org/show_bug.cgi?id=61947
  Note: this is a candidate for the stable branches
* i965: Don't fill buffer with zeroes. | Kenneth Graunke | 2013-03-06 | 1 | -6/+0
  This was only necessary because our bounds checking was off by one, and thus we read an extra pair of values.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Fix off-by-one in query object result gathering. | Kenneth Graunke | 2013-03-06 | 1 | -2/+2
  If we've written N pairs of values to the buffer, then last_index = N, but the values are 0 .. N-1. Thus, we need to use <, not <=.
  This worked anyway because we fill the buffer with zeroes, so we just added an extra (0 - 0) to our results.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
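The bound described in this entry can be sketched as a loop over (start, end) pairs. The buffer layout and the name `example_gather_results` are assumptions for illustration; only the rule "N pairs written means valid indices 0 .. N-1, so use <" comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum (end - start) over the pairs written so far. `results` is assumed to
 * hold `last_index` pairs laid out as results[2*i] (start) and
 * results[2*i + 1] (end). The loop condition must be `<`: with `<=` the
 * loop would also read pair number last_index, which was never written. */
static uint64_t example_gather_results(const uint64_t *results, unsigned last_index)
{
   assert(results != NULL || last_index == 0);
   uint64_t total = 0;
   for (unsigned i = 0; i < last_index; i++)
      total += results[2 * i + 1] - results[2 * i];
   return total;
}
```

Reading one pair too many only goes unnoticed when the extra slot happens to contain equal values, such as a zero-filled buffer, which is why the earlier zero fill masked the bug.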
* intel: Improve the matching (more formats!) for TexImage from PBOs. | Eric Anholt | 2013-03-05 | 1 | -25/+3
  Mesa core is the place for encoding what format/type matches a mesa format, so rely on that.
  Reviewed-by: Chad Versace <[email protected]>
* intel: Improve the test for readpixels blit path format checking. | Eric Anholt | 2013-03-05 | 3 | -37/+6
  We were allowing things like copying RG1616 to a user's ARGB8888 format, while we were denying anything that wasn't ARGB8888 or RGB565.
  Reviewed-by: Chad Versace <[email protected]>
* intel: Fold intel_region_copy() into its one caller. | Eric Anholt | 2013-03-05 | 3 | -53/+16
  This is similar code to intel_miptree_copy_slice, but the knobs are all set differently.
  v2: fix whitespace
  Reviewed-by: Chad Versace <[email protected]>
* intel: Transition intel_region_map() to being a miptree operation. | Eric Anholt | 2013-03-05 | 5 | -91/+55
  I'm trying to move us away from the region structure, and all the callers are currently dereferencing a miptree to get the region.
  In this change, the map_refcount is dropped. However, the bo->virtual is itself map refcounted, so that's already dealt with.
  Reviewed-by: Chad Versace <[email protected]>
* intel: Remove num_mapped_regions tracking. | Eric Anholt | 2013-03-05 | 2 | -14/+0
  The point of tracking the value was removed in February 2012 (65b096aeddd9b45ca038f44cc9adfff86c8c48b2), and this should have been removed at the same time.
  Reviewed-by: Chad Versace <[email protected]>
* intel: Remove the struct intel_region reuse hash table. | Eric Anholt | 2013-03-05 | 4 | -39/+2
  I don't see any reason for it -- it was introduced with the DRI2 invalidate work by krh in 2010 with no explanation. I suspect it was something about wanting the same drm_intel_bo struct underneath multiple openings of the BO within one process, but that's covered by libdrm at this point.
  As far as the struct region goes, it is not threadsafe, so multiple contexts sharing a region could have mixed up the map_count and assertion failed or worse.
  Reviewed-by: Chad Versace <[email protected]>
* intel: Add missing perf debug for a stall on mapping a BO. | Eric Anholt | 2013-03-05 | 1 | -0/+2
  I was testing the ARB_debug_output code and wrote an obvious sample that should have hit this, and got confused that my ARB_debug_output was broken.
  Reviewed-by: Jordan Justen <[email protected]>
* i965: Make perf_debug() output to GL_ARB_debug_output in a debug context. | Eric Anholt | 2013-03-05 | 16 | -48/+83
  I tried to ensure that performance in the non-debug case doesn't change (we still just check one condition up front), and I think the impact is small enough in the debug context case to warrant including all of it.
  Reviewed-by: Jordan Justen <[email protected]>
* intel: Finish renaming fallback_debug() to perf_debug(). | Eric Anholt | 2013-03-05 | 5 | -18/+13
  They're about to change to handle GL_ARB_debug_output, so just make one function.
  Reviewed-by: Jordan Justen <[email protected]>
* intel: Hook up the WARN_ONCE macro to GL_ARB_debug_output. | Eric Anholt | 2013-03-05 | 4 | -0/+8
  This doesn't provide detailed error type information, but it's important to get these relatively severe but rare error messages out to the developer through whatever mechanism they are using.
  v2: Rebase on new WARN_ONCE additions.
  Reviewed-by: Jordan Justen <[email protected]> (v1)
* dri/nouveau: NV17_3D class is not available for NV1a chipset | Marcin Slusarz | 2013-03-05 | 1 | -1/+1
  Should fix https://bugs.freedesktop.org/show_bug.cgi?id=60510
  Note: this is a candidate for the stable branches
  Acked-by: Francisco Jerez <[email protected]>
* i965: Fix Crystal Well PCI IDs. | Kenneth Graunke | 2013-03-03 | 1 | -9/+9
  The second digit was off by one, which meant we accidentally treated GTn as GT(n-1). This also meant no support for GT1 at all.
  NOTE: This is a candidate for stable branches.
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Pull query BO reallocation out into a helper function. | Kenneth Graunke | 2013-03-01 | 1 | -23/+33
  We'll want to reuse this for non-occlusion queries in the future.
  Plus, it's a single logical task, so having it as a helper function clarifies the code somewhat.
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Replace the global brw->query.bo variable with query->bo. | Kenneth Graunke | 2013-03-01 | 2 | -16/+7
  Again, eliminating a global variable in favor of a per-query object variable will help in a future where we have more queries in hardware.
  Personally, I find this clearer: there's just the query object's BO, rather than two variables that usually shadow each other.
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Turn if (query->bo) into an assertion. | Kenneth Graunke | 2013-03-01 | 1 | -5/+5
  The code a few lines above calls brw_emit_query_begin() if !query->bo, and that creates query->bo. So it should always be non-NULL.
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Unify query object BO reallocation code. | Kenneth Graunke | 2013-03-01 | 1 | -11/+10
  If we haven't allocated a BO yet, we need to do that. Or, if there isn't enough room to write another pair of values, we need to gather up the existing results and start a new one. This is simple enough.
  However, the old code was awkwardly split into two blocks, with a write_depth_count() placed in the middle. The new depth count isn't relevant to gathering the old BO's data, so that can go after the reallocation is done. With the two blocks adjacent, we can merge them.
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Use query->last_index instead of the global brw->query.index. | Kenneth Graunke | 2013-03-01 | 2 | -7/+6
  Since we already have an index in the brw_query_object, there's no need to also keep a global variable that shadows it.
  Plus, if we ever add support for more types of queries that still need the per-batch before/after treatment we do for occlusion queries, we won't be able to use a single global variable. In contrast, per-query object variables will work fine.
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Remove brw_query_object::first_index field as it's always 0. | Kenneth Graunke | 2013-03-01 | 2 | -6/+3
  brw->query.index is initialized to 0 just a few lines before it's copied to first_index.
  Presumably the idea here was to reuse the query BO for subsequent queries of the same type, but since that doesn't happen, there's no need to have the extra code complexity.
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Add a pile of comments to brw_queryobj.c. | Kenneth Graunke | 2013-03-01 | 1 | -16/+143
  This code was really difficult to follow, for a number of reasons:
  - Queries were handled in four different ways (TIMESTAMP writes a single value, TIME_ELAPSED writes a single pair of values, occlusion queries write pairs of values for the start and end of each batch, and other queries are done entirely in software). It turns out that there are very good reasons each query is handled the way it is, but insufficient comments explaining the rationale.
  - It wasn't immediately obvious which functions were driver hooks and which were helper functions. For example, brw_query_begin() is a driver hook that implements glBeginQuery() for all query types, but the similarly named brw_emit_query_begin() is a helper function that's only relevant for occlusion queries.
  Extra explanatory comments should save me and others from constantly having to ask how this code works and why various query types are handled differently.
  v2: Incorporate Eric's feedback: change "as soon as possible" to "the results will be present when mapped."
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Write TIMESTAMP query values into the first buffer element. | Kenneth Graunke | 2013-03-01 | 1 | -4/+3
  For timestamp queries, we just write a single value to a BO. The natural place to write that is element 0, so we should do that. Previously, we wrote it into element 1 (the second slot) leaving element 0 filled with garbage.
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Implement the new QueryCounter() hook. | Kenneth Graunke | 2013-03-01 | 1 | -6/+21
  This moves the GL_TIMESTAMP handling out of EndQuery.
  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: enable ARB_texture_multisample on Gen6+ | Chris Forbes | 2013-03-02 | 1 | -0/+1
  V2: Works on Ivy Bridge now too, so this can be 6+.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: add support for ir_txf_ms on Gen6+ | Chris Forbes | 2013-03-02 | 3 | -13/+54
  On Gen6, lower this to `ld` with lod=0 and an extra sample_index parameter.
  On Gen7, use `ld2dms`. We don't support CMS yet for multisample textures, so we just hardcode MCS=0. This is ignored for IMS and UMS surfaces.
  Note: If we do end up emitting specialized shaders based on the MSAA layout, we can emit a slightly shorter message here in the UMS case.
  Note: According to the PRM, `ld2dms` takes one more parameter, lod. However, it's always zero, and including it would make the message too long for SIMD16, so we just omit it.
  V2: Reworked completely, added support for Gen7.
  V3:
  - Introduce sample_index parameter rather than reusing lod
  - Removed spurious whitespace change
  - Clarify commit message
  V4:
  - Fix comment style
  - Emit SHADER_OPCODE_TXF_MS on Gen6. This was benignly wrong since it lowers to `ld` anyway on this gen, but still wrong.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: add support for ir_txf_ms on Gen6+ | Chris Forbes | 2013-03-02 | 1 | -4/+21
  On Gen6, lower this to `ld` with lod=0 and an extra sample_index parameter.
  On Gen7, use `ld2dms`. This takes an additional MCS parameter to support compressed multisample surfaces, but we're not enabling them for multisample textures for now, so it's always ignored and can be safely omitted.
  V2: Reworked completely, added support for Gen7.
  V3:
  - Use new sample_index, sample_index_type rather than reusing lod
  - Clarify commit message.
  V4:
  - Fix comment style
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: add a new virtual opcode: SHADER_OPCODE_TXF_MS | Chris Forbes | 2013-03-02 | 5 | -0/+18
  This is very similar to the TXF opcode, but lowers to `ld2dms` rather than `ld` on Gen7.
  V4:
  - add SHADER_OPCODE_TXF_MS to is_tex() functions, so regalloc thinks it actually writes the correct number of registers. Otherwise in nontrivial shaders some of the registers tend to get clobbered, producing bad results.
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
* i965: take the target into account for Gen7 MSAA modes | Chris Forbes | 2013-03-02 | 1 | -3/+19
  Gen7 has an erratum affecting the ld_mcs message, making it unsafe to use when the surface doesn't have an associated MCS.
  From the Ivy Bridge PRM, Vol4 Part1 p77 ("MCS Enable"):
    "If this field is disabled and the sampling engine <ld_mcs> message is issued on this surface, the MCS surface may be accessed. Software must ensure that the surface is defined to avoid GTT errors."
  To allow the shader to treat all surfaces uniformly, force UMS if the surface is to be used as a multisample texture, even if CMS would have been possible.
  V3:
  - Quoted erratum text
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Support multisampling in surface_state for textures | Chris Forbes | 2013-03-02 | 2 | -5/+6
  The surface_state setup for renderbuffers already worked; only the texturing side needed work. BLORP does something similar, but does its own surface_state setup.
  On Gen6, we just need to set the correct sample count.
  On Gen7:
  - set the correct sample count
  - set the correct layout mode
  - set GEN7_SURFACE_ARYSPC_LOD0 if it's set in the miptree.
  V2:
  - Clarify commit message
  - Rebased onto Paul's physical/logical dims cleanup
  - Added Gen7 support
  Signed-off-by: Chris Forbes <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>