aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_vs.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Shrink Gen5 VUE map layout to be the same as Gen4.Chris Forbes2013-06-161-20/+3
| | | | | | | | | | | | | | | | | | | The PRM suggests a larger layout, mostly to support having gl_ClipDistance[] somewhere predictable for the fixed-function clipper -- but it didn't actually arrive in Gen5. Just use the same layout for both Gen4 and Gen5. No Piglit regressions. Improves performance in CS:S Video Stress Test by ~3%. V2: - Remove now-useless function for determining the SF URB read offset - Remove now-unused BRW_VARYING_SLOT_POS_DUPLICATE Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: add support for emitting gl_ClipVertexChris Forbes2013-06-071-6/+0
| | | | | | | | | | | Removes the special-case suppression of gl_ClipVertex in the VUE map. Also calculate vertex outcodes for user clip planes based on gl_ClipVertex if written; otherwise gl_Position. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Avoid recompiles for fragment clamping on non-clamping APIs.Eric Anholt2013-04-251-1/+1
| | | | | | | | | | Removes 75/78 state-dependent recompiles in GLB2.7 (the remaining 3 are due to FBO-rendering size predictions). We currently expose GL_ARB_color_buffer_float on GL core, so we may mis-predict there, but I'm about to send a patch for removing that silly extension in that case. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: split brw_vs_prog_data into generic and VS-specific parts.Paul Berry2013-04-111-19/+48
| | | | | | | | | | | This will allow the generic parts to be re-used for geometry shaders. Reviewed-by: Jordan Justen <[email protected]> v2: Put urb_read_length and urb_entry_size in the generic struct. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: split brw_vs_prog_key into generic and VS-specific parts.Paul Berry2013-04-111-23/+24
| | | | | | | | This will allow the generic parts to be re-used for geometry shaders. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: split brw_vs_compile into generic and VS-specific parts.Paul Berry2013-04-111-2/+2
| | | | | | | | This will allow the generic parts to be re-used for geometry shaders. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Remove brw_vs_prog_data pointer from brw_vs_compile.Paul Berry2013-04-111-13/+15
| | | | | | | | | | | | | | | | | | In patches that follow, we'll be splitting structs brw_vs_prog_data and brw_vs_compile into a vec4-generic base struct and a VS-specific derived struct (this will allow the vec4-generic code to be re-used for geometry shaders). Having brw_vs_compile point to brw_vs_prog_data makes it difficult to do this cleanly. Fortunately most of the functions that use brw_vs_compile (those in the vec4_visitor class) already have access to brw_vs_prog_data through a separate pointer (vec4_visitor::prog_data). So all we have to do is use that pointer consistently, and plumb prog_data through the few remaining functions that need access to it. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Generalize computation of VUE map in preparation for GS.Paul Berry2013-04-111-6/+6
| | | | | | | | | | | This patch modifies the arguments to brw_compute_vue_map() so that they no longer bake in the assumption that we are generating a VUE map for vertex shader outputs. It also makes the function non-static so that we can re-use it for geometry shader outputs. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix an inconsistency inb the VUE map with gl_ClipVertex on gen4/5.Eric Anholt2013-03-301-7/+11
| | | | | | | | | | | | | We are intentionally not allocating a slot for gl_ClipVertex. But by leaving the bit set in the slots_valid, the fragment shader's computation of where varyings are in urb entry coming out of the SF would be off by one. Fixes rendering in Freespace 2 SCP, and improves rendering in TF2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830 Tested-by: Joaquín Ignacio Aramendía <[email protected]> NOTE: This is a candidate for the 9.1 branch. Reviewed-and-tested-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Shrink brw_vue_map struct.Paul Berry2013-03-241-0/+8
| | | | | | | | | | | | This patch changes the arrays in brw_vue_map (which only ever contain values from -1 to 58) from ints to signed chars. This reduces the size of the struct from 488 bytes to 136 bytes. Reviewed-by: Kenneth Graunke <[email protected]> v2: fix STATIC_ASSERT to use 127 instead of 128. Reviewed-by: Eric Anholt <[email protected]>
* i965: Store the geometry output VUE map in brw_context.Paul Berry2013-03-241-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | Currently, the GPU pipeline has one active VUE map in effect at any given time--the one representing the layout of vertex data coming from the vertex shader. However, when geometry shaders are added, they will have their own independent VUE map. Later pipeline stages (clip, sf, fs) will need to consult the geometry shader VUE map if a geometry shader is in use, and the vertex shader VUE map otherwise. This patch adds a new field to brw_context, vue_map_geom_out, which contains the VUE map that should be used by later pipeline stages. It also adds a new state flag, BRW_NEW_VUE_MAP_GEOM_OUT, which is signalled whenever the contents of the VUE map changes. Since we don't support geometry shaders yet, vue_map_geom_out is currently set only by the brw_vs_prog state atom. v2: Don't set vue_map_geom_out in do_vs_prog--that's redundant and possibly problematic for precompiles. Only set it in brw_upload_vs_prog. Also, make a copy instead of using a pointer--this makes it possible to detect when the VUE map hasn't changed, so we can avoid redundant state uploads. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move brw_vs_prog_data::outputs_written into VUE map.Paul Berry2013-03-241-11/+12
| | | | | | | | | | | | | | Future patches will allow for there to be separate VUE maps when both a geometry shader and a vertex shader are in use. When this happens, we will want to have correspondingly separate outputs_written bitfields. Moving outputs_written into the VUE map will make this easy. For consistency with the terminology used in the VUE map, the bitfield is renamed to "slots_valid" in the process. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename BRW_VARYING_SLOT_MAX -> BRW_VARYING_SLOT_COUNT.Paul Berry2013-03-241-2/+2
| | | | | | | The new name clarifies that it represents *one more* than the maximum possible brw_varying_slot value. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Clarify nomenclature: vert_result -> varyingPaul Berry2013-03-231-9/+9
| | | | | | | | | | | | | | | | | | This patch removes the terminology "vert_result" from the i965 driver, replacing it with "varying". The old terminology, "vert_result", was confusing because (a) it referred to the enum gl_vert_result, which no longer exists (it was replaced with gl_varying_slot), and (b) it implied a vertex output, but with the advent of geometry shaders, it could be either a vertex or a geometry output, depending what shaders are in use. The generic term "varying" is less confusing. No functional change. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> v2: Whitespace fixes.
* Replace gl_vert_result enum with gl_varying_slot.Paul Berry2013-03-151-30/+30
| | | | | | | | | | | This patch makes the following search-and-replace changes: gl_vert_result -> gl_varying_slot VERT_RESULT_* -> VARYING_SLOT_* Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Tested-by: Brian Paul <[email protected]>
* i965: Make perf_debug() output to GL_ARB_debug_output in a debug context.Eric Anholt2013-03-051-10/+11
| | | | | | | | I tried to ensure that performance in the non-debug case doesn't change (we still just check one condition up front), and I think the impact is small enough in the debug context case to warrant including all of it. Reviewed-by: Jordan Justen <[email protected]>
* i965: Consign COORD_REPLACE VS hacks to Pre-Gen6.Paul Berry2013-02-201-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | Pre-Gen6, the SF thread requires exact matching between VS output slots (aka VUE slots) and FS input slots, even when the corresponding VS output slot is unused due to being overwritten by point coordinate replacement (glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE)). As a result, we have a special hack in the VS to ensure when any texture coordinate is subject to point coordinate replacement, it is always allocated space in the VUE, even if it isn't written to by the VS. This hack isn't needed from Gen6 onwards, since SF (Gen7: SBE) swizzling has the ability to insert the point coordinate into gl_TexCoord[] without needing a corresponding unused VUE slot. Note that no modification of SF setup code is required for this patch--get_attr_override() already does the right thing. However, we make a slight comment change to clarify why this works. In addition to eliminating unnecessary VS recompiles and saving precious URB space on Gen6+, this will save us the trouble of having to adjust this hack when we implement geometry shaders. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Support GL_FIXED and packed vertex formats natively on Haswell+.Kenneth Graunke2013-01-071-2/+8
| | | | | | | Haswell and later support the GL_FIXED and 2_10_10_10_rev vertex formats natively, and don't need shader workarounds. Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Move uses of brw_compile from do_vs_prog to brw_vs_emit.Kenneth Graunke2012-11-281-6/+2
| | | | | | | | | | | | The brw_compile structure is closely tied to the Gen4-7 hardware encoding. However, do_vs_prog is very generic: it just calls out to get a compiled program and then uploads it. This isn't ultimately where we want it, but it's a step in the right direction: it's now closer to the code generator. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/vs: Rework memory contexts for shader compilation data.Kenneth Graunke2012-11-281-1/+1
| | | | | | | | | | | | | | During compilation, we allocate a bunch of things: the IR needs to last at least until code generation...and then the program store needs to last until after we upload the program. For simplicity's sake, just keep it all around until we upload the program. After that, it can all be freed. This will also save a lot of headaches during the upcoming refactoring. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/vs: Pass the brw_context pointer into brw_compute_vue_map().Kenneth Graunke2012-11-281-3/+2
| | | | | | | | We used to steal it out of the brw_compile struct, but that won't be initialized in time soon (and is eventually going away). Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/vs: Pass the brw_context pointer into vec4_visitor and do_vs_prog.Kenneth Graunke2012-11-281-1/+1
| | | | | | | | We used to steal it out of the brw_compile struct...but vec4_visitor isn't going to have one of those in the future. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: set attribute w/a bits for packed formatsChris Forbes2012-11-261-4/+26
| | | | | | | | Flag the need for various workarounds to be applied by the vertex shader. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Generalize GL_FIXED VS w/a supportChris Forbes2012-11-261-4/+5
| | | | | | | | | | | Next few patches build on this to add other workarounds for packed formats. V2: rename BRW_ATTRIB_WA_COMPONENTS to BRW_ATTRIB_WA_COMPONENT_MASK; V3 (Kayden): remove separate bit for ES3 signed normalization Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove unused variables after removing the old VS backend.Kenneth Graunke2012-11-011-2/+0
| | | | Fixes compiler warnings about unused variables.
* i965/vs: Remove brw_vs_compile::constant_map.Kenneth Graunke2012-11-011-17/+1
| | | | | | It was only used for the old backend. Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Replace brw_vs_emit.c with dumping code into the vec4_visitor.Kenneth Graunke2012-11-011-9/+6
| | | | | | | | | | | | | | | | | | | | Rather than having two separate backends, just create a small layer that translates the subset of Mesa IR used for ARB_vertex_program and fixed function programs to the Vec4 IR. This allows us to use the same optimization passes, code generator, register allocator as for GLSL. v2: Incorporate Eric's review comments. - Fix use of uninitialized src_swiz[] values in the SWIZZLE_ZERO/ONE case: just initialize it to 0 (.x) since the value doesn't matter (those channels get writemasked out anyway). - Properly reswizzle source register's swizzles, rather than overwriting the swizzle. - Port the old brw_vs_emit code for computing .x of the EXP2 opcode. - Update comments, removing mention of NV_vertex_program, etc. - Delete remaining #warning lines and debug comments. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Fix unit mismatch in scratch base_offset parameter.Kenneth Graunke2012-10-101-1/+1
| | | | | | | | | | | | | | | | | | | | | move_grf_array_access_to_scratch() calculates scratch buffer offsets in bytes. However, emit_scratch_read/write() expects the base_offset parameter to be measured in OWords. As a result, a shader using a scratch read/write offset greater than zero (in practice, a shader containing more than one variable in scratch) would use too large an offset, frequently exceeding the available scratch space. This patch corrects the mismatch by removing spurious conversion from OWords to bytes in move_grf_array_access_to_scratch(). This is based on a patch by Paul Berry. NOTE: This is a candidate for stable release branches. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Make the param pointer arrays for the VS dynamically sized.Eric Anholt2012-09-071-0/+33
| | | | | | | | | Saves 96MB of wasted memory in the l4d2 demo. v2: Rebase on compare func change, change brace style. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Add functions for comparing two brw_wm/vs_prog_data structs.Eric Anholt2012-09-071-0/+19
| | | | | | | | | | | | Currently, this just avoids comparing all unused parts of param[] and pull_param[], but it's a step toward getting rid of those giant statically sized arrays. v2: Actually use the new function instead of just looking at its address. This required changing the args to const pointers. (review by Kenneth) Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Set swizzle fields in the VS precompile program key.Kenneth Graunke2012-08-271-0/+11
| | | | | | | | | | | | | | | | | This fixes a regression since 76d1301e8e8e50dc962601a9977bc52148798349: I began setting SWIZZLE_XYZW for unused sampler units in the actual program keys, since this matched the FS precompile behavior. However, the VS precompile was expecting zero, so that commit made essentially every vertex shader (even those not using texturing) mismatch and need to be recompiled. Setting them in the VS precompile key solves the issue. It also is an improvement over our old behavior: previously we guessed that vertex shaders didn't use any textures at all. Now we actually look to see if the VS had any sampler uniforms and guess based on that. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Add VS program key dumping to INTEL_DEBUG=perf.Kenneth Graunke2012-08-271-0/+71
| | | | | | | Eric added support for WM key debugging. This adds it for the VS. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Add performance debug for register spilling.Eric Anholt2012-08-121-0/+4
| | | | | Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen6+: Add support for edge flags.Eric Anholt2012-08-091-2/+4
| | | | | | | Fixes the 3 new piglit edgeflag tests. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40707 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add comment noting copy_edgeflag state dependency.Eric Anholt2012-08-091-0/+2
| | | | | | It's already in the state struct. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add support for loading uniform buffer variables as pull constants.Eric Anholt2012-08-071-0/+5
| | | | | | | | Unlike the FS side in the previous commit, this does variable indexing just fine, using the same code as we used for other variable-indexed pull constants. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Bind UBOs as surfaces like we do for pull constants.Eric Anholt2012-08-071-1/+1
| | | | | | v2: Comment fix, drop extraneous parens (review by Kenneth) Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Allocate dummy slots for point sprites before computing VUE map.Kenneth Graunke2012-08-061-2/+2
| | | | | | | | | | | | | | Commit f0cecd43d6b6d moved the VUE map computation to be only once, at VS compile time. However, it did so in slightly the wrong place: it made the one call to brw_vue_compute_map happen right before the allocation of dummy slots for replaced point sprite coordinates, causing a different VUE map to be generated (at least on Ironlake). Fixes a regression in Piglit's point-sprite test on Ironlake. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46489 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Move loop over texture units into brw_populate_sampler_prog_key.Kenneth Graunke2012-07-121-4/+1
| | | | | | | | | | | The whole reason I avoided this was because it might operate on a brw_vertex_program or a brw_fragment_program. However, that isn't a problem: all we need is the gl_program base type. This avoids awkwardly passing the loop counter 'i' as a parameter, simplifies both callers, and also plumbs prog in place for future use. Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Split the VS binding table to a separate table.Eric Anholt2012-02-211-0/+5
| | | | | | | | This is a step toward making the samplers/binding tables reflect sampler uniform mappings instead of embedding those in the programs. No significant performance difference on the microbenchmark (n=10). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move VUE map computation to once at VS compile time.Eric Anholt2012-02-211-7/+9
| | | | | | | | | | With this and the previous patch, 640x480 nexuiz is running 0.169118% +/- 0.0863696% faster (n=121). On a VS state change microbenchmark, performance is increased 8.28645% +/- 0.460478% (n=52). v2: Fix CACHE_NEW_VS comment. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make the userclip flag for the VUE map come from VS prog data.Eric Anholt2012-02-211-2/+7
| | | | | | | | This reduces recomputation of state based on non-clipping-related transform changes, and is a step toward removing VUE map recomputation. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove the INTEL_OLD_VS option.Kenneth Graunke2012-01-181-1/+1
| | | | | | | | | | | | | | | Now that we no longer generate Mesa IR from GLSL IR, it's impossible to use the old vertex shader backend for GLSL programs. There's simply no Mesa IR to codegen from. Any attempt to do so would result in immediate GPU hangs, presumably due to the driver uploading an empty program with no EOT message. NOTE: This is a candidate for the 8.0 branch. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Eugeni Dodonov <[email protected]>
* i965: Fix transform feedback of gl_ClipVertex.Paul Berry2012-01-051-5/+8
| | | | | | | | | | | | | | | | | | | | | | | Previously, on i965 Gen6 and above, we weren't allocating space for gl_ClipVertex in the VUE, since the VS was automatically converting it to clip distances. This prevented transform feedback from being able to capture gl_ClipVertex. This patch goes aheads and allocates space for gl_ClipVertex in the VUE on Gen6 and above. The old behavior is retained on Gen5 and below, since (a) transform feedback is not yet supported on those platforms, and (b) those platforms don't currently support gl_ClipVertex anyhow. Note: this constitutes a slight waste of VUE space for shaders that use gl_ClipVertex and don't use transform feedback to capture it. However, that seems preferable to making the VUE map (and all of the state that depends on it) dependent on transform feedback settings. Fixes Piglit test "EXT_transform_feedback/builtin-varyings gl_ClipVertex". Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add missing _NEW_TEXTURE dirty bit to brw_vs_prog state atom.Kenneth Graunke2012-01-041-0/+1
| | | | | | | | | Commit d45814c925dd6c479cfd383b9b59458fc4359cf7 totally added a data dependency on _NEW_TEXTURE, even including the comment, but didn't actually add the dirty bit. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Add texture related data to brw_vs_prog_key.Kenneth Graunke2011-12-191-0/+8
| | | | | | | | Now that this is all factored out, it's trivial to do. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: Make gl_program::InputsRead 64 bits.Mathias Fröhlich2011-11-291-2/+2
| | | | | | | | | Make gl_program::InputsRead a 64 bits bitfield. Adapt the intel and radeon driver to handle a 64 bits InputsRead value. Signed-off-by: Mathias Froehlich <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Move program compile to emit() time.Eric Anholt2011-10-291-1/+1
| | | | | | | Only 4 other prepare() functions are left, which don't rely on this. Reviewed-by: Kenneth Graunke <[email protected]> Acked-by: Paul Berry <[email protected]>
* i965: Rename (vs|wm)_max_threads to max_(vs|wm)_threads for consistency.Kenneth Graunke2011-10-251-1/+1
| | | | | | | | | The inconsistency between vs_max_threads and max_vs_entries was rather annoying. I could never seem to remember which one was reversed, which made it harder to find quickly. "Max __ Threads" seems more natural. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* intel: Convert from GLboolean to 'bool' from stdbool.h.Kenneth Graunke2011-10-181-1/+1
| | | | | | | | | | | | | | | | | I initially produced the patch using this bash command: for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i 's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i 's/GL_FALSE/false/g' $file; done Then I manually added #include <stdbool.h> to fix compilation errors, and converted a few functions back to GLboolean that were used in core Mesa's function pointer table to avoid "incompatible pointer" warnings. Finally, I cleaned up some whitespace issues introduced by the change. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chad Versace <[email protected]> Acked-by: Paul Berry <[email protected]>