aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_gs.h
Commit message (Collapse)AuthorAgeFilesLines
* i965: Rename brw_compile to brw_codegenJason Ekstrand2015-04-221-1/+1
| | | | | | | | | | | | This name better matches what it's actually used for. The patch was generated with the following command: for file in *; do sed -i -e s/brw_compile/brw_codegen/g $file done Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Rename do_<stage>_prog to brw_compile_<stage>_prog (and export)Carl Worth2015-04-021-0/+7
| | | | | | | | | | | | This is in preparation for these functions to be called from other files. This commit is intended to have no functional change. It exists in preparation for some upcoming code movement in preparation for the shader cache. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Perform program state upload outside of atom handlingCarl Worth2015-02-231-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | Across the board of the various generations, the intial few atoms in all of the atom lists are basically the same, (performing uploads for the various programs). The only difference is that prior to gen6 there's an ff_gs upload in place of the later gs upload. In this commit, instead of using the atom lists for this program state upload, we add a new function brw_upload_programs that calls into the per-stage upload functions which in turn check dirty bits and return immediately if nothing needs to be done. This commit is intended to have no functional change. The motivation is that future code, (such as the shader cache), wants to have a single function within which to perform various operations before and after program upload, (with some local variables holding state across the upload). It may be worth looking at whether some of the other functionality currently handled via atoms might also be more cleanly handled in a similar fashion. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make precompile functions accessible from C.Kenneth Graunke2014-11-241-3/+0
| | | | | | | | | | | | | Previously, the prototypes for brw_vs/gs/fs_precompile were scattered between brw_vs.h (C), brw_gs.h (C), and brw_fs.h (C++ only). Also, brw_fs_precompile had C++ linkage, while the others were C. This patch moves all the prototypes to a central location (brw_shader.h) and makes brw_fs_precompile have C linkage. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Pass gl_program pointers into precompile functions.Kenneth Graunke2014-11-241-1/+4
| | | | | | | | | | | | We'd like to do precompiling for ARB vertex and fragment programs, which only have gl_program structures - gl_shader_program is NULL. This patch makes the various precompile functions take a gl_program parameter directly, rather than accessing it via gl_shader_program. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Rename brw_vec4_gs.[ch] to brw_gs.[ch].Kenneth Graunke2014-10-291-0/+43
| | | | | | | | | | | These source files support actual geometry shaders, so using "gs" for the name makes a lot of sense. We're going to be adding SIMD8 geometry shader support as well, at which point "vec4_gs" will be a misnomer. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Iago Toral Quiroga <[email protected]>
* i965: Rename brw_gs{,_emit}.[ch] to brw_ff_gs{,_emit}.[ch].Kenneth Graunke2014-10-291-115/+0
| | | | | | | | | | | | | | | | The brw_gs.[ch] and brw_gs_emit.c source files contain code for emulating fixed-function unit functionality (VF primitive decomposition or SOL) using the GS unit. They do not contain code to support proper geometry shaders. We've taken to calling that code "ff_gs" (see brw_ff_gs_prog_key, brw_ff_gs_prog_data, brw_context::ff_gs, brw_ff_gs_compile, brw_ff_gs_prog). So it makes sense to make the filenames match. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Iago Toral Quiroga <[email protected]>
* i965/gen6/gs: use brw_gs_prog atom instead of brw_ff_gs_progSamuel Iglesias Gonsalvez2014-09-191-0/+1
| | | | | | | | | | | | | This is needed to support user-provided geometry shaders, since the brw_ff_gs_prog atom in gen6 only takes care of implementing transform feedback for vertex shaders. If there is no user-provided geometry shader the implementation falls back to the original code. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* s/Tungsten Graphics/VMware/José Fonseca2014-01-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change. This was the sed script I used: $ cat tg2vmw.sed # Run as: # # git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed # # Rename copyrights s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./ s/TUNGSTEN GRAPHICS/VMWARE/g # Rename emails s/[email protected]/[email protected]/ s/[email protected]/[email protected]/g s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/ s/jrfonseca\[email protected]/[email protected]/g s/keithw\[email protected]/[email protected]/g s/[email protected]/[email protected]/g s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/ s/[email protected]/[email protected]/ # Remove dead links s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g # C string src/gallium/state_trackers/vega/api_misc.c s/"Tungsten Graphics, Inc"/"VMware, Inc"/ Reviewed-by: Brian Paul <[email protected]>
* i965: Drop trailing whitespace from the rest of the driver.Kenneth Graunke2013-12-051-7/+7
| | | | | | | Performed via: $ for file in *; do sed -i 's/ *//g'; done Signed-off-by: Kenneth Graunke <[email protected]>
* i965: rename legacy gs structs and functions to ff_gs.Paul Berry2013-08-311-8/+11
| | | | | | | | "ff" is for "fixed function". This frees up the name "gs" to refer to user-defined geometry shaders. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Remove some dead code.Kenneth Graunke2013-07-031-2/+0
| | | | | | | A random smattering of things that just aren't used anymore. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Handle rasterizer discard in the clipper rather than GS on Gen6.Kenneth Graunke2013-05-201-1/+0
| | | | | | | | | | | | | | | | This has more of a negative impact than the previous patch, as on Gen6 passing primitives through to the clipper means we actually have to make the GS thread write them to the URB. I don't see another good solution though, and rasterizer discard is not the most common of cases, so hopefully it won't be too terrible. v2: Add a perf_debug; resolve rebase conflicts on the brw dirty flags; remove the rasterizer_discard field from brw_gs_prog_key. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> [v1] Reviewed-by: Paul Berry <[email protected]>
* Replace gl_vert_result enum with gl_varying_slot.Paul Berry2013-03-151-1/+1
| | | | | | | | | | | This patch makes the following search-and-replace changes: gl_vert_result -> gl_varying_slot VERT_RESULT_* -> VARYING_SLOT_* Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Tested-by: Brian Paul <[email protected]>
* i965: Remove unused userclip flags.Paul Berry2013-02-191-1/+0
| | | | | | | | | | brw_vs_prog_data::userclip hasn't been used since commit f0cecd4 (i965: Move VUE map computation to once at VS compile time). brw_gs_prog_key::userclip_active hasn't been used since commit 9f3d321 (i965: Make the userclip flag for the VUE map come from VS prog data). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make use of gl_transform_feedback_info::ComponentOffset.Paul Berry2012-01-051-0/+7
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965 gen6: Fix transform feedback of triangle strips.Paul Berry2011-12-241-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When rendering triangle strips, vertices come down the pipeline in the order specified, even though this causes alternate triangles to have reversed winding order. For example, if the vertices are ABCDE, then the GS is invoked on triangles ABC, BCD, and CDE, even though this means that triangle BCD is in the reverse of the normal winding order. The hardware automatically flags the triangles with reversed winding order as _3DPRIM_TRISTRIP_REVERSE, so that face culling and two-sided coloring can be adjusted to account for the reversed order. In order to ensure that winding order is correct when streaming vertices out to a transform feedback buffer, we need to alter the ordering of BCD to BDC when the first provoking vertex convention is in use, and to CBD when the last provoking vertex convention is in use. To do this, we precompute an array of indices indicating where each vertex will be placed in the transform feedback buffer; normally this is SVBI[0] + (0, 1, 2), indicating that vertex order should be preserved. When the primitive type is _3DPRIM_TRISTRIP_REVERSE, we change this order to either SVBI[0] + (0, 2, 1) or SVBI[0] + (1, 0, 2), depending on the provoking vertex convention. Fixes piglit tests "EXT_transform_feedback/tessellation triangle_strip" on Gen6. Reviewed-by: Kenneth Graunke <[email protected]>
* i965 gen6: Implement rasterizer discard.Paul Berry2011-12-201-0/+1
| | | | | | | | | | | | | | | | | | | This patch enables rasterizer discard functionality (a part of transform feedback) in Gen6, by generating an alternate GS program when rasterizer discard is active. Instead of forwarding vertices down the pipeline, the alternate GS program uses a URB Write message to deallocate the URB entry that was allocated by FF sync and terminate the thread. Note: parts of the Sandy Bridge PRM seem to imply that we could do this more efficiently, by clearing the GEN6_GS_RENDERING_ENABLE bit, and not allocating a URB entry at all. However, it's not clear how we are supposed to terminate the thread if we do that. Volume 2 part 1, section 4.5.4, says "GS threads must terminate by sending a URB_WRITE message with the EOT and Complete bits set.", and my experiments so far confirm that. Reviewed-by: Kenneth Graunke <[email protected]>
* i965 gen6: Initial implementation of transform feedback.Paul Berry2011-12-201-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds basic transform feedback capability for Gen6 hardware. This consists of several related pieces of functionality: (1) In gen6_sol.c, we set up binding table entries for use by transform feedback. We use one binding table entry per transform feedback varying (this allows us to avoid doing pointer arithmetic in the shader, since we can set up the binding table entries with the appropriate offsets and surface pitches to place each varying at the correct address). (2) In brw_context.c, we advertise the hardware capabilities, which are as follows: MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS 64 MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS 4 MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS 16 OpenGL 3.0 requires these values to be at least 64, 4, and 4, respectively. The reason we advertise a larger value than required for MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS is that we have already set aside 64 binding table entries, so we might as well make them all available in both separate attribs and interleaved modes. (3) We set aside a single SVBI ("streamed vertex buffer index") for use by transform feedback. The hardware supports four independent SVBI's, but we only need one, since vertices are added to all transform feedback buffers at the same rate. Note: at the moment this index is reset to 0 only when the driver is initialized. It needs to be reset to 0 whenever BeginTransformFeedback() is called, and otherwise preserved. (4) In brw_gs_emit.c and brw_gs.c, we modify the geometry shader program to output transform feedback data as a side effect. (5) In gen6_gs_state.c, we configure the geometry shader stage to handle the SVBI pointer correctly. Note: ordering of vertices is not yet correct for triangle strips (alternate triangles are improperly oriented). This will be addressed in a future patch. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965 gs: Move vue_map to brw_gs_compile.Paul Berry2011-12-201-0/+2
| | | | | | | | | | | | This patch stores the geometry shader VUE map from a local variable in compile_gs_prog() to a field in the brw_gs_compile struct, so that it will be available while compiling the geometry shader. This is necessary in order to support transform feedback on Gen6, because the Gen6 geometry shader code that supports transform feedback needs to be able to inspect the VUE map in order to find the correct vertex data to output. Reviewed-by: Kenneth Graunke <[email protected]>
* i965 gen6: Implement pass-through GS for transform feedback.Paul Berry2011-12-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | In Gen6, transform feedback is accomplished by having the geometry shader send vertex data to the data port using "Streamed Vertex Buffer Write" messages, while simultaneously passing vertices through to the rest of the graphics pipeline (if rendering is enabled). This patch adds a geometry shader program that simply passes vertices through to the rest of the graphics pipeline. The rest of transform feedback functionality will be added in future patches. To make the new geometry shader easier to test, I've added an environment variable "INTEL_FORCE_GS". If this environment variable is enabled, then the pass-through geometry shader will always be used, regardless of whether transform feedback is in effect. On my Sandy Bridge laptop, I'm able to enable INTEL_FORCE_GS with no Piglit regressions. Reviewed-by: Kenneth Graunke <[email protected]> Acked-by: Eric Anholt <[email protected]>
* i965 gs: Clean up dodgy register re-use, at the cost of a few MOVs.Paul Berry2011-12-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this patch, in the Gen4 and Gen5 GS, we used GRF 0 (called "R0" in the code) as a staging area to prepare the message header for the FF_SYNC and URB_WRITE messages. This cleverly avoided an unnecessary MOV operation (since the initial value of GRF 0 contains data that needs to be included in the message header), but it made the code confusing, since GRF 0 could no longer be relied upon to contain its initial value once the GS started preparing its first message. This patch avoids confusion by using a separate register ("header") as the staging area, at the cost of one MOV instruction. Worse yet, prior to this patch, the GS would completely overwrite the contents of GRF 0 with the writeback data it received from a completed FF_SYNC or URB_WRITE message. It did this because DWORD 0 of the writeback data contains the new URB handle, and that neds to be included in DWORD 0 of the next URB_WRITE message header. However, that caused the rest of the message header to be corrupted either with undefined data or zeros. Astonishingly, this did not produce any known failures (probably by dumb luck). However, it seems really dodgy--corrupting FFTID in particular seems likely to cause GPU hangs. This patch avoids the corruption by storing the writeback data in a temporary register and then copying just DWORD 0 to the header for the next message. This costs one extra MOV instruction per message sent, except for the final message. Also, this patch moves the logic for overriding DWORD 2 of the header (which contains PrimType, PrimStart, PrimEnd, and some other data that we don't care about yet). This logic is now in the function brw_gs_overwrite_header_dw2() rather than in brw_gs_emit_vue(). This saves one MOV instruction in brw_gs_quads() and brw_gs_quad_strip(), and paves the way for the Gen6 GS, which will need more complex logic to override DWORD 2 of the header. Finally, the function brw_gs_alloc_regs() contained a benign bug: it neglected to increment the register counter when allocating space for the "temp" register. This turned out not to have any effect because the temp register wasn't used on Gen4 and Gen5, the only hardware models (so far) to require a GS program. Now, all the registers allocated by brw_gs_alloc_regs() are actually used, and properly accounted for. Reviewed-by: Kenneth Graunke <[email protected]>
* i965 gs: Remove unnecessary mapping of key->primitive.Paul Berry2011-12-071-1/+6
| | | | | | | | | | | | | Previously, GS generation code contained a lookup table that mapped primitive types POLYGON, TRISTRIP, and TRIFAN to TRILIST, mapped LINESTRIP to LINELIST, and left all other primitives unchanged. This was silly, because we never generate a GS program for those primitive types anyhow. This patch removes the unnecessary lookup table. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Change type of brw_context.primitive from GLenum to hardware primitiveChad Versace2011-10-101-1/+1
| | | | | | | | | | | | | | | | | | | For example, GL_TRIANLGES is converted to _3DPRIM_TRILIST. The conversion is necessary because HiZ and MSAA resolve operations emit a 3DPRIM_RECTLIST, which cannot be conveyed by GLenum. As a consequence, brw_gs_prog_key.primitive is also converted. v2 ---- - [anholt] Split brw_set_prim into brw/gen6 variants in previous commit, since not much code is really shared between the two. - [anholt] Replace switch statements with table lookups, since this is a hot path. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Make brw_compute_vue_map's userclip dependency a boolean.Paul Berry2011-10-061-2/+1
| | | | | | | | | | | | | Previously, brw_compute_vue_map required an argument indicating the number of clip planes in use, but all it did with it was check if it was nonzero. This patch changes brw_compute_vue_map to take a boolean instead. This allows us to avoid some unnecessary recompilation of the Gen4/5 GS and SF threads. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove two_side_color from brw_compute_vue_map().Paul Berry2011-09-061-2/+1
| | | | | | | | | | | Since we now lay out the VUE the same way regardless of whether two-sided color is enabled, brw_compute_vue_map() no longer needs to know whether two-sided color is enabled. This allows the two-sided color flag to be removed from the clip, GS, and VS keys, so that fewer GPU programs need to be recompiled when turning two-sided color on and off. Reviewed-by: Eric Anholt <[email protected]>
* i965: GS: Use the VUE map to compute URB size.Paul Berry2011-09-061-5/+4
| | | | | | | | | | | The previous computation had two bugs: (a) it used a formula based on Gen5 for Gen6 and Gen7 as well. (b) it failed to account for the fact that PSIZ is stored in the VUE header. Fortunately, both bugs caused it to compute a URB size that was too large, which was benign. This patch computes the URB size directly from the VUE map, so it gets the result correct in all circumstances. Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove hint_gs_always and resulting dead codeIan Romanick2011-04-111-4/+1
| | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix GS hang on SandybridgeZhenyu Wang2010-10-141-0/+1
| | | | | | Don't use r0 for FF_SYNC dest reg on Sandybridge, which would smash FFID field in GS payload, that cause later URB write fail. Also not use r0 in any URB write requiring allocate.
* intel: Replace IS_IGDNG checks with intel->is_ironlake or needs_ff_sync.Eric Anholt2009-12-221-1/+0
| | | | Saves ~480 bytes of code.
* Merge branch 'outputswritten64'Ian Romanick2009-11-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Add a GLbitfield64 type and several macros to operate on 64-bit fields. The OutputsWritten field of gl_program is changed to use that type. This results in a fair amount of fallout in drivers that use programs. No changes are strictly necessary at this point as all bits used are below the 32-bit boundary. Fairly soon several bits will be added for clip distances written by a vertex shader. This will cause several bits used for varyings to be pushed above the 32-bit boundary. This will affect any drivers that support GLSL. At this point, only the i965 driver has been modified to support this eventuality. I did this as a "squash" merge. There were several places through the outputswritten64 branch where things were broken. I foresee this causing difficulties later for bisecting. The history is still available in the branch. Conflicts: src/mesa/drivers/dri/i965/brw_wm.h
* i965: fix EXT_provoking_vertex supportRoland Scheidegger2009-11-111-3/+4
| | | | | | | | This didn't work for quad/quadstrips at all, and for all other primitive types it only worked when they were unclipped. Fix up the former in gs stage (could probably do without these changes and instead set QuadsFollowProvokingVertexConvention to false), and the rest in clip stage.
* i965: add support for new chipsetsXiang, Haihao2009-07-131-0/+1
| | | | | | | | | | 1. new PCI ids 2. fix some 3D commands on new chipset 3. fix send instruction on new chipset 4. new VUE vertex header 5. ff_sync message (added by Zou Nan Hai <[email protected]>) 6. the offset in JMPI is in unit of 64bits on new chipset 7. new cube map layout
* Initial 965 GLSL supportZou Nan hai2007-04-121-2/+2
|
* I965: fix bug#9625-get the correct PV for quardstripXiang, Haihao2007-01-171-0/+1
| | | | | The order of vertices in payload for quardstrip is (0, 1, 3, 2), so the PV for quardstrip is c->reg.vertex[2].
* Add Intel i965G/Q DRI driver.Eric Anholt2006-08-091-0/+74
This driver comes from Tungsten Graphics, with a few further modifications by Intel.