path: root/src/mesa/drivers/dri/i965/brw_gs.c
Commit message (Author; Age; Files; Lines)
* i965: Handle rasterizer discard in the clipper rather than GS on Gen6. (Kenneth Graunke; 2013-05-20; 1 file; -8/+1)
| | | | | | | | | | | | | | | | This has more of a negative impact than the previous patch, as on Gen6 passing primitives through to the clipper means we actually have to make the GS thread write them to the URB. I don't see another good solution though, and rasterizer discard is not the most common of cases, so hopefully it won't be too terrible. v2: Add a perf_debug; resolve rebase conflicts on the brw dirty flags; remove the rasterizer_discard field from brw_gs_prog_key. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> [v1] Reviewed-by: Paul Berry <[email protected]>
* mesa: convert _NEW_RASTERIZER_DISCARD to a driver flag (Marek Olšák; 2013-04-24; 1 file; -4/+4)
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* mesa,i965: use NewDriverState to communicate TFB state changes with the driver (Marek Olšák; 2013-04-24; 1 file; -3/+3)
| _NEW_TRANSFORM_FEEDBACK is not used by core Mesa, so it can be removed.
| Instead, a new private flag is added to i965 to serve the same purpose.
|
| If you're new to this:
| * When creating a context, you can set private dirty flags in
|   gl_context::DriverFlags, e.g.:
|      ctx->DriverFlags.NewStateX = BRW_NEW_STATE_X;
| * When StateX is changed, core Mesa does:
|      ctx->NewDriverState |= ctx->DriverFlags.NewStateX;
| * When you have to draw, read and clear ctx->NewDriverState.
| * Pros: not touching NewState, the driver decides the mapping between
|   GL states and hw state groups, unlimited number of flags in core Mesa
|   (still limited number of flags in the driver though)
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
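The flag flow described in the commit message above can be sketched in C roughly as follows. This is a minimal illustration, not Mesa code: the types toy_context and toy_driver_flags, the bit TOY_BRW_NEW_STATE_X, and all three helper functions are hypothetical stand-ins for gl_context, gl_context::DriverFlags, and a driver-private BRW_NEW_* bit.

```c
#include <stdint.h>

/* Hypothetical driver-private dirty bit (stand-in for a BRW_NEW_* flag). */
#define TOY_BRW_NEW_STATE_X (1ull << 0)

/* Stand-in for gl_context::DriverFlags: one slot per core state group. */
struct toy_driver_flags {
   uint64_t NewStateX;
};

/* Stand-in for the parts of gl_context used in this sketch. */
struct toy_context {
   struct toy_driver_flags DriverFlags;
   uint64_t NewDriverState;
};

/* Context creation: the driver picks which private bit StateX maps to. */
static void toy_driver_init(struct toy_context *ctx)
{
   ctx->DriverFlags.NewStateX = TOY_BRW_NEW_STATE_X;
   ctx->NewDriverState = 0;
}

/* Core side: StateX changed, so raise whatever bit the driver registered. */
static void toy_core_state_x_changed(struct toy_context *ctx)
{
   ctx->NewDriverState |= ctx->DriverFlags.NewStateX;
}

/* Draw time: read and clear the accumulated driver dirty bits. */
static uint64_t toy_driver_take_dirty(struct toy_context *ctx)
{
   uint64_t dirty = ctx->NewDriverState;
   ctx->NewDriverState = 0;
   return dirty;
}
```

The point of the indirection is that the core code never needs to know what TOY_BRW_NEW_STATE_X means; it only ORs in the value the driver stored at context creation.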
* i965/vs: split brw_vs_prog_data into generic and VS-specific parts. (Paul Berry; 2013-04-11; 1 file; -2/+2)
| | | | | | | | | | | This will allow the generic parts to be re-used for geometry shaders. Reviewed-by: Jordan Justen <[email protected]> v2: Put urb_read_length and urb_entry_size in the generic struct. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move brw_vs_prog_data::outputs_written into VUE map. (Paul Berry; 2013-03-24; 1 file; -1/+1)
| | | | | | | | | | | | | | Future patches will allow for there to be separate VUE maps when both a geometry shader and a vertex shader are in use. When this happens, we will want to have correspondingly separate outputs_written bitfields. Moving outputs_written into the VUE map will make this easy. For consistency with the terminology used in the VUE map, the bitfield is renamed to "slots_valid" in the process. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename BRW_VARYING_SLOT_MAX -> BRW_VARYING_SLOT_COUNT. (Paul Berry; 2013-03-24; 1 file; -1/+1)
| | | | | | | The new name clarifies that it represents *one more* than the maximum possible brw_varying_slot value. Reviewed-by: Kenneth Graunke <[email protected]>
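A small sketch of why a _COUNT suffix reads more naturally than _MAX for a value that is one more than the largest enumerator. The enum toy_varying_slot and its members are hypothetical; the real brw_varying_slot has different members.

```c
/* Hypothetical slot enum, for illustration only. */
enum toy_varying_slot {
   TOY_SLOT_POS,
   TOY_SLOT_COLOR,
   TOY_SLOT_FOG,
   TOY_SLOT_COUNT   /* one past the last valid slot, i.e. the number of slots */
};

/* A _COUNT value sizes arrays directly, with no "+ 1" adjustment. */
static int toy_slot_offset[TOY_SLOT_COUNT];

/* It also bounds loops with a strict '<'. Returns how many slots were reset. */
static int toy_reset_offsets(void)
{
   int visited = 0;
   for (int slot = 0; slot < TOY_SLOT_COUNT; slot++) {
      toy_slot_offset[slot] = -1;
      visited++;
   }
   return visited;
}
```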
* Replace gl_vert_result enum with gl_varying_slot. (Paul Berry; 2013-03-15; 1 file; -1/+1)
| | | | | | | | | | | This patch makes the following search-and-replace changes: gl_vert_result -> gl_varying_slot VERT_RESULT_* -> VARYING_SLOT_* Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Tested-by: Brian Paul <[email protected]>
* i965: Remove unused userclip flags. (Paul Berry; 2013-02-19; 1 file; -3/+0)
| | | | | | | | | | brw_vs_prog_data::userclip hasn't been used since commit f0cecd4 (i965: Move VUE map computation to once at VS compile time). brw_gs_prog_key::userclip_active hasn't been used since commit 9f3d321 (i965: Make the userclip flag for the VUE map come from VS prog data). Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Make a function is_transform_feedback_active_and_unpaused. (Paul Berry; 2012-12-18; 1 file; -2/+2)
| | | | | | | | | The rather unwieldy logic for determining this condition was repeated in a large number of places. This patch consolidates it to a single inline function. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
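The consolidation described above might look something like the sketch below. The struct layouts and the name toy_is_xfb_active_and_unpaused are hypothetical stand-ins; only the idea (one inline predicate instead of the same compound condition at every call site) comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transform feedback object. */
struct toy_xfb_object {
   bool Active;
   bool Paused;
};

/* Hypothetical stand-in for the relevant part of the context. */
struct toy_context {
   struct toy_xfb_object *CurrentObject;
};

/* One place for the test, instead of repeating it at every call site. */
static inline bool
toy_is_xfb_active_and_unpaused(const struct toy_context *ctx)
{
   return ctx->CurrentObject != NULL &&
          ctx->CurrentObject->Active &&
          !ctx->CurrentObject->Paused;
}
```

The NULL check is an assumption added to keep the sketch safe on its own; whether the real helper needs one depends on invariants this log does not state.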
* i965/gen4: Fix memory leak each time compile_gs_prog() is called. (Eric Anholt; 2012-11-25; 1 file; -1/+1)
| | | | | | | | | Commit 774fb90db3e83d5e7326b7a72e05ce805c306b24 introduced a ralloc context to each user of struct brw_compile, but for this one a NULL context was used, causing the later ralloc_free(mem_ctx) to not do anything. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55175 NOTE: This is a candidate for the stable branches.
* i965: Drop the INTEL_FORCE_GS environment variable. (Eric Anholt; 2012-03-20; 1 file; -5/+0)
| | | | | | | | | This was a debug option during gen6 transform feedback bringup (and a similar one existed during gen4 bringup). However, it looks like we're done with that, and we don't anticipate it being used again, either for geometry shaders or transform feedback. Suggested by: Kenneth Graunke <[email protected]>
* i965: Move VUE map computation to once at VS compile time. (Eric Anholt; 2012-02-21; 1 file; -1/+1)
| | | | | | | | | | With this and the previous patch, 640x480 nexuiz is running 0.169118% +/- 0.0863696% faster (n=121). On a VS state change microbenchmark, performance is increased 8.28645% +/- 0.460478% (n=52). v2: Fix CACHE_NEW_VS comment. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make the userclip flag for the VUE map come from VS prog data. (Eric Anholt; 2012-02-21; 1 file; -6/+4)
| | | | | | | | This reduces recomputation of state based on non-clipping-related transform changes, and is a step toward removing VUE map recomputation. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make use of gl_transform_feedback_info::ComponentOffset. (Paul Berry; 2012-01-05; 1 file; -0/+9)
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965 gen6: Implement transform feedback pause/resume functionality. (Paul Berry; 2011-12-23; 1 file; -1/+2)
| | | | | | | | | Although i965 gen6 does not yet support ARB_transform_feedback2 or NV_transform_feedback2, it needs to support pause/resume functionality so that meta-ops will work correctly. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Add _NEW_RASTERIZER_DISCARD as synonym for _NEW_TRANSFORM. (Paul Berry; 2011-12-21; 1 file; -2/+3)
| | | | | | | | | | This makes it easier to keep track of which dirty bits correspond to which pieces of context, since it makes _NEW_RASTERIZER_DISCARD correspond with ctx->RasterDiscard. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa: Move RasterDiscard to toplevel of gl_context. (Paul Berry; 2011-12-21; 1 file; -1/+1)
| | | | | | | | | | | | | | | | | | | | Previously we were storing the RasterDiscard flag (for GL_RASTERIZER_DISCARD) in gl_context::TransformFeedback. This was confusing, because we use the _NEW_TRANSFORM flag (not _NEW_TRANSFORM_FEEDBACK) to track state updates to it, and because rasterizer discard has effects even when transform feedback is not in use. This patch makes RasterDiscard a toplevel element in gl_context rather than a subfield of gl_context::TransformFeedback. Note: We can't put RasterDiscard inside gl_context::Transform, since all items inside gl_context::Transform need to be pieces of state that are saved and restored using PushAttrib and PopAttrib. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* i965 gen6: Implement rasterizer discard. (Paul Berry; 2011-12-20; 1 file; -0/+6)
| | | | | | | | | | | | | | | | | | | This patch enables rasterizer discard functionality (a part of transform feedback) in Gen6, by generating an alternate GS program when rasterizer discard is active. Instead of forwarding vertices down the pipeline, the alternate GS program uses a URB Write message to deallocate the URB entry that was allocated by FF sync and terminate the thread. Note: parts of the Sandy Bridge PRM seem to imply that we could do this more efficiently, by clearing the GEN6_GS_RENDERING_ENABLE bit, and not allocating a URB entry at all. However, it's not clear how we are supposed to terminate the thread if we do that. Volume 2 part 1, section 4.5.4, says "GS threads must terminate by sending a URB_WRITE message with the EOT and Complete bits set.", and my experiments so far confirm that. Reviewed-by: Kenneth Graunke <[email protected]>
* i965 gen6: Initial implementation of transform feedback. (Paul Berry; 2011-12-20; 1 file; -1/+25)
| This patch adds basic transform feedback capability for Gen6 hardware.
| This consists of several related pieces of functionality:
|
| (1) In gen6_sol.c, we set up binding table entries for use by
|     transform feedback. We use one binding table entry per transform
|     feedback varying (this allows us to avoid doing pointer arithmetic
|     in the shader, since we can set up the binding table entries with
|     the appropriate offsets and surface pitches to place each varying
|     at the correct address).
|
| (2) In brw_context.c, we advertise the hardware capabilities, which
|     are as follows:
|
|       MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS 64
|       MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS        4
|       MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS    16
|
|     OpenGL 3.0 requires these values to be at least 64, 4, and 4,
|     respectively. The reason we advertise a larger value than required
|     for MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS is that we have
|     already set aside 64 binding table entries, so we might as well
|     make them all available in both separate attribs and interleaved
|     modes.
|
| (3) We set aside a single SVBI ("streamed vertex buffer index") for
|     use by transform feedback. The hardware supports four independent
|     SVBI's, but we only need one, since vertices are added to all
|     transform feedback buffers at the same rate. Note: at the moment
|     this index is reset to 0 only when the driver is initialized. It
|     needs to be reset to 0 whenever BeginTransformFeedback() is
|     called, and otherwise preserved.
|
| (4) In brw_gs_emit.c and brw_gs.c, we modify the geometry shader
|     program to output transform feedback data as a side effect.
|
| (5) In gen6_gs_state.c, we configure the geometry shader stage to
|     handle the SVBI pointer correctly.
|
| Note: ordering of vertices is not yet correct for triangle strips
| (alternate triangles are improperly oriented). This will be addressed
| in a future patch.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* i965 gs: Move vue_map to brw_gs_compile. (Paul Berry; 2011-12-20; 1 file; -3/+2)
| | | | | | | | | | | | This patch stores the geometry shader VUE map from a local variable in compile_gs_prog() to a field in the brw_gs_compile struct, so that it will be available while compiling the geometry shader. This is necessary in order to support transform feedback on Gen6, because the Gen6 geometry shader code that supports transform feedback needs to be able to inspect the VUE map in order to find the correct vertex data to output. Reviewed-by: Kenneth Graunke <[email protected]>
* i965 gen6: Implement pass-through GS for transform feedback. (Paul Berry; 2011-12-07; 1 file; -30/+76)
| | | | | | | | | | | | | | | | | | | | | | In Gen6, transform feedback is accomplished by having the geometry shader send vertex data to the data port using "Streamed Vertex Buffer Write" messages, while simultaneously passing vertices through to the rest of the graphics pipeline (if rendering is enabled). This patch adds a geometry shader program that simply passes vertices through to the rest of the graphics pipeline. The rest of transform feedback functionality will be added in future patches. To make the new geometry shader easier to test, I've added an environment variable "INTEL_FORCE_GS". If this environment variable is enabled, then the pass-through geometry shader will always be used, regardless of whether transform feedback is in effect. On my Sandy Bridge laptop, I'm able to enable INTEL_FORCE_GS with no Piglit regressions. Reviewed-by: Kenneth Graunke <[email protected]> Acked-by: Eric Anholt <[email protected]>
* i965 gs: Remove unnecessary mapping of key->primitive. (Paul Berry; 2011-12-07; 1 file; -15/+1)
| | | | | | | | | | | | | Previously, GS generation code contained a lookup table that mapped primitive types POLYGON, TRISTRIP, and TRIFAN to TRILIST, mapped LINESTRIP to LINELIST, and left all other primitives unchanged. This was silly, because we never generate a GS program for those primitive types anyhow. This patch removes the unnecessary lookup table. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move program compile to emit() time. (Eric Anholt; 2011-10-29; 1 file; -2/+3)
| | | | | | | Only 4 other prepare() functions are left, which don't rely on this. Reviewed-by: Kenneth Graunke <[email protected]> Acked-by: Paul Berry <[email protected]>
* intel: Convert from GLboolean to 'bool' from stdbool.h. (Kenneth Graunke; 2011-10-18; 1 file; -1/+1)
| I initially produced the patch using this bash command:
|
| for file in {intel,i915,i965}/*.{c,cpp,h}; do
|   [ ! -h $file ] && sed -i 's/GLboolean/bool/g' $file &&
|   sed -i 's/GL_TRUE/true/g' $file &&
|   sed -i 's/GL_FALSE/false/g' $file;
| done
|
| Then I manually added #include <stdbool.h> to fix compilation errors,
| and converted a few functions back to GLboolean that were used in core
| Mesa's function pointer table to avoid "incompatible pointer" warnings.
|
| Finally, I cleaned up some whitespace issues introduced by the change.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Acked-by: Chad Versace <[email protected]>
| Acked-by: Paul Berry <[email protected]>
* i965: Change type of brw_context.primitive from GLenum to hardware primitive (Chad Versace; 2011-10-10; 1 file; -18/+19)
| For example, GL_TRIANGLES is converted to _3DPRIM_TRILIST.
|
| The conversion is necessary because HiZ and MSAA resolve operations
| emit a 3DPRIM_RECTLIST, which cannot be conveyed by GLenum.
|
| As a consequence, brw_gs_prog_key.primitive is also converted.
|
| v2
| ----
| - [anholt] Split brw_set_prim into brw/gen6 variants in previous
|   commit, since not much code is really shared between the two.
| - [anholt] Replace switch statements with table lookups, since this is
|   a hot path.
|
| Reviewed-by: Eric Anholt <[email protected]>
| Signed-off-by: Chad Versace <[email protected]>
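The "table lookups instead of switch statements" idea from the v2 notes can be sketched as below. Everything here is hypothetical: the enums toy_api_prim and toy_hw_prim, their values, and toy_translate_prim are illustrative stand-ins, not the driver's real GLenum or _3DPRIM_* encodings.

```c
#include <assert.h>

/* Hypothetical API-level primitive modes (stand-ins for GLenum values). */
enum toy_api_prim {
   TOY_POINTS,
   TOY_LINES,
   TOY_TRIANGLES,
   TOY_API_PRIM_COUNT
};

/* Hypothetical hardware encodings; the values are made up for this sketch. */
enum toy_hw_prim {
   TOY_HW_POINTLIST = 1,
   TOY_HW_LINELIST  = 2,
   TOY_HW_TRILIST   = 4,
   TOY_HW_RECTLIST  = 15   /* has no API-level counterpart */
};

/* One array index replaces a chain of switch cases on the hot path. */
static const enum toy_hw_prim toy_prim_table[TOY_API_PRIM_COUNT] = {
   [TOY_POINTS]    = TOY_HW_POINTLIST,
   [TOY_LINES]     = TOY_HW_LINELIST,
   [TOY_TRIANGLES] = TOY_HW_TRILIST,
};

static enum toy_hw_prim toy_translate_prim(enum toy_api_prim mode)
{
   assert(mode < TOY_API_PRIM_COUNT);
   return toy_prim_table[mode];
}
```

Note that TOY_HW_RECTLIST never appears in the table: a value like that can only be stored if the context holds the hardware encoding rather than the API enum, which is the motivation the commit message gives.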
* i965: Make brw_compute_vue_map's userclip dependency a boolean. (Paul Berry; 2011-10-06; 1 file; -2/+2)
| | | | | | | | | | | | | Previously, brw_compute_vue_map required an argument indicating the number of clip planes in use, but all it did with it was check if it was nonzero. This patch changes brw_compute_vue_map to take a boolean instead. This allows us to avoid some unnecessary recompilation of the Gen4/5 GS and SF threads. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
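To see why storing a boolean rather than a count can avoid recompiles, consider a program cache keyed by a struct that is compared bytewise. The sketch below is a hedged illustration: the key structs and both helper functions are hypothetical, and the real i965 program keys contain many more fields.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical key that records the number of user clip planes. */
struct toy_key_count {
   unsigned nr_userclip_planes;
};

/* Hypothetical key that records only whether any plane is enabled. */
struct toy_key_bool {
   bool userclip_active;
};

/* With a count in the key, any change in plane count changes the key. */
static bool toy_recompile_with_count(unsigned old_planes, unsigned new_planes)
{
   struct toy_key_count a, b;
   memset(&a, 0, sizeof(a));
   memset(&b, 0, sizeof(b));
   a.nr_userclip_planes = old_planes;
   b.nr_userclip_planes = new_planes;
   return memcmp(&a, &b, sizeof(a)) != 0;
}

/* With a boolean, only the zero/nonzero transition changes the key. */
static bool toy_recompile_with_bool(unsigned old_planes, unsigned new_planes)
{
   struct toy_key_bool a, b;
   memset(&a, 0, sizeof(a));
   memset(&b, 0, sizeof(b));
   a.userclip_active = old_planes != 0;
   b.userclip_active = new_planes != 0;
   return memcmp(&a, &b, sizeof(a)) != 0;
}
```

Going from two clip planes to three changes the count-based key but not the boolean one, so a cache keyed on the boolean would reuse the existing program.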
* mesa: Create _mesa_bitcount_64() to replace i965's brw_count_bits() (Paul Berry; 2011-10-06; 1 file; -1/+1)
| | | | | | | | | | | | | | | | The i965 driver already had a function to count bits in a 64-bit uint (brw_count_bits()), but it was buggy (it only counted the bottom 32 bits) and it was clumsy (it had a strange and broken fallback for non-GCC-like compilers, which fortunately was never used). Since Mesa already has a _mesa_bitcount() function, it seems better to just create a _mesa_bitcount_64() function rather than special-case this in the i965 driver. This patch creates the new _mesa_bitcount_64() function and rewrites all of the old brw_count_bits() calls to refer to it. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
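The class of bug described above (a 64-bit input whose upper half is ignored) is easy to reproduce in isolation. The sketch below is not the Mesa or i965 code: toy_bitcount_low32_only deliberately mimics the truncation, and toy_bitcount_64 is one portable way to count all 64 bits. Both names are hypothetical.

```c
#include <stdint.h>

/* Illustrative only: truncates to 32 bits before counting, so any bits
 * set in the upper half of the input are silently dropped. */
static unsigned toy_bitcount_low32_only(uint64_t v)
{
   uint32_t low = (uint32_t)v;
   unsigned n = 0;
   for (; low; low &= low - 1)
      n++;
   return n;
}

/* Counts over the full 64-bit value; each iteration clears the lowest
 * set bit, so the loop runs once per set bit. */
static unsigned toy_bitcount_64(uint64_t v)
{
   unsigned n = 0;
   for (; v; v &= v - 1)
      n++;
   return n;
}
```

A value such as 0xFFFFFFFF00000000 separates the two: the full-width count sees 32 set bits, while the truncating version sees none.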
* i965: Remove two_side_color from brw_compute_vue_map(). (Paul Berry; 2011-09-06; 1 file; -3/+1)
| | | | | | | | | | | Since we now lay out the VUE the same way regardless of whether two-sided color is enabled, brw_compute_vue_map() no longer needs to know whether two-sided color is enabled. This allows the two-sided color flag to be removed from the clip, GS, and VS keys, so that fewer GPU programs need to be recompiled when turning two-sided color on and off. Reviewed-by: Eric Anholt <[email protected]>
* i965: GS: Use the VUE map to compute URB size. (Paul Berry; 2011-09-06; 1 file; -12/+11)
| | | | | | | | | | | The previous computation had two bugs: (a) it used a formula based on Gen5 for Gen6 and Gen7 as well. (b) it failed to account for the fact that PSIZ is stored in the VUE header. Fortunately, both bugs caused it to compute a URB size that was too large, which was benign. This patch computes the URB size directly from the VUE map, so it gets the result correct in all circumstances. Reviewed-by: Eric Anholt <[email protected]>
* i965: Fix Android build by removing relative includes (Chad Versace; 2011-08-30; 1 file; -1/+1)
| | | | | | | | Replace each occurrence of #include "../glsl/*.h" with #include "glsl/*.h" Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Use state streaming on programs, and state base address on gen5+. (Eric Anholt; 2011-06-18; 1 file; -15/+9)
| | | | | | | | | | There will be a little bit of thrashing of the program cache BO as the cache warms up, but once the application is in steady state, this reduces relocations on gen5 and later. On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6% +/- 1.3% (n=6). No statistically significant performance difference on nexuiz (n=5).
* i965: Don't use the GS for breaking down quads on Ivybridge. (Kenneth Graunke; 2011-05-17; 1 file; -2/+2)
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Get a ralloc context into brw_compile. (Kenneth Graunke; 2011-05-17; 1 file; -2/+6)
| | | | | | | | | | | | This would be so much easier if we were using C++; we could simply use constructors and destructors. Instead, we have to update all the callers. While we're at it, ralloc various brw_wm_compile fields rather than explicitly calloc/free'ing them. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/gs: Move generation check for bailing earlier. (Kenneth Graunke; 2011-05-17; 1 file; -6/+6)
| | | | | | | | | On Sandybridge, we don't need to break down primitives. There's no need to bother setting up brw_compile and such if it's not going to be used; bail as early as possible. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove dead entrypoints to state cache, rename the one that's left. (Eric Anholt; 2011-04-29; 1 file; -8/+5)
| | | | | | | | | | As we expanded the usage of the state cache, it grew extra functionality. However, with the recent state streaming rework, we're back to the state cache being used only for shader kernels, which is the piece of GPU state that's actually expensive to compute again from scratch, since it involves compiling. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove hint_gs_always and resulting dead code (Ian Romanick; 2011-04-11; 1 file; -40/+12)
| | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: don't spawn GS thread for LINELOOP on Sandybridge (Xiang, Haihao; 2010-12-27; 1 file; -1/+4)
| | | | | LINELOOP is converted to LINESTRIP at the beginning of the 3D pipeline. This fixes https://bugs.freedesktop.org/show_bug.cgi?id=32596
* i965: Fix GS state uploading on Sandybridge (Zhenyu Wang; 2010-12-06; 1 file; -4/+13)
| | | | | | On Sandybridge, we need to check the primitive type to decide whether the GS is required. When the GS is disabled, the new GS state still has to be issued; previously only the URB state was updated with no GS entries, which caused a hang on Sandybridge. This fixes a hang during conformance suite testing.
* intel: Annotate debug printout checks with unlikely(). (Eric Anholt; 2010-11-03; 1 file; -2/+2)
| | | | | | | This provides the optimizer with hints about code hotness, which we're quite certain about for debug printouts (or, rather, while we developers often hit the checks for debug printouts, we don't care about performance while doing so).
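An unlikely() annotation of the kind described above is commonly built on the GCC/Clang builtin __builtin_expect. The sketch below assumes such a compiler and falls back to a plain expression elsewhere; the macro name toy_unlikely, the flag TOY_DEBUG_GS, the global toy_debug_flags, and the function toy_upload_gs are all hypothetical and not taken from the driver.

```c
#include <stdio.h>

/* Assumption: GCC/Clang-style builtin; other compilers get a no-op. */
#if defined(__GNUC__)
#define toy_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define toy_unlikely(x) (x)
#endif

#define TOY_DEBUG_GS (1u << 0)    /* hypothetical debug bit */

static unsigned toy_debug_flags;  /* hypothetical global; 0 means no debug */

/* Returns 1 if the debug message was printed, 0 otherwise. The hint tells
 * the optimizer to lay out the non-debug path as the fall-through case. */
static int toy_upload_gs(void)
{
   if (toy_unlikely(toy_debug_flags & TOY_DEBUG_GS)) {
      fprintf(stderr, "gs: dumping program\n");
      return 1;
   }
   return 0;
}
```

The hint only affects code layout and branch prediction heuristics; the condition is still evaluated normally, so behavior is the same with or without it.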
* i965: Fix GS hang on Sandybridge (Zhenyu Wang; 2010-10-14; 1 file; -2/+0)
| | | | Don't use r0 as the FF_SYNC destination register on Sandybridge: doing so smashes the FFID field in the GS payload, which causes later URB writes to fail. Also avoid using r0 in any URB write that requires an allocate.
* Drop GLcontext typedef and use struct gl_context instead (Kristian Høgsberg; 2010-10-13; 1 file; -1/+1)
|
* i965: ignore quads for GS kernel on sandybridge (Zhenyu Wang; 2010-09-28; 1 file; -1/+8)
| | | | | Sandybridge's VF converts quads to polygons, so the GS is not required for them. The current GS state would still cause a hang on line loops.
* intel: Change dri_bo_* to drm_intel_bo* to consistently use new API. (Eric Anholt; 2010-06-08; 1 file; -2/+2)
| | | | | The slightly less mechanical change of converting the emit_reloc calls will follow.
* i965: Make rasterization of single and multiple quad prims match. (Eric Anholt; 2010-05-17; 1 file; -0/+6)
| | | | | | | | This is trying to follow the spirit of the invariance rules, though they're not specific on this point. Fixes quad-invariance piglit test while retaining the 22s -> 18s win on glean blendFunc. This was a regression in c67d9d84f501f145f841c0b981caff6f4dfd936f.
* i965: Add program dumping for INTEL_DEBUG=gs. (Eric Anholt; 2010-05-14; 1 file; -0/+10)
|
* intel: Clean up chipset name and gen num for Ironlake (Zhenyu Wang; 2010-04-21; 1 file; -1/+1)
| | | | | | | Rename the old IGDNG to Ironlake, and set the 'gen' number for Ironlake to 5, so that features are tracked by generation number instead of a special is_ironlake flag. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Zhenyu Wang <[email protected]>
* i965: Allow for variable-sized auxdata in the state cache. (Eric Anholt; 2010-01-19; 1 file; -6/+7)
| | | | | | Everything has been constant-sized until now, but constant buffer handling changes will make us want some additional variable sized array.
* intel: Replace IS_IGDNG checks with intel->is_ironlake or needs_ff_sync. (Eric Anholt; 2009-12-22; 1 file; -2/+2)
| | | | Saves ~480 bytes of code.
* i965: fix EXT_provoking_vertex support (Roland Scheidegger; 2009-11-11; 1 file; -3/+7)
| | | | | | | | This didn't work for quad/quadstrips at all, and for all other primitive types it only worked when they were unclipped. Fix up the former in gs stage (could probably do without these changes and instead set QuadsFollowProvokingVertexConvention to false), and the rest in clip stage.
* i965: add support for new chipsets (Xiang, Haihao; 2009-07-13; 1 file; -2/+7)
| 1. new PCI ids
| 2. fix some 3D commands on new chipset
| 3. fix send instruction on new chipset
| 4. new VUE vertex header
| 5. ff_sync message (added by Zou Nan Hai <[email protected]>)
| 6. the offset in JMPI is in unit of 64bits on new chipset
| 7. new cube map layout