mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	i965: Implement CopyTexSubImage2D via BLORP (and use it by default).	Kenneth Graunke	2013-02-06	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The BLT engine has many limitations. Currently, it can only blit X-tiled buffers (since we don't have a kernel API to whack the BLT tiling mode register), which means all depth/stencil operations get punted to meta code, which can be very CPU-intensive. Even if we used the BLT engine, it can't blit between buffers with different tiling modes, such as an X-tiled non-MSAA ARGB8888 texture and a Y-tiled CMS ARGB8888 renderbuffer. This is a fundamental limitation, and the only way around that is to use BLORP. Previously, BLORP only handled BlitFramebuffer. This patch adds an additional frontend for doing CopyTexSubImage. It also makes it the default. This is partly to increase testing and avoid hiding bugs, and partly because the BLORP path can already handle more cases. With trivial extensions, it should be able to handle everything the BLT can. This helps PlaneShift massively, which tries to CopyTexSubImage2D between depth buffers whenever a player casts a spell. Since these are Y-tiled, we hit meta and software ReadPixels paths, eating 99% CPU while delivering ~1 FPS. This is particularly bad in an MMO setting because people cast spells all the time. It also helps Xonotic in 4X MSAA mode. At default power management settings, I measured a 6.35138% +/- 0.672548% performance boost (n=5). (This data is from v1 of the patch.) No Piglit regressions on Ivybridge (v3) or Sandybridge (v2). v2: Create a fake intel_renderbuffer to wrap the destination texture image and then reuse do_blorp_blit rather than reimplementing most of it. Remove unnecessary clipping code and conditional rendering check. v3: Reuse formats_match() to centralize checks; delete temporary renderbuffers. Reorganize the code. v4: Actually copy stencil when dealing with separate stencil buffers but packed depth/stencil formats. Tested by a new Piglit test. NOTE: This is a candidate for the 9.1 branch. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]> [v4] Reviewed-by: Ian Romanick <[email protected]> [v3] Reviewed-and-tested-by: Carl Worth <[email protected]> [v2] Tested-by: Martin Steigerwald <[email protected]> [v3]
*	i965: Remove dead field brw_wm_prog_data::error.	Kenneth Graunke	2013-02-03	1	-1/+0
\|
*	i965: Remove dead field brw_context::constant_map.	Kenneth Graunke	2013-02-03	1	-1/+0
\| \| \| \|	This was used by the old VS backend, but that's long gone.
*	i965: Use the glarray _ElementSize that Mesa tracks for us.	Eric Anholt	2013-01-25	1	-2/+0
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Move program_id to intel_screen instead of brw_context.	Kenneth Graunke	2013-01-12	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to bug #54524, I regressed oglconform's multicontext test when I reenabled the fragment shader precompile. However, these test cases only passed by miraculous coincedence. We assign each fragment program a unique ID (brw_fragment_program::id which becomes brw_wm_prog_key::program_string_id) which we obtain by storing a per-context counter. The test case uses GLX context sharing to access the same fragment program from two different contexts. This means that we share a program cache. Before the precompile, if both contexts happened to use the same shaders in the same order, we'd obtain the same program_string_ids (by virtue of doing the same computation twice). However, the more likely scenario is that they completely disagree on program_string_id. This meant that we'd have two completely different fragment shaders in the cache with the same ID, tricking us to think they were the same (aside from NOS), so we'd render using the wrong program. This patch implements a simple fix suggested by Eric: it moves the global counter out of brw_context and into intel_screen, which is shared across all contexts. A mutex protects it from concurrent access. This is also the first direct usage of pthreads in the i965 driver. Fixes 10 subcases of oglconform's multicontext test. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54524 Reviewed-by: Eric Anholt <[email protected]>
*	mesa: Add ALIGN() macro to main/macros.h.	Paul Berry	2013-01-08	1	-0/+1
\| \| \| \| \| \| \| \| \|	Previously this macro existed in 3 separate places, some inside the intel driver and some outside of it. It makes more sense to have it in main/macros.h Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Scale shader_time to compensate for resets.	Eric Anholt	2012-12-14	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	Some shaders experience resets more than others, which skews the numbers reported. Attempt to correct for this by linearly scaling according to the number of resets that happen. Note that will not be accurate if invocations of shaders have varying times and longer invocations are more likely to reset. However, this should at least be better than the previous situation.
*	i965: Add a debug flag for counting cycles spent in each compiled shader.	Eric Anholt	2012-12-05	1	-4/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This can be used for two purposes: Using hand-coded shaders to determine per-instruction timings, or figuring out which shader to optimize in a whole application. Note that this doesn't cover the instructions that set up the message to the URB/FB write -- we'd need to convert the MRF usage in these instructions to GRFs so that our offsets/times don't overwrite our shader outputs. Reviewed-by: Kenneth Graunke <[email protected]> (v1) v2: Check the timestamp reset flag in the VS, which is apparently getting set fairly regularly in the range we watch, resulting in negative numbers getting added to our 32-bit counter, and thus large values added to our uint64_t. v3: Rebase on reladdr changes, removing a new safety check that proved impossible to satisfy. Add a comment to the AOP defs from Ken's review, and put them in a slightly more sensible spot. v4: Check timestamp reset in the FS as well.
*	i965: Move all the depth/stencil/hiz offset logic into the workaround.	Eric Anholt	2012-11-19	1	-0/+15
\| \| \| \| \| \| \| \| \| \|	Given that we have the mask information here (assuming the rebase is to the same tiling, which is safe), we can just save a set of miptrees and offsets and the global intra-tile offset in the context and cut out a bunch of logic. This will also save emitting the next fix I need to do twice. Acked-by: Chad Versace <[email protected]>
*	i965: Remove duplicate brw_opcodes table in favor of opcode_descs.	Kenneth Graunke	2012-11-15	1	-7/+0
\| \| \| \| \| \| \| \|	brw_optimize.c's brw_opcodes table was a copy of brw_disasm.c's opcode_descs table, but with an additional field: is_arith. Now that I've deleted that, the two are identical. Keep the one in brw_disasm.c. Reviewed-by: Eric Anholt <[email protected]>
*	i965: Remove brw_instruction_info::is_arith().	Kenneth Graunke	2012-11-15	1	-1/+0
\| \| \| \| \| \|	Nobody uses it. Reviewed-by: Eric Anholt <[email protected]>
*	i965: Remove unused BRW_PACKCOLOR8888 macro.	Kenneth Graunke	2012-11-15	1	-4/+0
\| \| \| \|	Reviewed-by: Eric Anholt <[email protected]>
*	i965: Remove brw_shader_program wrapper struct.	Kenneth Graunke	2012-11-15	1	-4/+0
\| \| \| \| \| \| \|	At this point, it's just gl_shader_program. Nobody even uses it; even the program that creates them only returns gl_shader_program pointers. Reviewed-by: Eric Anholt <[email protected]>
*	i965: Remove unused struct brw_vs_ouput_sizes.	Kenneth Graunke	2012-11-15	1	-8/+0
\| \| \| \| \| \|	With a name like that, it can't be used. Sure enough, it's not. Reviewed-by: Eric Anholt <[email protected]>
*	i965: Fix slow leak of brw->wm.compile_data->store	Eric Anholt	2012-11-08	1	-1/+0
\| \| \| \| \| \| \| \|	We were successfully freeing our compile data at context destroy, but until then we were allocating a new store every compile without freeing it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56019 Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Fix uploading user vertex arrays with basevertex set.	Eric Anholt	2012-11-04	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	If the index buffer is full of values like "0 1 2 3", but basevertex is 4, we need to upload at least vertex data for elements 4 5 6 7. Whether we also upload 0 1 2 3 is a question of whether there are VBOs present or not -- see the code setting start_vertex_bias in brw_draw_upload.c. Fixes piglit draw-elements*base-vertex user_varrays Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/vs: Remove support for the old parameter layout.	Kenneth Graunke	2012-11-01	1	-1/+0
\| \| \| \| \| \|	Only the old backend used it. Reviewed-by: Eric Anholt <[email protected]>
*	i965/vs: Delete the old vertex shader backend.	Kenneth Graunke	2012-11-01	1	-1/+0
\| \| \| \| \| \|	It's no longer used for anything. Reviewed-by: Eric Anholt <[email protected]>
*	i965: Don't bother trying to extend the current vertex buffers.	Kenneth Graunke	2012-10-31	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This essentially reverts the following: commit c625aa19cb53ed27f91bfd16fea6ea727e9a5bbd Author: Chris Wilson <[email protected]> Date: Fri Feb 18 10:37:43 2011 +0000 intel: extend current vertex buffers While working on optimizing an upcoming Steam title, I broke this code. Eric expressed his doubts about this optimization, and noted that the original commit offered no performance data. I ran before and after benchmarks on Xonotic and Citybench, and found that this code made no difference. So, remove it to reduce complexity and make future work simpler. Reviewed-by: Eric Anholt <[email protected]>
*	i965: Merge brw_prepare_query_begin() and brw_emit_query_begin().	Eric Anholt	2012-10-26	1	-1/+0
\| \| \| \| \| \| \|	This is a leftover from when we had to split those two functions due to the separate BO validation step. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Rename misleading "active" field of brw->query.	Eric Anholt	2012-10-26	1	-1/+1
\| \| \| \| \| \| \| \| \|	"Active" is an already-used term for the query being between glBeginQuery() and glEndQuery(), while this is tracking whether the start of the packet pair for emitting state has been inserted into the current batchbuffer. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Statically allocate the reg_sets at context initialization.	Eric Anholt	2012-10-17	1	-22/+24
\| \| \| \| \| \| \|	Now that we've replaced all the variable settings other than reg_width, it's easy to hang on to this (the expensive part of setting up the allocator). Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/vs: Add a little bit of IR-level debug ability.	Eric Anholt	2012-10-17	1	-0/+8
\| \| \| \| \| \| \|	This is super basic, but it let me visualize a problem I had with opt_compute_to_mrf(). Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Fix rendering to small mipmaps of depth/stencil buffers using a temp mt.	Eric Anholt	2012-10-16	1	-0/+1
\| \| \| \| \| \| \|	Fixes 51 piglit tests (fbo-clear-formats, and most of the remaining failures in depthstencil). Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Share the draw x/y offset masking code between main/blorp and all gens.	Eric Anholt	2012-10-16	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	This code is twisty, and the comment before most of the blocks was actually giving me the opposite impression from its intention: We want to apply as much of our offset as possible through coarse tile-aligned adjustment, since we can do so independently per buffer, and apply the minimum we can through fine-grained drawing offset x/y, since it has to agree between all buffers. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Make the param pointer arrays for the VS dynamically sized.	Eric Anholt	2012-09-07	1	-2/+2
\| \| \| \| \| \| \| \| \|	Saves 96MB of wasted memory in the l4d2 demo. v2: Rebase on compare func change, change brace style. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Make the param pointer arrays for the WM dynamically sized.	Eric Anholt	2012-09-07	1	-2/+5
\| \| \| \| \| \| \| \| \|	Saves 26.5MB of wasted memory allocation in the l4d2 demo. v2: Rebase on compare func change, fix comments. Reviewed-by: Ian Romanick <[email protected]> (v1) Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add functions for comparing two brw_wm/vs_prog_data structs.	Eric Anholt	2012-09-07	1	-5/+24
\| \| \| \| \| \| \| \| \| \| \| \|	Currently, this just avoids comparing all unused parts of param[] and pull_param[], but it's a step toward getting rid of those giant statically sized arrays. v2: Actually use the new function instead of just looking at its address. This required changing the args to const pointers. (review by Kenneth) Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Set context flags	Ian Romanick	2012-08-29	1	-0/+1
\| \| \| \|	Signed-off-by: Ian Romanick <[email protected]>
*	i965: Validate API and version in brwCreateContext	Ian Romanick	2012-08-13	1	-0/+3
\| \| \| \| \| \| \|	v2: Use base-10 for versions like gl_context::Version. Suggested by Ken. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add performance debug for shader recompiles.	Eric Anholt	2012-08-12	1	-0/+2
\| \| \| \| \|	Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Bind UBOs as surfaces like we do for pull constants.	Eric Anholt	2012-08-07	1	-2/+20
\| \| \| \| \| \|	v2: Comment fix, drop extraneous parens (review by Kenneth) Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Remove unused param conversion code.	Eric Anholt	2012-07-25	1	-41/+0
\| \| \| \| \| \| \| \|	Ever since ctx->NativeIntegers was set, the conversion flag has been PARAM_NO_CONVERT. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/msaa: Work around problems with null render targets on Gen6.	Paul Berry	2012-07-24	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Gen6, multisampled null render targets don't seem to work properly--they cause the GPU to hang. So, as a workaround, we render into a dummy color buffer. Fortunately this situation (multisampled rendering without a color buffer) is rare, and we don't have to waste too much memory, because we can give the workaround buffer a very small pitch. Fixes piglit test "EXT_framebuffer_multisample/no-color {2,4} depth-computed *" on Gen6. Reviewed-by: Chad Versace <[email protected]>
*	i965: Add a lowering pass to convert TXD to TXL by computing the LOD.	Kenneth Graunke	2012-07-12	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Intel hardware doesn't natively support textureGrad with shadow comparisons. So we need to generate code to handle it somehow. Based on the equations of page 205 of the OpenGL 3.0 specification, it's possible to compute the LOD value that would be selected given the gradient values. Then, we can simply convert the TXD to a TXL. Currently, this passes 34/46 of oglconform's shadow-grad subtests; four cubemap tests are regressed. We should investigate this in the future. v2: Apply abs() to the scalar case (thanks to Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/msaa: Fix centroid interpolation of unlit pixels.	Paul Berry	2012-07-02	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From the Ivy Bridge PRM, Vol 2 Part 1 p280-281 (3DSTATE_WM: Barycentric Interpolation Mode): "Errata: When Centroid Barycentric mode is required, HW may produce incorrect interpolation results when a 2X2 pixels have unlit pixels." To work around this problem, after doing centroid interpolation, we replace the centroid-interpolated values for unlit pixels with non-centroid-interpolated values (which are interpolated at pixel centers). This produces correct rendering at the expense of a slight increase in shader execution time. I've conditioned the workaround with a runtime flag (brw->needs_unlit_centroid_workaround) in the hopes that we won't need it in future chip generations. Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4} {centroid-deriv,centroid-deriv-disabled}". All MSAA interpolation tests pass now. Reviewed-by: Eric Anholt <[email protected]>
*	i965: fix transform feedback with primitive restart	Jordan Justen	2012-07-02	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When querying GL_PRIMITIVES_GENERATED, if primitive restart is also used, then take the software primitive restart path so GL_PRIMITIVES_GENERATED is returned correctly. GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN is also updated since it will also affected by the same issue. As noted in brw_primitive_restart.c, with further work we should be able to move this situation back to a hardware handled path. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: enable ARB_instanced_arrays extension	Jordan Justen	2012-06-27	1	-0/+4
\| \| \| \| \| \| \| \|	Set the step_rate value when drawing to implement ARB_instanced_arrays for gen >= 4. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/msaa: Implement glSampleCoverage.	Paul Berry	2012-06-26	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enables glSampleCoverage() functionality, which allows the client program to specify that only a portion of the samples be lit up when performing multisampled rendering. i965 supports glSampleCoverage() through the 3DSTATE_SAMPLE_MASK command packet, which allows the driver to specify a bitfield indicating which samples to light up. Fixes piglit tests "EXT_framebuffer_multisample/sample-coverage {2,4} {inverted,non-inverted}". Reviewed-by: Anuj Phogat <[email protected]>
*	i965/blorp: Allocate space for push constants on Gen7.	Paul Berry	2012-05-25	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Gen7, push constants for shader programs are stored in the URB, so blorp code needs to set aside space for them. This was previously unnecessary because blorp code was based on HiZ operations, which don't require any shaders. This patch adds a call from gen7_blorp_exec() to gen7_allocate_push_constants(), to ensure that push constants are assigned the correct location in the URB. It also extracts a new function gen7_emit_urb_state() from gen7_upload_urb(), which is re-used by gen7_blorp_emit_urb_config() to ensure that the URB regions used by all the pipeline stages leave room for the push constants. Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: add flag to enable cut_index	Jordan Justen	2012-05-23	1	-0/+1
\| \| \| \| \| \| \| \| \|	When brw->prim_restart.enable_cut_index is set, the cut index will be enabled when uploading index_buffer commands. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: create code path to handle primitive restart in hardware	Jordan Justen	2012-05-23	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	For newer hardware we disable the VBO module's software handling of primitive restart. We now handle primitive restarts in brw_handle_primitive_restart. The initial version of brw_handle_primitive_restart simply calls vbo_sw_primitive_restart, and therefore still uses the VBO module software primitive restart support. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965/gen6+: Add support for GL_ARB_blend_func_extended.	Eric Anholt	2012-05-23	1	-0/+1
\| \| \| \| \| \| \|	v2: Add support for gen6, and don't turn it on if blending is disabled. (fixes GPU hang), and note it in docs/GL3.txt Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Completely annotate the batch bo when aub dumping.	Paul Berry	2012-05-22	1	-24/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, when the environment variable INTEL_DEBUG=aub was set, mesa would simply instruct DRM to start dumping data to an .aub file, but we would not provide DRM with any information about the format of the data in various buffers. As a result, a lot of the data in the generate .aub file would be unannotated, making further data analysis difficult. This patch causes the entire contents of each batch buffer to be annotated using the data in brw->state_batch_list (which was previously used only to annotate the output of INTEL_DEBUG=bat). This includes data that was allocated by brw_state_batch, such as binding tables, surface and sampler states, depth/stencil state, and so on. The new annotation mechanism requires DRM version 2.4.34. Reviewed-by: Eric Anholt <[email protected]>
*	i965/gen6: Initial implementation of MSAA.	Paul Berry	2012-05-15	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to understand multisampled buffers, adapting the rendering pipeline setup to enable multisampled rendering, and adding multisample resolve operations to brw_blorp_blit.cpp. Some preparation work is also included for Gen7, but it is not yet enabled. MSAA support is still fairly preliminary. In particular, the following are not yet supported: - Fully general blits between MSAA and non-MSAA buffers. - Formats other than RGBA8, DEPTH24, and STENCIL8. - Centroid interpolation. - Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE, GL_SAMPLE_COVERAGE_INVERT). Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on i965/Gen6. v2: - In intel_alloc_renderbuffer_storage(), quantize the requested number of samples to the next higher sample count supported by the hardware. This ensures that a query of GL_SAMPLES will return the correct value. It also ensures that MSAA is fully disabled on Gen7 for now (since Gen7 MSAA support doesn't work yet). - When reading from a non-MSAA surface, ensure that s_is_zero is true so that we won't try to read from a nonexistent sample.
*	i965/gen6+: Add code to perform blits on the render path ("blorp").	Paul Berry	2012-05-15	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch expands the "blorp" component to be able to perform blits as well as HiZ resolves. The new blitting code is located in brw_blorp_blit.cpp. This includes the necessary fragment shader code to look up pixels in the source buffer (which is configured as a texture) and output them to the destination buffer (which is configured as the render target). Most of the time the fragment shader code is simple and straightforward, since it merely has to apply a coordinate offset, read from the texture, and write to the render target. However, in the case of blitting stencil buffers, things are more complicated, since the GPU stores stencil data using W tiling, and W tiling is not supported for textures or render targets. So, we set up the stencil buffers as Y tiled, and emit fragment shader code that adjusts the coordinates to account for the difference between W and Y tiling. Furthermore, since a rectangular region in W tiling does not necessarily correspond to a rectangular region in Y tiling, we widen the rectangle primitive to the nearest tile boundary and have the fragment shader "kill" any pixels that don't fall inside the actual desired destination rectangle. All of this is a necessary prerequisite for implementing MSAA, since we'll need to be able to blit between multisample color, depth, and stencil buffers and their non-multisampled counterparts, and none of the existing blitting mechanisms support multisampling. In addition, the new blitting code should speed up operations where we previously fell back to software rasterization, such as blitting of stencil buffers. The current fallback sequence is: first we try to do a blit using the hardware blitting engine. If that fails we try to do a blit using the render path. If that also fails then we do the blit using a meta-op (which may or may not fall back to software rasterization). Note that blitting using the render path has some limitations at the moment: it only supports a few formats, and it doesn't support clipping or scissoring. These limitations will be addressed in future patch series. v2: - Add the code that configures the WM program to gen{6,7}_emit_wm_config() and gen7_emit_ps_config() rather than creating separate ...enable() functions. - Call intel_prepare_render before determining which miptrees we are blitting from/to, because it may cause miptrees to be reallocated. - Allow the blit to mirror X and/or Y coordinates. - Disable blorp blits on Gen7 for now, since they aren't working yet.
*	intel: Add extern "C" declarations to headers	Paul Berry	2012-05-10	1	-0/+7
\| \| \| \| \| \| \| \|	These declarations are necessary to allow C++ code to call C code without causing unresolved symbols (which would make the driver fail to load). Reviewed-by: Chad Versace <[email protected]>
*	i965: Rename BRW_MAX_SURFACES to BRW_MAX_WM_SURFACES.	Kenneth Graunke	2012-04-18	1	-2/+2
\| \| \| \| \| \| \| \|	Now that we use separate binding tables for WM, VS, and GS, and have BRW_MAX_VS_SURFACES and BRW_MAX_GS_SURFACES macros, we really shouldn't have an unqualified BRW_MAX_SURFACES macro. It's confusing. Signed-off-by: Kenneth Graunke <[email protected]>
*	i965: Fix outdated comments about binding tables.	Kenneth Graunke	2012-04-18	1	-12/+8
\| \| \| \| \| \| \| \| \|	They had a number of issues: - A paragraph states that we use a single binding table, but we don't. - We labelled the WM binding table diagram as SOL/WM. - The WM diagram had an "Only relevant to the WM" comment. Duh. Signed-off-by: Kenneth Graunke <[email protected]>
*	i965: handle gl_PointCoord for Gen4 and Gen5 platforms	Yuanhan Liu	2012-03-07	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch add the support of gl_PointCoord gl builtin variable for platform gen4 and gen5(ILK). Unlike gen6+, we don't have a hardware support of gl_PointCoord, means hardware will not calculate the interpolation coefficient for you. Instead, you should handle it yourself in sf shader stage. But badly, gl_PointCoord is a FS instead of VS builtin variable, thus it's not included in c.vue_map generated in VS stage. Thus the current code doesn't aware of this attribute. And to handle it correctly, we need add it to c.vue_map manually to let SF shader generate the needed interpolation coefficient for FS shader. SF stage has it's own copy of vue_map, thus I think it's safe to do it manually. Since handling gl_PointCoord for gen4 and gen5 platforms is somehow a little special, I added a lot of comments and hope I didn't overdo it ;) v2: add a /* _NEW_BUFFERS */ comment to note the state flag dependency and also add the _NEW_BUFFERS dirty mask (Eric). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45975 Piglit: glsl-fs-pointcoord and fbo-gl_pointcoord NOTE: This is a candidate for stable release branches. Signed-off-by: Yuanhan Liu <[email protected]> Reviewed-by: Eric Anholt <[email protected]>