summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri
Commit message (Collapse)AuthorAgeFilesLines
* i965: Fix typo in shader channel select field name.Kenneth Graunke2012-07-273-20/+20
| | | | | | "chanel" isn't very searchable. I can type, honest! Signed-off-by: Kenneth Graunke <[email protected]>
* i965/msaa: Use MESA_FORMAT_R8 for MCS buffer.Paul Berry2012-07-271-1/+1
| | | | | | | | | | | | No functional change. This patch modifies intel_miptree_alloc_mcs to allocate the 4x MCS buffer using MESA_FORMAT_R8 instead of MESA_FORMAT_A8. In principle it doesn't matter, since we only access the buffer using MCS-specific hardware mechanisms, so all that's important is to use a format with the correct size. However, MESA_FORMAT_A8 has enough unusual behaviours that it seems prudent to avoid it. Acked-by: Kenneth Graunke <[email protected]>
* intel: increase wm thread number to 80 on gen6 GT2Zou Nan hai2012-07-271-5/+1
| | | | | | | | | | | | | It seems reset is not required for setting the max_wm_threads to 80 on gen6 GT2. Increases performance in the Counter-Strike: Source video stress test by 7.18% (n=5). Signed-off-by: Zou Nan hai <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Tested-by: Matt Turner <[email protected]> Acked-by: Eric Anholt <[email protected]>
* i965/msaa: use ROUND_DOWN_TO macro.Paul Berry2012-07-261-6/+6
| | | | | | | | No functional change. This patch modifies brw_blorp_blit.cpp to use the ROUND_DOWN_TO macro instead of open-coded bit manipulations, for clarity. Reviewed-by: Kenneth Graunke <[email protected]>
* radeon: fix Base/base typoBrian Paul2012-07-261-1/+1
| | | | Fixes http://bugs.freedesktop.org/show_bug.cgi?id=52563
* radeon: set swrast_renderbuffer::ColorType field when mapping renderbuffersBrian Paul2012-07-261-0/+2
| | | | | | | | Fixes http://bugs.freedesktop.org/show_bug.cgi?id=47375 NOTE: This is a candidate for the 8.0 branch. Tested-by: Barto <[email protected]>
* i965: Use sendc for all render target writes on Gen6+.Paul Berry2012-07-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | The sendc instruction causes the fragment shader thread to wait for any dependent threads (i.e. threads rendering to overlapping pixels) to complete before sending the message. We need to use sendc on the first render target write in order to guarantee that fragment shader outputs are written to the render target in the correct order. Previously, we only used the "sendc" instruction when writing to binding table index 0. This did the right thing for fragment shaders, because our fragment shader back-ends always issue their first render target write to binding table index 0. However, it did the wrong thing for blorp, which performs its render target writes to binding table index 1. A more robust solution is to use sendc for all render target writes. This should not produce any performance penalty, since after the first sendc, all of the dependent threads will have completed. For more information about sendc, see the Ivy Bridge PRM, Vol4 Part3 p218 (sendc - Conditional Send Message), and p54 (TDR Registers). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/msaa: Remove TODO comments that are no longer relevant.Paul Berry2012-07-262-3/+2
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Make more consistent use of _mesa_is_{user,winsys}_fbo()Paul Berry2012-07-269-12/+19
| | | | | | | | | | A lot of code was still differentiating between between winsys and user fbos by testing the fbo's name against zero. This converts everything in the i915 and 965 drivers over to use _mesa_is_user_fbo() and _mesa_is_winsys_fbo(). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove unused param conversion code.Eric Anholt2012-07-257-80/+3
| | | | | | | | Ever since ctx->NativeIntegers was set, the conversion flag has been PARAM_NO_CONVERT. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/msaa: Switch on 8x MSAA for Gen7.Paul Berry2012-07-242-3/+7
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/msaa: Adjust MCS buffer allocation for 8x MSAA.Paul Berry2012-07-241-2/+25
| | | | | | | | MCS buffers use 32 bits per pixel in 8x MSAA, and 8 bits per pixel in 4x MSAA. This patch adjusts the format we use to allocate the buffer so that enough memory is set aside for 8x MSAA. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/msaa: Remove assertion in 3DSTATE_SAMPLE_MASK to allow 8x MSAA.Paul Berry2012-07-241-3/+0
| | | | | | | | The code to emit 3DSTATE_SAMPLE_MASK was already correct for 8x MSAA--this patch just removes an assertion that would have prevented it from being used for 8x MSAA. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/msaa: Adjust 3DSTATE_MULTISAMPLE packet for 8x MSAA.Paul Berry2012-07-241-6/+64
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Encode and decode IMS format for 8x MSAA correctly.Paul Berry2012-07-241-39/+107
| | | | | | | | This patch updates the blorp functions encode_msaa() and decode_msaa() to properly handle the encoding of IMS MSAA buffers when num_samples=8. Acked-by: Kenneth Graunke <[email protected]>
* i965/blorp: Compute sample number correctly for 8x MSAA.Paul Berry2012-07-241-13/+42
| | | | | | | | | | When operating in persample dispatch mode, the blorp engine would previously assume that subspan N always represented sample N (this is correct assuming 4x MSAA and a 16-wide dispatch). In order to support 8x MSAA, we must compute which sample is associated with each subspan, using the "Starting Sample Pair Index" field in the thread payload. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Properly adjust primitive size for 8x MSAA.Paul Berry2012-07-241-4/+17
| | | | | | | | | | | | | | When rendering to an IMS MSAA surface on Gen7, blorp sets up the rendering pipeline as though it were rendering to a single-sampled surface; accordingly it must adjust the size of the primitive it sends down the pipeline to account for the interleaving of samples in an IMS surface. This patch modifies the size adjustment code to properly handle 8x MSAA, which makes room for the extra samples by using an interleaving pattern that is twice as wide as 4x MSAA. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Parameterize manual_blend() by num_samples.Paul Berry2012-07-241-8/+5
| | | | | | | | | | | | This patch adds a num_samples argument to the blorp function manual_blend(), allowing it to be told how many samples need to be blended together. Previously it assumed 4x MSAA, since that was all we supported. We also bump up LOG2_MAX_BLEND_SAMPLES from 2 to 3, so that manual_blend() will be able to handle 8x MSAA. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/msaa: Remove comment about falsely claiming to support MSAA.Paul Berry2012-07-241-5/+0
| | | | | | Gen6+ hardware now supports MSAA properly. Reviewed-by: Chad Versace <[email protected]>
* i965/blorp: Handle DrawBuffers properly.Paul Berry2012-07-241-7/+10
| | | | | | | | | | | | When the client program uses glDrawBuffer() or glDrawBuffers() to select more than one color buffer for drawing into, and then performs a blit, we need to blit into every single enabled draw buffer. +2 oglconforms. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50407 Reviewed-by: Chad Versace <[email protected]>
* i965/blorp: Rearrange order of blit validation and preparation steps.Paul Berry2012-07-241-55/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch rearranges the order of steps performed by a blorp blit from this: - Sync up state of window system buffers. - Find buffers. - Find miptrees. - Make sure buffer formats match. - Handle mirroring. - Make sure width and height match. - Handle clipping/scissoring. - Account for window system origin conventions. - Do depth resolves, if applicable. - Do the blit. - Record the need for a future HiZ resolve, if applicable. To this: - Sync up state of window system buffers. - Handle mirroring. - Make sure width and height match. - Handle clipping/scissoring. - Account for window system origin conventions. - Find buffers. - Make sure buffer formats match. - Find miptrees. - Do depth resolves, if applicable. - Do the blit. - Record the need for a future HiZ resolve, if applicable. The steps are the same, but they are now performed in an order that will make it possible to implement correct DrawBuffers support. Note that the last four steps are now in a separate function (do_blorp_blit), since they will need to be executed repeatedly when DrawBuffers support is added. Reviewed-by: Chad Versace <[email protected]>
* i965/blorp: Don't fall back to swrast when miptrees absent.Paul Berry2012-07-241-6/+2
| | | | | | | | | | | | | Previously, the blorp engine would fall back to swrast if the source or destination of a blit had no associated miptree. This was unnecessary, since _mesa_BlitFramebufferEXT() already takes care of making the blit silently succeed if there are no buffers bound, so the fallback paths could never actually happen in practice. Removing these fallback paths will simplify the implementation of correct DrawBuffers support in blorp. Reviewed-by: Chad Versace <[email protected]>
* i965/blorp: Fixup scissoring of blits to window system buffers.Paul Berry2012-07-241-12/+16
| | | | | | | | | | | | | This patch modifies the order of operations in the blorp engine so that clipping and scissoring are performed before adjusting the coordinates to account for the difference in origin convention between window system buffers and framebuffer objects. Previously, we would do clipping and scissoring after adjusting for origin conventions, so we would get scissoring wrong in window system buffers. Fixes Piglit test "fbo-scissor-blit window". Reviewed-by: Chad Versace <[email protected]>
* i965/blorp: Simplify check that src/dst width/height match.Paul Berry2012-07-241-4/+2
| | | | | | | | | | | When checking that the source and destination dimensions match, we don't need to store the width and height in variables; doing so just risks confusion since right after the check, we do clipping and scissoring, which may alter the width and height. No functional change. Reviewed-by: Chad Versace <[email protected]>
* i965/msaa: Work around problems with null render targets on Gen6.Paul Berry2012-07-242-4/+49
| | | | | | | | | | | | | | | On Gen6, multisampled null render targets don't seem to work properly--they cause the GPU to hang. So, as a workaround, we render into a dummy color buffer. Fortunately this situation (multisampled rendering without a color buffer) is rare, and we don't have to waste too much memory, because we can give the workaround buffer a very small pitch. Fixes piglit test "EXT_framebuffer_multisample/no-color {2,4} depth-computed *" on Gen6. Reviewed-by: Chad Versace <[email protected]>
* i965: Set width, height, and tiling properly for null render targets.Paul Berry2012-07-242-2/+60
| | | | | | | | | | The HW docs say that the width and height of null render targets need to match the width and height of the corresponding depth and/or stencil buffers, and that they need to be marked as Y-tiled. Although leaving these values at 0 doesn't seem to cause any ill effects, it seems wise to follow the documented requirements. Reviewed-by: Chad Versace <[email protected]>
* i965/msaa: Control multisampling behaviour via the visual.Paul Berry2012-07-245-17/+7
| | | | | | | | | | | | | Previously, we used the number of samples in draw buffer 0 to determine whether to set up the 3D pipeline for multisampling. Using the visual is cleaner, and has the benefit of working properly when there is no color buffer. Fixes all piglit tests "EXT_framebuffer_multisample/no-color" on Gen7. On Gen6, the "depth-computed" variants of these tests still fail; this will be addresed in a later patch. Reviewed-by: Chad Versace <[email protected]>
* intel: move error on create context to proper pathJordan Justen2012-07-241-1/+1
| | | | | | | | | The error was being set on the non-error path, rather than the error path. NOTE: This is a candidate for the 8.0 branch. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nouveau: include glformats.h to get missing prototypeBrian Paul2012-07-241-0/+1
| | | | Fixes http://bugs.freedesktop.org/show_bug.cgi?id=52449
* mesa: move some format helper functions to glformats.cBrian Paul2012-07-241-1/+1
|
* i830: Fix stack corruptionChad Versace2012-07-201-1/+1
| | | | | | | | | | | | | | | Found by compiler warning: i830_texstate.c:131:28: warning: argument to 'sizeof' in 'memset' call is the same expression as the destination; did you mean to dereference it? [-Wsizeof-pointer-memaccess] memset(state, 0, sizeof(state)); ~~~~~ ^~~~~ On 64-bit systems, memset here would write an extra 4 bytes. Note: This is a candidate for the stable branches. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965/gen7: Increase the WM threads to hardware limits.Eric Anholt2012-07-201-1/+1
| | | | | | | | | | | | This thread count is only supposed to be enabled when "WIZ Hashing Disable in GT_MODE register enabled." I've always been confused whether that means the bit in the register should be 1 or 0. For my IVB GT2's register 0x7008 value of 0x0, this appears to work fine. Improves l4d2 performance at 640x480 by 0.88 +/- 0.11% (n=88). Improves performance with rasterization at 1280x1024 by 1.45% +/- 0.36% (n=6). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Use IMS layout when texturing from depth/stencil surfaces.Paul Berry2012-07-201-23/+43
| | | | | | | | | | | | | | Previously, on Gen7, when texturing from a depth or stencil surface, the blorp engine would configure the 3D pipeline as though the input surface was non-multisampled, and perform the necessary coordinate transformations in the fragment shader to account for the IMS layout. This meant outputting a lot of extra fragment shader code, and it raised some uncertainty about how to deal with very large surfaces. This patch modifies blorp to configure the 3D pipeline properly for IMS layout when reading from depth and stencil surfaces. Reviewed-by: Anuj Phogat <[email protected]>
* i965/blorp: Loosen assertions in compute_msaa_layout_for_pipeline.Paul Berry2012-07-201-7/+2
| | | | | | | | Previously, on Gen7, compute_msaa_layout_for_pipeline() would verify that IMS layout is not used. However, now that we configure SURFACE_STATE correctly for IMS surfaces, IMS layout is available. Reviewed-by: Anuj Phogat <[email protected]>
* i965/blorp: Configure SURFACE_STATE correctly for IMS surfaces.Paul Berry2012-07-203-6/+14
| | | | | | | | | | | | | This patch modifies gen7_set_surface_num_multisamples() to set up the SURFACE_STATE appropriately for texturing from IMS format MSAA surfaces (which are only used on Gen7 for depth and stencil buffers). Since the function now sets more than just the number of multisamples, it's been renamed to gen7_set_surface_msaa(). This will make it possible to remove some kludginess from the blorp engine. Reviewed-by: Anuj Phogat <[email protected]>
* i965/blorp: Optimize manual_blend() for compressed multisampled surfaces.Paul Berry2012-07-201-0/+23
| | | | | | | | | | When downsampling a compressed multisampled surface, we can take a shortcut to downsample any pixels that were completely covered by a single primitive. In this case, the first color value we fetch is the correct final color for the downsampled pixel, so we can skip the rest of the blending operation. Reviewed-by: Anuj Phogat <[email protected]>
* i965/blorp: Fix integer downsampling on Gen7.Paul Berry2012-07-202-11/+55
| | | | | | | | | | | | | | | | When downsampling an integer-format buffer on Gen7, we need to use the "avg" instruction rather than the "add" instruction, to ensure that we don't overflow the range of 32-bit integers. Also, we need to use the proper register type (BRW_REGISTER_TYPE_D or BRW_REGISTER_TYPE_UD) for intermediate color data and for writing to the render target. Note: this patch causes blorp to use the proper register type for all operations (downsampling, upsampling, and ordinary blits). Strictly speaking, this is only necessary for downsampling, because the other operations exclusively use MOV instructions on the color data. But it's simpler to use the proper register type in all cases. Reviewed-by: Anuj Phogat <[email protected]>
* i965/blorp: Modify manual_blend() to avoid unnecessary loss of precision.Paul Berry2012-07-201-27/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When downsampling from an MSAA image to a single-sampled image, it is inevitable that some loss of numerical precision will occur, since we have to use 32-bit floating point registers to hold the intermediate results while blending. However, it seems reasonable to expect that when all samples corresponding to a given pixel have the exact same color value, there will be no loss of precision. Previously, we averaged samples as follows: blend = (((sample[0] + sample[1]) + sample[2]) + sample[3]) / 4 This had the potential to lose numerical precision when all samples have the same color value, since ((sample[0] + sample[1]) + sample[2]) may not be precisely representable as a 32-bit float, even if the individual samples are. This patch changes the formula to: blend = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4 This avoids any loss of precision in the event that all samples are the same, by ensuring that each addition operation adds two equal values. As a side benefit, this puts the formula in the form we will need in order to implement correct blending of integer formats. Reviewed-by: Anuj Phogat <[email protected]>
* i965: Add support for AVG instruction.Paul Berry2012-07-202-0/+23
| | | | | | | | | | | | | From the Ivy Bridge PRM, Vol4 Part3 p152: "The avg instruction performs component-wise integer average of src0 and src1 and stores the results in dst. An integer average uses integer upward rounding. It is equivalent to increment one to the addition of src0 and src1 and then apply an arithmetic right shift to this intermediate value." Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Replace fs_visitor::kill_emitted with gl_fragment_program::UsesKill.Paul Berry2012-07-202-4/+1
| | | | | | | The kill_emitted variable was duplicating the functionality of gl_fragment_program::UsesKill. There's no need for both. Reviewed-by: Eric Anholt <[email protected]>
* mesa: Set gl_fragment_program::UsesKill in do_set_program_inouts.Paul Berry2012-07-201-25/+0
| | | | | | | | | | | | | | | | | | | | | Previously, the code for setting this flag for GLSL programs was duplicated in three places: brw_link_shader(), glsl_to_tgsi_visitor, and ir_to_mesa_visitor. In addition to the unnecessary duplication, there was a performance problem on i965: brw_link_shader() set the flag before doing its final round of optimizations, which meant that if the optimizations managed to eliminate all the discard operations, the flag would still be set, resulting (at least in theory) in slower performance. This patch consolidates all of the code that sets UsesKill for GLSL programs into do_set_program_inouts(), which already is doing a similar job for UsesDFdy, and which occurs after i965's final round of optimizations. Non-GLSL programs (ARB programs and the state tracker's glBitmap program) are unaffected. Reviewed-by: Eric Anholt <[email protected]>
* i965: Avoid unnecessary recompiles for shaders that don't use dFdy().Paul Berry2012-07-194-14/+10
| | | | | | | | | | | | The i965 back-end needs to compile dFdy() differently for FBOs and window system framebuffers, because Y coordinates are flipped between the two (see commit 82d2596: i965: Compute dFdy() correctly for FBOs). This patch avoids unnecessarily recompiling shaders that don't use dFdy(), by only setting render_to_fbo in the wm program key if the shader actually uses dFdy(). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* drirc: Add disable_blend_func_extended workaround for Unigine OilRush.Kenneth Graunke2012-07-191-0/+6
| | | | | | | | | | The previous commit implemented the workaround, cited a bug report about OilRush, but actually only enabled the workaround for the demos. Turn it on for OilRush too. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50291 Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Add a driconf option to disable GL_ARB_blend_func_extended.Kenneth Graunke2012-07-194-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unigine Heaven (at least) has a bug where it incorrectly uses the GL_ARB_blend_func_extended extension. Dual source blending allows two color outputs per render target; individual shader outputs can be assigned to be either the first or second blending input by setting the 'index' via one of two methods: - An API call: glBindFragDataLocationIndexed() - The GLSL 'layout' qualifier provided by GL_ARB_explicit_attrib_location Both of these only work on user defined fragment shader outputs; it's an error to use either on built-in outputs like gl_FragData. Unigine uses gl_FragData and gl_FragColor exclusively, and doesn't even attempt to use either method to set index == 1. However, it does set the blending function to SRC1 enums, which requires a fragment shader output with index == 1 or else rendering is undefined. In other words, enabling ARB_blend_func_extended causes Unigine to render incorrectly, resulting in an apparent regression, even though our driver code (as far as I can tell) is perfectly fine. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50291 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Make register spill/unspill only do the regs for that instruction.Eric Anholt2012-07-181-33/+33
| | | | | | | | | | | | | | | | | | Previously, if we were spilling the result of a texture call, we would store all 4 regs, then for each use of one of those regs as the source of an instruction, we would unspill all 4 regs even though only one was needed. In both lightsmark and l4d2 with my current graphics config, the shaders that produce spilling do so on split GRFs, so this doesn't help them out. However, in a capture of the l4d2 shaders with a different snapshot and playing the game instead of using a demo, it reduced one shader from 2817 instructions to 2179, due to choosing a now-cheaper texture result to spill instead of piles of texcoords. v2: Fix comment noted by Ken, and fix the if condition associated with it for the current state of what constitutes a partial write of the destination. Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* i965/fs.h: Refactor tests for instructions modifying a register.Eric Anholt2012-07-184-34/+16
| | | | | | | | | | There's one instance of a potential behavior change: propagate_constants may now propagate into a part of a vgrf after a different part of it was overwritten by a send that returns multiple registers. I don't think we ever generate IR that meets that condition, but it's something to note if we bisect behavior change to this. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Replace usage is_tex() with regs_written() checks.Eric Anholt2012-07-181-9/+9
| | | | | | | | | | In these places, we care about any sort of send that hits more than one reg, not just textures. We don't yet have anything else returning more than one reg, so there's no change. v2: Use mlen instead of is_tex() for the is-it-a-send check. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Rename virtual_grf_next to virtual_grf_count.Eric Anholt2012-07-186-22/+21
| | | | | | | "count" is a more useful name, since most of the time we're using it for looping over the variables. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Move a block out of a loop in live variables setup.Eric Anholt2012-07-181-4/+5
| | | | | | This was accidentally copy-and-pasted inside. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/msaa: Disable alpha-to-{coverage, one} when drawbuffer zero is in ↵Anuj Phogat2012-07-181-7/+21
| | | | | | | | | | | | | | | | | | | integer format OpenGL specification 3.3 (page 196), section 4.1.3 says: If drawbuffer zero is not NONE and the buffer it references has an integer format, the SAMPLE_ALPHA_TO_COVERAGE and SAMPLE_ALPHA_TO_ONE operations are skipped." This should work properly even if there are other draw buffers that are not in integer format. This patch makes following piglit tests pass on mesa: int-draw-buffers-alpha-to-coverage int-draw-buffers-alpha-to-one Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Anuj Phogat <[email protected]>