path: root/src/mesa/drivers
Commit message | Author | Age | Files | Lines
* i965: Make sure we always compute valid index bounds before drawing. | Iago Toral Quiroga | 2014-03-28 | 1 | -1/+2
    When doing software rendering (i.e. rendering to the selection buffer) we need to make sure that we have valid index bounds before calling _tnl_draw_prims(), otherwise we can crash.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59455
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use intel_upload_space() for pull constant uploads. | Eric Anholt | 2014-03-26 | 4 | -33/+17
    This also happens to fix a leak of the current GS pull constant BO on context destroy, by just not holding on to the pull const bos after the surface state is generated.

    No statistically significant performance difference on GLB2.7 on HSW at 1024x768 (n=40) or 320x240 (n=44), or on BYT at 320x240 (n=47).

    v2: Rebase on intel_upload simplification.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Massively simplify the intel_upload implementation. | Eric Anholt | 2014-03-26 | 4 | -126/+77
    The implementation kept a page-sized area for uploading data, and uploaded chunks from that to a 64kb-sized streamed buffer. This wasted cache footprint (and extra state tracking to do so) when we want to just write our data into the buffer immediately.

    Instead, build it around an interface like brw_state_batch() that just gets you a pointer to BO memory to upload your stuff immediately.

    Improves OpenArena on HSW by 1.62209% +/- 0.355299% (n=61) and on BYT by 1.7916% +/- 0.415743% (n=31).

    v2: Rebase on Mesa master, drop old prototypes. Re-do performance comparison on a kernel that doesn't punish CPU efficiency improvements.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: For fast color clears, only check the color of live channels. | Kevin Rogovin | 2014-03-25 | 1 | -1/+2
    When deciding if a clear color is suitable for fast clear, take into account if a color channel is active in the buffer format.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Set Broadwell MOCS values everywhere it's possible. | Kenneth Graunke | 2014-03-25 | 6 | -12/+27
    This patch introduces two pre-canned MOCS values: BDW_MOCS_WB (write-back, all caches) and BDW_MOCS_WT (write-through, all caches).

    We use write-through caching for render targets, and write-back for all other data. (At least on Haswell, I believe write-back LLC/eLLC didn't work for scan-out buffers, while write-through did.)

    No performance analysis has been done on the impact of this patch.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Acked-by: Eric Anholt <[email protected]>
* i965: fix dma_buf import with non-zero offset. | Gwenole Beauchesne | 2014-03-25 | 1 | -0/+9
    Fix eglCreateImage() from a packed dma_buf surface with a non-zero offset to pixel data. In particular, this fixes support for planar YUV surfaces when they are individually mapped on a per-plane basis, i.e. when OES_EGL_image_external is not used and the user application wants to use its own shader code for composition, or for processing on individual planes (OCL).

    Signed-off-by: Gwenole Beauchesne <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* mesa/sso: Add gl_pipeline_object parameter to _mesa_use_shader_program | Gregory Hainaut | 2014-03-25 | 1 | -3/+6
    Extend use_shader_program to support a different target. This allows the function to be reused to update the pipeline state. Note that I bypass the flush when the target isn't current.

    Maybe it would be better to create a new UseProgramStages driver function.

    This was originally included in another patch, but it was split out by Ian Romanick.

    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* meta/sso: Update meta to save and restore SSO state. | Gregory Hainaut | 2014-03-25 | 2 | -0/+20
    Save and restore the _Shader/Pipeline binding point. Rationale: we don't want any conflict when the program becomes unattached.

    V2: formatting improvement

    V3 (idr):
    * Build fix. The original patch added calls to _mesa_use_shader_program with 4 parameters, but the fourth parameter isn't added to that function until a much later patch. Just drop that parameter for now.

    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* mesa/sso: rename Shader to the pointer _Shader | Gregory Hainaut | 2014-03-25 | 14 | -22/+22
    Basically a sed, except for shaderapi.c and get.c:
    get.c => GL_CURRENT_PROGRAM always refers to the "old" UseProgram behavior
    shaderapi.c => the old API still updates the Shader object directly

    V2: formatting improvement

    V3 (idr):
    * Rebase fixes after a block of code was moved from ir_to_mesa.cpp to shaderapi.c.
    * Trivial reformatting.

    Reviewed-by: Ian Romanick <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: For color clears, only disable writes to components that exist. | Kenneth Graunke | 2014-03-24 | 1 | -1/+2
    The SIMD16 replicated FB write message only works if we don't need the color calculator to mask our framebuffer writes. Previously, we bailed on it if color_mask wasn't <true, true, true, true>. However, this was needlessly strict for formats with fewer than four components - only the components that actually exist matter.

    WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set to <true, true, true, false>. This will work perfectly fine with the replicated data message; we just bailed unnecessarily.

    Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by about 50%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).

    v2: Use _mesa_format_has_color_component() to properly handle ALPHA formats (and generally be less fragile).

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
    Tested-by: Dylan Baker <[email protected]>
* i965: Fix compiler warning about signed/unsigned. | Eric Anholt | 2014-03-24 | 1 | -1/+1
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen8: Change the winsys MSAA blits from blorp to meta. | Eric Anholt | 2014-03-24 | 4 | -8/+152
    This gets us equivalent code paths on BDW and pre-BDW, except for stencil (where we don't have MSAA stencil resolve code yet).

    Improves MSAA-forced citybench by 7.94496% +/- 2.38429% (n=16). Reduces DRI2 MSAA glxgears performance by 12.3559% +/- 1.52845% (n=9).

    v2: Move the new meta code to brw_meta_updownsample.c, name it brw_meta_updownsample(), add a comment about intel_rb_storage_first_mt_slice(), and rename that function and move the RB generation into it (review ideas by Ken).
    v3: Fix 2 src vs dst pasteos in previous change.
    v4: Skip this path pre-gen8 for now, until we can analyze the glxgears performance delta some more.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Skip reallocating the private MSAA miptree, unless it's resized. | Eric Anholt | 2014-03-24 | 1 | -17/+28
    Even if the singlesample_mt got reopened from DRI due to pageflipping/buffer swapping, our private miptree shouldn't need any changes.

    Improves performance of a little swapbuffers-loving microbenchmark with MSAA forced on, by 1.2371% +/- 0.624802% (n=102).

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Simplify the no-reopening-the-winsys-buffer tests. | Eric Anholt | 2014-03-24 | 1 | -22/+16
    The formatting was weird, and the tests were duplicated, and it is guaranteed that mt->region exists.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Don't forget to free the old singlesample_mt. | Eric Anholt | 2014-03-24 | 1 | -0/+1
    Fixes a memory leak with MSAA winsys buffers since my move of singlesample_mt to the rb in 4e0924c5de5f3964e4ca81f923d877dbb59fad0a.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add an env var for forcing window system MSAA. | Eric Anholt | 2014-03-24 | 2 | -0/+17
    Sometimes it would be nice to benchmark some app with MSAA versus not, but it doesn't offer the controls you want. Just provide a handy knob to force the issue.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Eliminate dead writes to the flag register. | Matt Turner | 2014-03-24 | 1 | -18/+48
    For each write, search previous instructions for unread writes to the flag register and remove them. Note that this will not eliminate the last unread write.

    total instructions in shared programs: 788074 -> 788004 (-0.01%)
    instructions in affected programs: 4930 -> 4860 (-1.42%)

    Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Eliminate writes that are never read. | Matt Turner | 2014-03-24 | 1 | -0/+46
    With an awful O(n^2) algorithm that searches previous instructions for dead writes.

    total instructions in shared programs: 805582 -> 788074 (-2.17%)
    instructions in affected programs: 144561 -> 127053 (-12.11%)

    Reviewed-by: Eric Anholt <[email protected]>
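As a rough illustration of the quadratic backwards search mentioned here, the following sketch works on a toy straight-line instruction list. It assumes whole-register writes and no control flow. `struct toy_inst`, `NO_REG`, and `eliminate_dead_writes()` are hypothetical names and do not reflect the vec4 backend's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_REG (-1)   /* marks an unused source or destination slot */

/* Hypothetical toy instruction: one destination, up to three sources. */
struct toy_inst {
   int dst;
   int src[3];
   bool removed;
};

static bool
reads_reg(const struct toy_inst *inst, int reg)
{
   for (int s = 0; s < 3; s++) {
      if (inst->src[s] == reg)
         return true;
   }
   return false;
}

/* For each write, walk backwards looking for an earlier write to the same
 * register that nothing read in between; such a write is dead.
 * Returns the number of instructions marked as removed. */
static int
eliminate_dead_writes(struct toy_inst *insts, int count)
{
   int removed = 0;

   assert(count >= 0);
   for (int i = 0; i < count; i++) {
      int reg = insts[i].dst;

      /* An instruction that reads its own destination keeps the
       * previous write alive. */
      if (reg == NO_REG || reads_reg(&insts[i], reg))
         continue;

      for (int j = i - 1; j >= 0; j--) {
         if (insts[j].removed)
            continue;
         if (insts[j].dst == reg) {
            /* Overwritten by instruction i before any read. */
            insts[j].removed = true;
            removed++;
            break;
         }
         if (reads_reg(&insts[j], reg))
            break;   /* the earlier value is still needed */
      }
   }
   return removed;
}
```

Because the search is triggered by a later write, the final write to a register is never removed by this sketch, even if nothing reads it.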
* i965/vec4: Factor code out of DCE into a separate function. | Matt Turner | 2014-03-24 | 1 | -34/+39
    Will be reused in the next commit.

    Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Let dead code eliminate trim dead channels. | Matt Turner | 2014-03-24 | 1 | -3/+26
    That is, modify

        mad dst, a, b, c

    to be

        mad dst.xyz, a, b, c

    if dst.w is never read.

    total instructions in shared programs: 811869 -> 805582 (-0.77%)
    instructions in affected programs: 168287 -> 162000 (-3.74%)

    Reviewed-by: Eric Anholt <[email protected]>
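The channel trimming in the `mad` example amounts to intersecting the instruction's writemask with the set of channels that are read later. A minimal sketch, assuming a four-bit mask with x in bit 0; `trim_dead_channels()` is a hypothetical helper, not a function from the backend:

```c
#include <assert.h>

/* One bit per vec4 channel. */
enum {
   WRITEMASK_X    = 1 << 0,
   WRITEMASK_Y    = 1 << 1,
   WRITEMASK_Z    = 1 << 2,
   WRITEMASK_W    = 1 << 3,
   WRITEMASK_XYZ  = WRITEMASK_X | WRITEMASK_Y | WRITEMASK_Z,
   WRITEMASK_XYZW = WRITEMASK_XYZ | WRITEMASK_W,
};

/* Keep only the channels that some later instruction reads.
 * A result of 0 means the whole instruction is dead. */
static unsigned
trim_dead_channels(unsigned writemask, unsigned live_channels)
{
   assert((writemask & ~(unsigned)WRITEMASK_XYZW) == 0);
   return writemask & live_channels;
}
```

For the example in the message, a full `xyzw` write with only `xyz` live comes back as `WRITEMASK_XYZ`, which corresponds to `mad dst.xyz, a, b, c`.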
* i965/vec4: Track live ranges per-channel, not per vgrf. | Matt Turner | 2014-03-24 | 2 | -14/+41
    Will be squashed with the next patch.

    Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Don't dead code eliminate instructions writing the flag. | Matt Turner | 2014-03-24 | 1 | -1/+5
    A future patch adds support for removing dead writes to the flag register. This patch simplifies the logic until then.

    total instructions in shared programs: 811813 -> 811869 (0.01%)
    instructions in affected programs: 3378 -> 3434 (1.66%)

    Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Preparatory clean up of dead_code_eliminate(). | Matt Turner | 2014-03-24 | 1 | -22/+23
    Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add is_null() method to dst_reg. | Matt Turner | 2014-03-24 | 2 | -0/+10
    Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Print the predicate in dump_instructions(). | Matt Turner | 2014-03-24 | 1 | -0/+5
    Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Rename depends_on_flags() to reads_flag(). | Matt Turner | 2014-03-24 | 2 | -3/+3
    To be consistent with the fs backend.

    Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add and use vec4_instruction::writes_flag(). | Matt Turner | 2014-03-24 | 2 | -2/+7
    To be consistent with the fs backend. Also the instruction scheduler incorrectly considered SEL with a conditional modifier to read the flag register.

    Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Add missing doxygen close brace. | Matt Turner | 2014-03-24 | 1 | -0/+1
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Report the type of color clear in INTEL_DEBUG=blorp. | Kenneth Graunke | 2014-03-23 | 1 | -2/+9
    It's useful to know whether a clear is fast (MCS-based), using the SIMD16 repdata message, or slow.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
    Reviewed-by: Anuj Phogat <[email protected]>
* Revert "i965: For color clears, only disable writes to components that exist." | Kenneth Graunke | 2014-03-21 | 1 | -1/+1
    This reverts commit 2919c3fdb40cf457f2e47f378a46f4cefa9e9f6d.

    For formats like BGRX, looping through 0..num_components works fine. But for formats like XRGB, we'd check the color mask for X and fail to check it for B.
* i965: For color clears, only disable writes to components that exist. | Kenneth Graunke | 2014-03-21 | 1 | -1/+1
    The SIMD16 replicated FB write message only works if we don't need the color calculator to mask our framebuffer writes. Previously, we bailed on it if color_mask wasn't <true, true, true, true>. However, this was needlessly strict for formats with fewer than four components - only the components that actually exist matter.

    WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set to <true, true, true, false>. This will work perfectly fine with the replicated data message; we just bailed unnecessarily.

    Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by about 40%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Anuj Phogat <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
    Tested-by: Dylan Baker <[email protected]>
* i965: Print number of multisamples in INTEL_DEBUG=blorp output. | Kenneth Graunke | 2014-03-21 | 1 | -4/+4
    This lets us distinguish MSAA resolves from other ordinary blits.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Anuj Phogat <[email protected]>
    Reviewed-by: Paul Berry <[email protected]>
* i965: Drop BLT TexSubImage Y-tiling restriction on Gen6+. | Kenneth Graunke | 2014-03-21 | 1 | -2/+2
    Currently, we don't use this path on Sandybridge because we suspect other paths will be faster. But we potentially could. If we do, we should allow it to support Y-tiled BLTs.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Anuj Phogat <[email protected]>
* i965: Enable ARB_vertex_type_10f_11f_11f_rev for Gen4/5 also. | Chris Forbes | 2014-03-22 | 1 | -1/+1
    Tested on ILK and CTG (with the GL3isms taken out of the piglits).

    Signed-off-by: Chris Forbes <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* nouveau: don't assume libdrm include prefix | Jonathan Gray | 2014-03-20 | 1 | -1/+1
    The drm headers may be installed in a different directory.

    Signed-off-by: Jonathan Gray <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: there may not have been a texture if the fbo was incomplete | Ilia Mirkin | 2014-03-19 | 1 | -1/+2
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Francisco Jerez <[email protected]>
    Cc: "10.0 10.1" <[email protected]>
* nouveau: add forgotten GL_COMPRESSED_INTENSITY to texture format list | Ilia Mirkin | 2014-03-19 | 1 | -0/+1
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Francisco Jerez <[email protected]>
    Cc: "10.0 10.1" <[email protected]>
* i965: Drop some more dead code from the old CACHED_BATCH feature. | Eric Anholt | 2014-03-18 | 4 | -38/+0
    Reviewed-by: Topi Pohjolainen <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop special case for edgeflag thanks to Marek's change to core. | Eric Anholt | 2014-03-18 | 1 | -9/+0
    As of 780ce576bb1781f027797039693b98253ee4813e, we end up with R8_SSCALED anyway.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable EWA anisotropic filtering algorithm | Ian Romanick | 2014-03-18 | 1 | -0/+1
    Volume 4, part 1 of the Ivybridge PRM says, "Generally, the EWA approximation algorithm results in higher image quality than the legacy algorithm."

    Using a classic anisotropic filtering "tunnel" demo, it appears that there is *no* anisotropic filtering on IVB without this bit set.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Actually initialize simd16_unsupported and no16_msg. | Kenneth Graunke | 2014-03-18 | 1 | -0/+2
    I meant to include these fixes in v3 of commit de7ad2c88f4ec243c95eaed22c41d0e537912e01, but accidentally pushed a previous version.

    Signed-off-by: Kenneth Graunke <[email protected]>
* i965/upload: Refactor open-coded ALIGN-like computations. | Kenneth Graunke | 2014-03-18 | 1 | -3/+9
    Sadly, we can't use actual ALIGN(), since that only supports power-of-two values for the alignment parameter.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
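To illustrate the power-of-two limitation, here is a minimal sketch that contrasts a modulo-based round-up with the usual mask-based one. Both `align_npot()` and `align_pot()` are hypothetical helper names chosen for this example; they are not the functions added by the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment.
 * Works for any positive alignment, not only powers of two. */
static inline uint32_t
align_npot(uint32_t value, uint32_t alignment)
{
   assert(alignment > 0);
   uint32_t remainder = value % alignment;
   return remainder == 0 ? value : value + (alignment - remainder);
}

/* Mask-based variant for comparison. The bit trick is only correct
 * when alignment is a power of two, which the assert enforces. */
static inline uint32_t
align_pot(uint32_t value, uint32_t alignment)
{
   assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}
```

For a 12-byte alignment, `align_npot(25, 12)` gives 36, while the mask trick has no valid answer because 12 is not a power of two. For power-of-two alignments the two helpers agree.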
* i965: Fix indentation in brw_upload_indices(). | Kenneth Graunke | 2014-03-18 | 1 | -19/+19
    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965: Consolidate code for setting brw->ib.start_vertex_offset. | Kenneth Graunke | 2014-03-18 | 1 | -9/+6
    This was set identically in three places.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* i965: Allocate register sets at screen creation, not context creation. | Kenneth Graunke | 2014-03-18 | 6 | -88/+88
    Register sets depend on the particular hardware generation, but don't depend on anything in the actual OpenGL context. Computing them is fairly expensive, and they take up a large amount of memory.

    Putting them in the screen allows us to compute/allocate them once for all contexts, saving both time and space.

    Improves the performance of a context creation/destruction microbenchmark by about 3x on my Haswell i7-4750HQ.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Allocate the screen using ralloc rather than calloc. | Kenneth Graunke | 2014-03-18 | 1 | -2/+3
    This will allow us to use the screen as a memory context.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965: Accurately bail on SIMD16 compiles. | Kenneth Graunke | 2014-03-18 | 3 | -34/+82
    Ideally, we'd like to never even attempt the SIMD16 compile if we could know ahead of time that it won't succeed---it's purely a waste of time. This is especially important for state-based recompiles, which happen at draw time.

    The fragment shader compiler has a number of checks like:

        if (dispatch_width == 16)
           fail("...some reason...");

    This patch introduces a new no16() function which replaces the above pattern. In the SIMD8 compile, it sets a "SIMD16 will never work" flag. Then, brw_wm_fs_emit can check that flag, skip the SIMD16 compile, and issue a helpful performance warning if INTEL_DEBUG=perf is set. (In SIMD16 mode, no16() calls fail(), for safety's sake.)

    The great part is that this is not a heuristic---if the flag is set, we know with 100% certainty that the SIMD16 compile would fail. (It might fail anyway if we run out of registers, but it's always worth trying.)

    v2: Fix missing va_end in early-return case (caught by Ilia Mirkin).

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Chris Forbes <[email protected]> [v1]
    Reviewed-by: Ian Romanick <[email protected]> [v1]
    Reviewed-by: Eric Anholt <[email protected]>
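The no16() pattern described above might look roughly like the sketch below. It is a heavily reduced illustration under stated assumptions: `struct fs_compile` and `should_try_simd16()` are hypothetical, and only the names `no16`, `simd16_unsupported`, and `no16_msg` are taken from the commit subjects in this log. The real compiler state and failure handling are more involved.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, reduced stand-in for the fragment shader compile state. */
struct fs_compile {
   int dispatch_width;        /* 8 or 16 */
   bool failed;               /* this compile has failed */
   bool simd16_unsupported;   /* set in SIMD8: SIMD16 can never work */
   char no16_msg[256];        /* reason, e.g. for a performance warning */
};

static void
no16(struct fs_compile *c, const char *fmt, ...)
{
   va_list va;

   assert(c->dispatch_width == 8 || c->dispatch_width == 16);
   va_start(va, fmt);
   if (c->dispatch_width == 16) {
      /* Already compiling SIMD16: treat it as a hard failure. */
      c->failed = true;
   } else if (!c->simd16_unsupported) {
      /* SIMD8 compile: record the first reason and carry on. */
      c->simd16_unsupported = true;
      vsnprintf(c->no16_msg, sizeof(c->no16_msg), fmt, va);
   }
   va_end(va);   /* single exit, so va_end runs on every path */
}

/* The caller consults the SIMD8 result before attempting SIMD16. */
static bool
should_try_simd16(const struct fs_compile *simd8)
{
   return !simd8->failed && !simd8->simd16_unsupported;
}
```

Keeping a single exit path in `no16()` is one way to avoid the kind of missing `va_end` that the v2 note mentions.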
* i965/fs: Support pull parameters in SIMD16 mode. | Kenneth Graunke | 2014-03-18 | 2 | -11/+13
    This is just a matter of reusing the pull/push constant information set up by the SIMD8 compile.

    This gains us 78 SIMD16 programs in shader-db.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Topi Pohjolainen <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Use a single instance of the pull_constant_loc[] array. | Kenneth Graunke | 2014-03-18 | 2 | -28/+6
    Now that we don't renumber uniform registers, assign_constant_locations and move_uniform_array_access_to_pull_constants use the same names. So, they can share a single copy of the pull_constant_loc[] array.

    This simplifies the code considerably. assign_constant_locations() doesn't need to walk through pull_params[] to rediscover reladdr demotions; it just has that information in pull_constant_loc[]. We also only need to rewrite the instruction stream once, instead of twice.

    Even better, we now have a single array describing the layout of all pull parameters, which we can pass to the SIMD16 program.

    This actually hurts a few shaders in Serious Sam 3, and one in KWin:

    total instructions in shared programs: 1841957 -> 1842035 (0.00%)
    instructions in affected programs: 1165 -> 1243 (6.70%)

    Comparing dump_instructions() before and after the pull constant transformations with and without this patch, it appears that there is a uniform array with variable indexing (reladdr) and constant indexing (of array element 0). Previously, we uploaded array element 0 as both a pull constant (for reladdr) /and/ a push constant.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Topi Pohjolainen <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Don't renumber UNIFORM registers. | Kenneth Graunke | 2014-03-18 | 3 | -118/+86
    Previously, remove_dead_constants() would renumber the UNIFORM registers to be sequential starting from zero, and the resulting register number would be used directly as an index into the params[] array.

    This renumbering made it difficult to collect and save information about pull constant locations, since setup_pull_constants() and move_uniform_array_access_to_pull_constants() used different names.

    This patch generalizes setup_pull_constants() to decide whether each uniform register should be a pull constant, push constant, or neither (because it's unused). Then, it stores mappings from UNIFORM register numbers to params[] or pull_params[] indices in the push_constant_loc and pull_constant_loc arrays. (We already did this for pull constants.)

    Then, assign_curb_setup() just needs to consult the push_constant_loc array to get the real index into the params[] array.

    This effectively folds all the remove_dead_constants() functionality into assign_constant_locations(), while being less irritable to work with.

    v2: Add assert(remapped <= i), requested by Topi.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Topi Pohjolainen <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>