summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Allow constant propagation between different typesJason Ekstrand2015-01-131-2/+2
| | | | | | | | | | | | | | This will be needed for NIR because it is typeless and treats all constants as uint32 values and reinterprets them when they are used later. This commit allows those values to be properly propagated. Also, this helps some synmark shaders because it allows us to copy propagate a 0x00000000UD into a 0.0F in a load_payload, which then lets us combine 4 load_payloads. instructions in affected programs: 2288 -> 2144 (-6.29%) Reviewed-by: Matt Turner <[email protected]>
* i965: Fix bitcast operations with negate (ceil)Iago Toral Quiroga2015-01-132-8/+14
| | | | | | | | | | | | | | | | | | | | | | Commit 0ae9ca12a8 put source modifiers out of the bitcast operations by adding a MOV operation that would handle them separately. It missed the case of ceil though: the implementation negates both its source and destination operands. The source operand will be used for RNDD, which we can handle normally, but we need to fix the modifier for the negated result. v2: - RNDD can handle the source modifier so no need to put that one in a separate MOV. Fixes the following 42 dEQP tests: dEQP-GLES3.functional.shaders.builtin_functions.common.ceil.*_vertex dEQP-GLES3.functional.shaders.builtin_functions.common.ceil.*_fragment dEQP-GLES3.functional.shaders.builtin_functions.precision.ceil._*vertex.* dEQP-GLES3.functional.shaders.builtin_functions.precision.ceil._*fragment.* Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Sets missing vertex shader constant values for HighInt formatEduardo Lima Mitev2015-01-131-0/+6
| | | | | | | | | The range's min and max, and the precision value are not set correctly for the vertex shader constants. Fixes 1 dEQP test: dEQP-GLES3.functional.state_query.shader.precision_vertex_highp_int Reviewed-by: Ian Romanick <[email protected]>
* i965: Respect the no_8 flag on Gen6, not just Gen7+.Kenneth Graunke2015-01-121-2/+2
| | | | | | | | | | | | | | | When doing repclears, we only want to use the SIMD16 program, not the SIMD8 one. Kristian added this to the Gen7+ code, but apparently we missed it in the Gen6 code. This patch copies that code over. Approximately doubles the performance in a clear microbenchmark from mesa-demos (clearspd -width 500 -height 500 +color) on Sandybridge. Cc: "10.4 10.3" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]> References: https://code.google.com/p/chrome-os-partner/issues/detail?id=34681
* i965: Consider SEL.{GE,L} to be commutative operations.Matt Turner2015-01-082-10/+27
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cfg: Fix end_ip of last basic block.Matt Turner2015-01-081-1/+1
| | | | | | | start_ip and end_ip are inclusive. Increases instruction counts in 64 shaders in shader-db, likely indicative of them previously being misoptimized.
* main: Renamed _mesa_get_compressed_teximage to _mesa_GetCompressedTexImage_sw.Laura Ekstrand2015-01-081-1/+1
| | | | | | | | This reflects the new naming convention for software fallbacks. To avoid confusion with ARB_DIRECT_STATE_ACCESS backend functions, software fallbacks now have the form _mesa_[Driver function name]_sw. Reviewed-by: Anuj Phogat <[email protected]>
* main: Renamed _mesa_get_teximage to _mesa_GetTexImage_sw.Laura Ekstrand2015-01-081-1/+1
| | | | | | | | This reflects the new naming convention for software fallbacks. To avoid confusion with ARB_DIRECT_STATE_ACCESS backend functions, software fallbacks now have the form _mesa_[Driver function name]_sw. Reviewed-by: Anuj Phogat <[email protected]>
* main: Changed _mesa_alloc_texture_storage to _mesa_AllocTextureStorage_sw.Laura Ekstrand2015-01-082-2/+2
| | | | | | | | | | | | | | In order to implement ARB_DIRECT_STATE_ACCESS, many GL API functions must now rely on a backend that both traditional and DSA functions can use. For instance, _mesa_TexStorage2D and _mesa_TextureStorage2D both call a backend function _mesa_texture_storage that takes a context and a texture object as arguments. The backend is named _mesa_texture_storage so that Meta can call it and avoid looking up the context and the texture object. However, backend names often look very close to the names of software fallbacks (ie. _mesa_alloc_texture_storage). For this reason, software fallbacks have been renamed for clarity to have the form _mesa_[Driver function name]_sw. Reviewed-by: Anuj Phogat <[email protected]>
* main: Moved _mesa_lock_texture and _mesa_unlock_texture to texobj.h from ↵Laura Ekstrand2015-01-082-0/+2
| | | | | | teximage.h. Reviewed-by: Anuj Phogat <[email protected]>
* i965: blit_texture_to_pbo() now accepts TEXTURE_CUBE_MAP.Laura Ekstrand2015-01-081-0/+1
| | | | | | ARB_DIRECT_STATE_ACCESS permits the user to use TEXTURE_CUBE_MAP as a target. Reviewed-by: Anuj Phogat <[email protected]>
* i965/skl: Always use a header for SIMD4x2 sampler messagesKristian Høgsberg2015-01-085-11/+54
| | | | | | | | | | | | SKL+ overloads the SIMD4x2 SIMD mode to mean either SIMD8D or SIMD4x2 depending on bit 22 in the message header. If the bit is 0 or there is no header we get SIMD8D. We always wand SIMD4x2 in vec4 and for fs pull constants, so use a message header in those cases and set bit 22 there. Based on an initial patch from Ken. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Kristian Høgsberg <[email protected]>
* i965/skl: Report more accurate number of samples for formatKristian Høgsberg2015-01-071-0/+2
| | | | | Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: remove unused ctx parameter for _mesa_select_tex_image()Brian Paul2015-01-052-4/+4
| | | | Reviewed-by: Eric Anholt <[email protected]>
* meta: init var to silence uninitialized variable warningBrian Paul2015-01-051-1/+1
|
* i965: Micro-optimize swizzle_to_scs() and make it inlinable.Kenneth Graunke2015-01-043-27/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | brw_swizzle_to_scs has been showing up in my CPU profiling, which is rather silly - it's a tiny amount of code. It really should be inlined, and can easily be implemented with fewer instructions. The enum translation is as follows: SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W, SWIZZLE_ZERO, SWIZZLE_ONE 0 1 2 3 4 5 4 5 6 7 0 1 SCS_RED, SCS_GREEN, SCS_BLUE, SCS_ALPHA, SCS_ZERO, SCS_ONE which is simply (swizzle + 4) & 7. Haswell needs extra textureGather workarounds to remap GREEN to BLUE, but Broadwell and later do not. This patch replicates swizzle_to_scs in gen7_wm_surface_state.c and gen8_surface_state.c, since the Gen8+ code can be simplified to a mere two instructions. Both copies can be marked static for easy inlining. v2: Put the commit message in the code as comments (requested by Jason Ekstrand). Also fix a typo. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Support MESA_FORMAT_R8G8B8X8_SRGB.Kenneth Graunke2015-01-041-1/+4
| | | | | | | | | | | | | | | | | | | | | Valve games use GL_SRGB8 textures. Instead of supporting that properly, we fell back to MESA_FORMAT_R8G8B8A8_SRGB (with an alpha channel), which meant that we had to use texture swizzling to override the alpha to 1.0 when sampling. This meant shader recompiles on Gen < 7.5 platforms. By supporting MESA_FORMAT_R8G8B8X8_SRGB, the hardware just returns 1.0 for us, so we can just use SWIZZLE_XYZW, and avoid any recompiles. All generations of hardware have supported the format for sampling and filtering; we can easily support rendering by using the R8G8B8A8_SRGB format and writing garbage to the X channel. (We do this already for the non-SRGB version of this format.) This removes all remaining shader recompiles in a time demo of "Counter Strike: Global Offensive" (32 -> 0) on Sandybridge. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Fix BLORP sRGB MSAA overrides to cope with X vs. A formats.Kenneth Graunke2015-01-041-2/+3
| | | | | | | | | | | | | The logic in brw_blorp_surface_info::set uses brw_format_for_mesa_format for source surfaces, and brw->render_target_format[] for destination surfaces. We should do the same in the sRGB MSAA overrides. Currently, this isn't a problem, since SRGB MSAA buffers are all RGBA. The next commit will introduce RGBX SRGB MSAA buffers, at which point we need to get the RGBX -> RGBA format overrides for rendering right. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Copy shader->shadow_samplers to prog->ShadowSamplers.Kenneth Graunke2015-01-041-0/+1
| | | | | | | | | | | | | | | | ir_to_mesa does this - apparently we just forgot or something. Without this, we'll guess the wrong texture swizzle (XYZW for color instead of XXX1 for depth) when doing precompiles. This cuts 26 shader recompiles in a time demo of "Counter Strike: Global Offensive" (58 -> 32) on Sandybridge. Haswell still has 0 recompiles. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Make the precompile ignore DEPTH_TEXTURE_MODE on Gen7.5+.Kenneth Graunke2015-01-042-2/+5
| | | | | | | | | | | | | | | Gen7.5+ platforms that support the "Shader Channel Select" feature leave key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE is GL_ALPHA (which is really uncommon). So, the precompile should leave them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well. We didn't notice this because prog->ShadowSamplers is not set correctly. The next patch will fix that problem. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.Kenneth Graunke2015-01-042-0/+34
| | | | | | | | | | | | According to the documentation, we need to do a CS stall on every fourth PIPE_CONTROL command to avoid GPU hangs. The kernel does a CS stall between batches, so we only need to count the PIPE_CONTROLs in our batches. v2: Get the generation check right (caught by Chris Wilson), combine the ++ with the check (suggested by Daniel Vetter). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Daniel Vetter <[email protected]>
* i965: Make INTEL_DEBUG=state ignore state flags with a count of 1.Kenneth Graunke2015-01-031-2/+4
| | | | | | | | | | | There are too many state flags to fit in one terminal screen, even with a very tall terminal. Everything is flagged once, so a value of 1 means that it hasn't ever happened again, and thus isn't terribly interesting. Skipping those makes it easier to see the interesting values. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Fix INTEL_DEBUG=optimizer with VF types.Kenneth Graunke2015-01-032-2/+2
| | | | | | | Hardcoding stderr is wrong; INTEL_DEBUG=optimizer uses other files. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Show opt_vector_float() and later passes in INTEL_DEBUG=optimizer.Kenneth Graunke2015-01-031-8/+12
| | | | | | | | | | | | In order to support calling opt_vector_float() inside a condition, this patch makes OPT() a statement expression: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html We've used that elsewhere already. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* swrast: Fix -Wduplicate-decl-specifier warningJeremy Huddleston Sequoia2015-01-011-2/+2
| | | | | | | | | | | swrast.c:67:12: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier] const char const *swrast_vendor_string = "Mesa Project"; ^ swrast.c:68:12: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier] const char const *swrast_renderer_string = "Software Rasterizer"; ^ Signed-off-by: Jeremy Huddleston Sequoia <[email protected]>
* i965: Fix start/base_vertex_location for >1 prims but !BRW_NEW_VERTICES.Kenneth Graunke2014-12-313-21/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a partial revert of c89306983c07e5a88c0d636267e5ccf263cb4213. It split the {start,base}_vertex_location handling into several steps: 1. Set brw->draw.start_vertex_location = prim[i].start and brw->draw.base_vertex_location = prim[i].basevertex. (This happened once per _mesa_prim, in the main drawing loop.) 2. Add brw->vb.start_vertex_bias and brw->ib.start_vertex_offset appropriately. (This happened in brw_prepare_shader_draw_parameters, which was called just after brw_prepare_vertices, as part of state upload, and only happened when BRW_NEW_VERTICES was flagged.) 3. Use those values when emitting 3DPRIMITIVE (once per _mesa_prim). If we drew multiple _mesa_prims, but didn't flag BRW_NEW_VERTICES on the second (or later) primitives, we would do step #1, but not #2. The first _mesa_prim would get correct values, but subsequent ones would only get the first half of the summation. The reason I originally did this was because I needed the value of gl_BaseVertexARB to exist in a buffer object prior to uploading 3DSTATE_VERTEX_BUFFERS. I believed I wanted to upload the value of 3DPRIMITIVE's "Base Vertex Location" field, which was computed as: (prims[i].indexed ? prims[i].start : prims[i].basevertex) + brw->vb.start_vertex_bias. The latter value wasn't available until after brw_prepare_vertices, and the former weren't available in the state upload code at all. Hence the awkward split. However, I believe that including brw->vb.start_vertex_bias was a mistake. It's an extra bias we apply when uploading vertex data into VBOs, to move [min_index, max_index] to [0, max_index - min_index]. >From the GL_ARB_shader_draw_parameters specification: "<gl_BaseVertexARB> holds the integer value passed to the <baseVertex> parameter to the command that resulted in the current shader invocation. In the case where the command has no <baseVertex> parameter, the value of <gl_BaseVertexARB> is zero." I conclude that gl_BaseVertexARB should only include the baseVertex parameter from glDraw*Elements*, not any internal biases we add for optimization purposes. With that in mind, gl_BaseVertexARB only needs prim[i].start or prim[i].basevertex. We can simply store that, and go back to computing start_vertex_location and base_vertex_location in brw_emit_prim(), like we used to. This is much simpler, and should actually fix two bugs. Fixes missing geometry in Unvanquished. Cc: "10.4 10.3" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85529 Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Use WARN_ONCE for the single-primitive-exceeded-aperture message.Kenneth Graunke2014-12-311-9/+4
| | | | | | | This makes it show up via ARB_debug_output and is also less code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Do separate copy followed by constant propagation after ↵Matt Turner2014-12-291-1/+2
| | | | | | | | | | | | | | | | | opt_vector_float(). total instructions in shared programs: 5877012 -> 5876617 (-0.01%) instructions in affected programs: 33140 -> 32745 (-1.19%) From before the commit that allows VF constant propagation (which hurt some programs) to here, the results are: total instructions in shared programs: 5877951 -> 5876617 (-0.02%) instructions in affected programs: 123444 -> 122110 (-1.08%) with no programs hurt. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Allow constant propagation of VF immediates.Matt Turner2014-12-291-1/+27
| | | | | | | | | | total instructions in shared programs: 5877951 -> 5877012 (-0.02%) instructions in affected programs: 155923 -> 154984 (-0.60%) Helps 1233, hurts 156 shaders. The hurt shaders are addressed in the next commit. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Add parameter to skip doing constant propagation.Matt Turner2014-12-292-3/+3
| | | | | | | | | | | | | | | | After CSEing some MOV ..., VF instructions we have code like mov tmp, [1F, 2F, 3F, 4F]VF mov r10, tmp mov r11, tmp ... use r10 use r11 We want to copy propagate tmp into the uses of r10 and r11, but *not* constant propagate the VF immediate into the uses of tmp. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Do CSE, copy propagation, and DCE after opt_vector_float().Matt Turner2014-12-291-1/+5
| | | | | | | total instructions in shared programs: 5869005 -> 5868220 (-0.01%) instructions in affected programs: 70208 -> 69423 (-1.12%) Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Perform CSE on MOV ..., VF instructions.Matt Turner2014-12-291-5/+11
| | | | | | | | Port of commit a28ad9d4 from the fs backend. No shader-db changes since we don't emit MOV ..., VF instructions yet. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Add pass to gather constants into a vector-float MOV.Matt Turner2014-12-292-0/+62
| | | | | | | | | | Currently only handles consecutive instructions with the same destination that collectively write all channels. total instructions in shared programs: 5879798 -> 5869011 (-0.18%) instructions in affected programs: 465236 -> 454449 (-2.32%) Reviewed-by: Ian Romanick <[email protected]>
* i965: Add support for saturating immediates.Matt Turner2014-12-294-0/+80
| | | | | | | I don't feel great about assert(!"unimplemented: ...") but these cases do only seem possible under some currently impossible circumstances. Reviewed-by: Ian Romanick <[email protected]>
* i965: Add fs_reg/src_reg constructors that take vf[4].Matt Turner2014-12-293-0/+19
| | | | | | | | | | Sometimes it's easier to generate 4x values into an array, and the memcpy is 1 instruction, rather than 11 to piece 4 arguments together. I'd forgotten to remove the prototype from fs_reg from a previous patch, so it's already there for us here. Reviewed-by: Ian Romanick <[email protected]>
* i965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms.Kenneth Graunke2014-12-241-2/+10
| | | | | | | | | | | | | This was probably missed when moving from a fixed binding table layout to a dynamic one that changes based on the shader. Fixes newly proposed Piglit test fbo-mrt-new-bind. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87619 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Mike Stroyan <[email protected]> Cc: "10.4 10.3" <[email protected]>
* i965: Cache register write capability checks.Kenneth Graunke2014-12-241-0/+12
| | | | | | | | | | | | | | | | | | | Our ability to perform register writes depends on the hardware and kernel version. It shouldn't ever change on a per-context basis, so we only need to check once. Checking introduces a synchronization point between the CPU and GPU: even though we submit very few GPU commands, the GPU might be busy doing other work, which could cause us to stall for a while. On an idle i7 4750HQ, this improves performance in OglDrvCtx (a context creation microbenchmark) by 6.14748% +/- 1.6837% (n=20). With Unigine Valley running in the background (to keep the GPU busy), it improves performance in OglDrvCtx by 2290.92% +/- 29.5274% (n=5). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965: Use safer pointer arithmetic in gather_oa_results()Chad Versace2014-12-221-1/+1
| | | | | | | | | | | | | | | | | | This patch reduces the likelihood of pointer arithmetic overflow bugs in gather_oa_results(), like the one fixed by b69c7c5dac. I haven't yet encountered any overflow bugs in the wild along this patch's codepath. But I get nervous when I see code patterns like this: (void*) + (int) * (int) I smell 32-bit overflow all over this code. This patch retypes 'snapshot_size' to 'ptrdiff_t', which should fix any potential overflow. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()Chad Versace2014-12-221-3/+4
| | | | | | | | | | | | | | | | | | This patch reduces the likelihood of pointer arithmetic overflow bugs in intel_texsubimage_tiled_memcpy() , like the one fixed by b69c7c5dac. I haven't yet encountered any overflow bugs in the wild along this patch's codepath. But I recently solved, in commit b69c7c5dac, an overflow bug in a line of code that looks very similar to pointer arithmetic in this function. This patch conceptually applies the same fix as in b69c7c5dac. Instead of retyping the variables, though, this patch adds some casts. (I tried to retype the variables as ptrdiff_t, but it quickly got very messy. The casts are cleaner). Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Fix intel_miptree_map() signature to be more 64-bit safeChad Versace2014-12-225-10/+24
| | | | | | | | | | | | | | | | | This patch should diminish the likelihood of pointer arithmetic overflow bugs, like the one fixed by b69c7c5dac. Change the type of parameter 'out_stride' from int to ptrdiff_t. The logic is that if you call intel_miptree_map() and use the value of 'out_stride', then you must be doing pointer arithmetic on 'out_ptr'. Using ptrdiff_t instead of int should make a little bit harder to hit overflow bugs. As a side-effect, some function-scope variables needed to be retyped to avoid compilation errors. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Remove spurious casts in copy_image_with_memcpy()Chad Versace2014-12-221-4/+4
| | | | | | | | If a pointer points to raw, untyped memory and is never dereferenced, then declare it as 'void*' instead of casting it to 'void*'. Signed-off-by: Chad Versace <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add missing const qualifier.Matt Turner2014-12-191-1/+1
|
* mesa: Remove unnecessary -f from $(RM).Matt Turner2014-12-171-3/+3
| | | | $(RM) includes -f.
* i965: Require pixel alignment for GPU copy blitCody Northrop2014-12-162-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | The blitter will start at a pixel's natural alignment. For PBOs, if the provided offset if not aligned, bits will get dropped. This change adds offset alignment check for src and dst, kicking back if the requirements are not met. The change is based on following verbiage from BSPEC: Color pixel sizes supported are 8, 16, and 32 bits per pixel (bpp). All pixels are naturally aligned. Found in the following locations: page 35 of intel-gfx-prm-osrc-hsw-blitter.pdf page 29 of ivb_ihd_os_vol1_part4.pdf page 29 of snb_ihd_os_vol1_part5.pdf This behavior was observed with Steam Big Picture rendering incorrect icon colors. The fix has been tested on Ubuntu and SteamOS on Haswell. Signed-off-by: Cody Northrop <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83908 Reviewed-by: Neil Roberts <[email protected]>
* i965: remove includes of sampler.h from extern "C" blocksMark Janes2014-12-164-5/+4
| | | | | | | | | C linkage was removed from functions in program/sampler.cpp. However, some cpp files include program/sampler.h within extern "C" blocks, causing link errors for test_vec4_copy_propagation. Reviewed-by: Brian Paul <[email protected]> Tested-by: Ian Romanick <[email protected]>
* i965/query: Cache whether the batch references the query BO.Kenneth Graunke2014-12-162-4/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chris Wilson noted that repeated calls to CheckQuery() would call drm_intel_bo_references(brw->batch.bo, query->bo) on each invocation, which is expensive. Once we've flushed, we know that future batches won't reference query->bo, so there's no point in asking more than once. This patch adds a brw_query_object::flushed flag, which is a conservative estimate of whether the batch has been flushed. On the first call to CheckQuery() or WaitQuery(), we check if the batch references query->bo. If not, it must have been flushed for some reason (such as being full). We record that it was flushed. If it does reference query->bo, we explicitly flush, and record that we did so. Any subsequent checks will simply see that query->flushed is set, and skip the drm_intel_bo_references() call. Inspired by a patch from Chris Wilson. According to Eero, this does not affect the performance of Witcher 2 on Haswell, but approximately halves the userspace CPU usage. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/query: Use brw_bo_map to handle stall warnings.Kenneth Graunke2014-12-161-7/+1
| | | | | | | | | | This is less code and also measures the duration of the stall for us. Our old code predates the existance of brw_bo_map(). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/query: Remove redundant drm_intel_bo_references call in CheckQuery.Kenneth Graunke2014-12-161-7/+8
| | | | | | | | | | | | | | | | | | | | | | CheckQuery calls drm_intel_bo_references to see if the batch references the query BO, and if so, flushes. It then checks if the query BO is busy, and if not, calls gen6_queryobj_get_results(). Stupidly, gen6_queryobj_get_results() immediately did a second redundant drm_intel_bo_references check, even though we know the buffer is not referenced and in fact idle. This patch moves the batch-flush check out of gen6_queryobj_get_results and into WaitQuery() (the other caller). That way, both callers do a single batch-flush check. This should only be a minor improvement, since it would only affect the first CheckQuery call where the result is actually available. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/query: Add query->bo == NULL early return in CheckQuery hook.Kenneth Graunke2014-12-161-2/+8
| | | | | | | | | | | If query->bo == NULL, this is a redundant CheckQuery call, and we should simply return. We didn't do anything anyway - we skipped the batch flushing block, and although we called get_results(), it has an early return and does nothing. Why bother? Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/query: Set Ready flag in gen6_queryobj_get_results().Kenneth Graunke2014-12-161-2/+2
| | | | | | | | | | | | | | | q->Ready means that the results are in, and core Mesa is free to return them to the application. gen6_queryobj_get_results() is a natural place to set that flag; doing so means callers don't have to. The older non-hardware-context aware code couldn't do this, because we had to call brw_queryobj_get_results() to gather intermediate results when we ran out of space for snapshots in the query buffer. We only gather complete results in the Gen6+ code, however. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Ian Romanick <[email protected]>