path: root/src/mesa/drivers/dri/i965
Commit message | Author | Age | Files | Lines
* i965: Drop pointless check for variable declarations in splitting. | Eric Anholt | 2014-04-08 | 1 | -10/+5
  We're walking the whole instruction stream, so we know the declaration will be found.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove stale comment. | Eric Anholt | 2014-04-08 | 1 | -1/+0
  We stopped doing variable index lowering for uniforms in a64c1eb9b110f29b8abf803a8256306702629bdc, 5 months after the comment was added.

  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Pass ctx->Const.NativeIntegers to do_common_optimization(). | Kenneth Graunke | 2014-04-08 | 1 | -1/+2
  The next few patches will introduce an optimization that only works when integers are not represented as floating point values.

  v2: Re-word-wrap a line, as requested by Ian Romanick.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Skip emitting MACH/MOV for small integers. | Kenneth Graunke | 2014-04-08 | 1 | -12/+21
  The vector backend already implemented this optimization, but surprisingly, we never bothered to implement it in the scalar backend.

  In addition to saving two instructions, this eliminates a use of the accumulator as an explicit source, which is unsupported in SIMD16 mode on Gen7+, which could help us gain SIMD16 programs.

  Cuts 19.23% of the instructions in dolphin/efb2ram.shader_test.

  v2: Rebase on is_16bit_integer_constant -> is_uint16_constant rename.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* glsl: Make is_16bit_constant from i965 an ir_constant method. | Kenneth Graunke | 2014-04-08 | 1 | -16/+2
  The i965 MUL instruction doesn't natively support 32-bit by 32-bit integer multiplication; additional instructions (MACH/MOV) are required. However, we can avoid those if we know one of the operands can be represented in 16 bits or less. The vector backend's is_16bit_constant static helper function checks for this.

  We want to be able to use it in the scalar backend as well, which means moving the function to a more generally-usable location. Since it isn't i965 specific, I decided to make it an ir_constant method, in case it ends up being useful to other people as well.

  v2: Rename from is_16bit_integer_constant to is_uint16_constant, as suggested by Ilia Mirkin. Update comments to clarify that it does apply to both int and uint types, as long as the value is non-negative and fits in 16-bits.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
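The check described in this commit can be sketched as follows. This is a minimal illustration, not the code from the patch: `IntConstant` is a hypothetical stand-in for an integer constant and is not Mesa's `ir_constant`; only the name `is_uint16_constant` and the rule "non-negative and fits in 16 bits, for both int and uint" come from the commit message.

```cpp
#include <cstdint>

// Hypothetical stand-in for an integer constant (assumption, not Mesa's type).
struct IntConstant {
    bool is_signed;      // true for int, false for uint
    std::uint32_t bits;  // raw 32-bit value
};

// True when the value is non-negative and representable in 16 bits,
// which is the condition under which the extra MACH/MOV can be skipped.
bool is_uint16_constant(const IntConstant &c)
{
    // A negative int never qualifies, whatever its magnitude.
    if (c.is_signed && static_cast<std::int32_t>(c.bits) < 0)
        return false;
    return c.bits <= UINT16_MAX;
}
```

A backend would call such a predicate on each MUL operand and fall back to the full MUL/MACH/MOV sequence when it returns false for both.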
* mesa: Move is_power_of_two() function from brw_context.h to macros.h. | Kenneth Graunke | 2014-04-08 | 1 | -6/+0
  This makes the function available from core Mesa code, including the GLSL compiler.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Fix "SIMD16 unsupported" messages via KHR_debug. | Kenneth Graunke | 2014-04-08 | 1 | -1/+1
  Performance warnings are logged via KHR_debug in addition to when the INTEL_DEBUG=perf environment variable is set. Without this, messages in debug contexts would have "(null)" for the reason.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Fix missing dirty bits in the gen8_sbe_state atom. | Kenneth Graunke | 2014-04-07 | 1 | -2/+2
  These are clearly needed---the comments in the function are even present for each one of them.

  I originally had two separate state atoms for 3DSTATE_SBE and 3DSTATE_SBE_SWIZ. When I combined the functions, I must have forgotten to add the atoms for 3DSTATE_SBE_SWIZ.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Drop BRW_NEW_RASTERIZER_DISCARD flag from Broadwell SOL atom. | Kenneth Graunke | 2014-04-07 | 1 | -1/+0
  Nothing actually uses this---we handle rasterizer discard in the clipper in order for statistics counters to work.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Use the correct program when uploading Broadwell SOL state. | Kenneth Graunke | 2014-04-07 | 1 | -6/+2
  This is the equivalent of commit 43e77215b13b2f86e461cd8a62b542f.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Remove left-over 'removed' variable. | Matt Turner | 2014-04-07 | 1 | -13/+8
  I think this was used for coalescing out partly dead large virtual registers, but the patch that enabled that caused regressions and didn't make it upstream.

  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Check for interference after finding all channels. | Matt Turner | 2014-04-07 | 1 | -11/+26
  It's more likely that we won't find writes to all channels than that one will interfere, and calculating interference is more expensive. This change will also help prepare for coalescing load_payload instructions' operands.

  Also update the live intervals for all channels, and not just the last that we saw.

  Reviewed-by: Eric Anholt <[email protected]>
* i965: initialize more device info fields for Cherryview | Jordan Justen | 2014-04-07 | 1 | -0/+2
  The intent in 9b6b084eb7b10d006b44e3cd22585fc3e39e0c00 was for urb .size and .min_vs_entries fields to use the values from the GEN8_FEATURES macro.

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Allow constant propagation into dot product. | Matt Turner | 2014-04-05 | 1 | -0/+4
  total instructions in shared programs: 1667088 -> 1667055 (-0.00%)
  instructions in affected programs: 3362 -> 3329 (-0.98%)

  Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Split out can_coalesce_vars() function. | Matt Turner | 2014-04-05 | 1 | -44/+47
  Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Split out is_coalesce_candidate() function. | Matt Turner | 2014-04-05 | 1 | -14/+23
  Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Split fs_visitor::register_coalesce() into its own file. | Matt Turner | 2014-04-05 | 3 | -181/+209
  The function has gotten large, and brw_fs.cpp is the largest source file in the driver.

  Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Mark appropriate fs_inst members as const. | Matt Turner | 2014-04-05 | 2 | -15/+15
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Mark is_tex() and friends as const. | Matt Turner | 2014-04-05 | 2 | -10/+10
  Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Don't propagate saturation modifiers if there are source modifiers. | Matt Turner | 2014-04-05 | 1 | -0/+2
  Which would lead to translating

      mad vgrf9:F, vgrf3:F, u0:F, vgrf6:F
      mov.sat vgrf7:F, -vgrf9:F

  into

      mad.sat vgrf9:F, vgrf3:F, u0:F, vgrf6:F
      mov vgrf7:F, -vgrf9:F

  Fixes some lighting effects in Dota2.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76749
  Reviewed-by: Ian Romanick <[email protected]>
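Why the transformation above is wrong can be shown numerically. The sketch below models only the arithmetic: `t` stands for the result of the `mad`, and `can_propagate_saturate` is a hypothetical guard written for illustration, not the function touched by the patch.

```cpp
#include <algorithm>

// Clamp to [0, 1], which is what the .sat modifier does for float results.
static float saturate(float v)
{
    return std::min(std::max(v, 0.0f), 1.0f);
}

// Original sequence: the mov negates its source, then saturates.
float original_sequence(float t)
{
    return saturate(-t);
}

// After the incorrect propagation: the mad saturates, then the mov negates.
float propagated_sequence(float t)
{
    return -saturate(t);
}

// Hypothetical guard (assumption): propagation is only safe when the mov
// reads its source with no negate or abs modifier.
bool can_propagate_saturate(bool src_negate, bool src_abs)
{
    return !src_negate && !src_abs;
}
```

For `t = -0.5` the original sequence yields 0.5 while the propagated one yields zero, so the two programs are not equivalent whenever a source modifier is present.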
* i965/fs: Don't propagate saturate modifiers into partial writes. | Matt Turner | 2014-04-05 | 1 | -1/+2
  Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Fix off-by-one in saturate propagation. | Matt Turner | 2014-04-05 | 1 | -1/+1
  ip needs to be initialized to start_ip - 1, since the first thing in the main loop is ip++. Otherwise we would incorrectly propagate the saturate from the mov to the mad:

      mad a, b, c, d
      mov.sat x, a
      add y, z, a

  Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Consider sources of non-GRF-dst instructions for dead channels. | Matt Turner | 2014-04-05 | 1 | -12/+8
  Previously we'd ignore the sources of instructions with non-GRF destinations when calculating the dead channels. This would lead to us incorrectly removing the first instruction in this sequence:

      mov vgrf11, ...
      cmp.ne.f0 null, vgrf11, 1.0
      mov vgrf11, ...

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76616
* i965/fs: Name temporary ralloc contexts something other than mem_ctx. | Matt Turner | 2014-04-05 | 2 | -10/+10
  Or else poor programmers might mistakenly use the temporary mem_ctx, instead of the fs_visitor's mem_ctx and wonder why their code is crashing.

  Also remove the parenting. These contexts are local to the optimization passes they're in and are freed at the end.
* i965/fs: Recalculate live intervals in calculate_register_pressure(). | Matt Turner | 2014-04-05 | 1 | -0/+1
  Otherwise calling dump_instructions() after declaring a new fs_reg would segfault when calculate_register_pressure()'s loop over reg walked off the end of the virtual_grf_start[] array that calculate_live_intervals() would have reallocated for you, if it had known there was a new register.
* i965: Mark SNB GT1 as a GT1. | Matt Turner | 2014-04-04 | 1 | -1/+1
  brw->gt only seems to be used on gen >= 7, so this shouldn't have any effect.

  Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: do not trim dead channels on gen6 for math | Tapani Pälli | 2014-04-02 | 1 | -4/+9
  Do not set a writemask on Gen6 for math instructions; those are executed using align1 mode, which does not support a destination mask.

  v2: cleanups, better comment (Matt)

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76883
  Signed-off-by: Tapani Pälli <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Don't trim writemasks of texture instructions. | Matt Turner | 2014-03-31 | 1 | -2/+4
  It was my understanding that the writemask works in SIMD4x2 mode for texturing instructions and doesn't require a message header. Some bit of this logic must be wrong, so disable it until it's understood.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76617
  Reviewed-by: Kenneth Graunke <[email protected]>
* dri/i965: use CLOCK_LIBS over -lrt | Emil Velikov | 2014-03-31 | 1 | -1/+1
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Add Cherryview support. | Kenneth Graunke | 2014-03-28 | 2 | -0/+16
  Based on a patch by Ville Syrjälä. As usual, these are placeholder values; actual values will come later.

  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Make sure we always compute valid index bounds before drawing. | Iago Toral Quiroga | 2014-03-28 | 1 | -1/+2
  When doing software rendering (i.e. rendering to the selection buffer) we need to make sure that we have valid index bounds before calling _tnl_draw_prims(), otherwise we can crash.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59455
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use intel_upload_space() for pull constant uploads. | Eric Anholt | 2014-03-26 | 4 | -33/+17
  This also happens to fix a leak of the current GS pull constant BO on context destroy, by just not holding on to the pull const bos after the surface state is generated.

  No statistically significant performance difference on GLB2.7 on HSW at 1024x768 (n=40) or 320x240 (n=44), or on BYT at 320x240 (n=47).

  v2: Rebase on intel_upload simplification.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Massively simplify the intel_upload implementation. | Eric Anholt | 2014-03-26 | 4 | -126/+77
  The implementation kept a page-sized area for uploading data, and uploaded chunks from that to a 64kb-sized streamed buffer. This wasted cache footprint (and extra state tracking to do so) when we want to just write our data into the buffer immediately.

  Instead, build it around an interface like brw_state_batch() that just gets you a pointer to BO memory to upload your stuff immediately.

  Improves OpenArena on HSW by 1.62209% +/- 0.355299% (n=61) and on BYT by 1.7916% +/- 0.415743% (n=31).

  v2: Rebase on Mesa master, drop old prototypes. Re-do performance comparison on a kernel that doesn't punish CPU efficiency improvements.

  Reviewed-by: Kenneth Graunke <[email protected]>
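The "hand back a pointer into the buffer" interface described above can be pictured as a bump allocator. The sketch below is an assumption-laden illustration: `UploadBuffer` and its `space()` method are hypothetical names, ordinary heap memory stands in for a BO, and none of this is the driver's actual intel_upload code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a streamed upload buffer (not the driver's code).
class UploadBuffer {
public:
    explicit UploadBuffer(std::size_t capacity) : storage_(capacity), next_(0) {}

    // Reserve `size` bytes at `alignment` (assumed to be a power of two) and
    // return a pointer the caller writes into directly, with no staging copy.
    // Returns nullptr when the request does not fit in the remaining space.
    void *space(std::size_t size, std::size_t alignment, std::size_t *out_offset)
    {
        const std::size_t start = (next_ + alignment - 1) & ~(alignment - 1);
        if (start + size > storage_.size())
            return nullptr;  // a real driver would switch to a fresh buffer here
        next_ = start + size;
        *out_offset = start;
        return storage_.data() + start;
    }

private:
    std::vector<std::uint8_t> storage_;  // backing memory for this sketch
    std::size_t next_;                   // first unused byte
};
```

The caller fills the returned pointer and records the offset for later state setup, which is the pattern the commit message attributes to brw_state_batch().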
* i965: For fast color clears, only check the color of live channels. | Kevin Rogovin | 2014-03-25 | 1 | -1/+2
  When deciding if a clear color is suitable for fast clear, take into account if a color channel is active in the buffer format.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Set Broadwell MOCS values everywhere it's possible. | Kenneth Graunke | 2014-03-25 | 6 | -12/+27
  This patch introduces two pre-canned MOCS values: BDW_MOCS_WB (write-back, all caches) and BDW_MOCS_WT (write-through, all caches).

  We use write-through caching for render targets, and write-back for all other data. (At least on Haswell, I believe write-back LLC/eLLC didn't work for scan-out buffers, while write-through did.)

  No performance analysis has been done on the impact of this patch.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Acked-by: Eric Anholt <[email protected]>
* i965: fix dma_buf import with non-zero offset. | Gwenole Beauchesne | 2014-03-25 | 1 | -0/+9
  Fix eglCreateImage() from a packed dma_buf surface with a non-zero offset to pixel data. In particular, this fixes support for planar YUV surfaces when they are individually mapped on a per-plane basis, i.e. when OES_EGL_image_external is not used and the user application wants to use its own shader code for composition, or for processing on individual planes (OCL).

  Signed-off-by: Gwenole Beauchesne <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* mesa/sso: rename Shader to the pointer _Shader | Gregory Hainaut | 2014-03-25 | 13 | -19/+19
  Basically a sed, except for shaderapi.c and get.c:
    get.c => GL_CURRENT_PROGRAM always refers to the "old" UseProgram behavior
    shaderapi.c => the old API still updates the Shader object directly

  V2: formatting improvement

  V3 (idr):
  * Rebase fixes after a block of code was moved from ir_to_mesa.cpp to shaderapi.c.
  * Trivial reformatting.

  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: For color clears, only disable writes to components that exist. | Kenneth Graunke | 2014-03-24 | 1 | -1/+2
  The SIMD16 replicated FB write message only works if we don't need the color calculator to mask our framebuffer writes. Previously, we bailed on it if color_mask wasn't <true, true, true, true>. However, this was needlessly strict for formats with fewer than four components - only the components that actually exist matter.

  WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set to <true, true, true, false>. This will work perfectly fine with the replicated data message; we just bailed unnecessarily.

  Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by about 50%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).

  v2: Use _mesa_format_has_color_component() to properly handle ALPHA formats (and generally be less fragile).

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Tested-by: Dylan Baker <[email protected]>
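The relaxed test can be sketched as a per-channel check. The helper name `can_use_replicated_clear` and the boolean array describing the format are assumptions made for this illustration; per the v2 note, the driver queries the format through _mesa_format_has_color_component() rather than taking such an array.

```cpp
#include <array>

// Hypothetical helper: the replicated-data clear is usable when every
// channel that actually exists in the format has its writes enabled.
// Channels the format lacks are ignored, whatever the mask says.
bool can_use_replicated_clear(const std::array<bool, 4> &color_mask,
                              const std::array<bool, 4> &format_has_channel)
{
    for (int i = 0; i < 4; i++) {
        if (format_has_channel[i] && !color_mask[i])
            return false;  // a real channel is masked off
    }
    return true;
}
```

With a mask of <true, true, true, false> and a format that has no alpha channel, which is the WebGL Aquarium case above, the check passes; the same mask on a four-component format still falls back to the slower path.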
* i965: Fix compiler warning about signed/unsigned. | Eric Anholt | 2014-03-24 | 1 | -1/+1
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen8: Change the winsys MSAA blits from blorp to meta. | Eric Anholt | 2014-03-24 | 4 | -8/+152
  This gets us equivalent code paths on BDW and pre-BDW, except for stencil (where we don't have MSAA stencil resolve code yet).

  Improves MSAA-forced citybench by 7.94496% +/- 2.38429% (n=16). Reduces DRI2 MSAA glxgears performance by -12.3559% +/- 1.52845% (n=9).

  v2: Move the new meta code to brw_meta_updownsample.c, name it brw_meta_updownsample(), add a comment about intel_rb_storage_first_mt_slice(), and rename that function and move the RB generation into it (review ideas by Ken).

  v3: Fix 2 src vs dst pasteos in previous change.

  v4: Skip this path pre-gen8 for now, until we can analyze the glxgears performance delta some more.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Skip reallocating the private MSAA miptree, unless it's resized. | Eric Anholt | 2014-03-24 | 1 | -17/+28
  Even if the singlesample_mt got reopened from DRI due to pageflipping/buffer swapping, our private miptree shouldn't need any changes.

  Improves performance of a little swapbuffers-loving microbenchmark with MSAA forced on, by 1.2371% +/- 0.624802% (n=102).

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Simplify the no-reopening-the-winsys-buffer tests. | Eric Anholt | 2014-03-24 | 1 | -22/+16
  The formatting was weird, and the tests were duplicated, and it is guaranteed that mt->region exists.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Don't forget to free the old singlesample_mt. | Eric Anholt | 2014-03-24 | 1 | -0/+1
  Fixes a memory leak with MSAA winsys buffers since my move of singlesample_mt to the rb in 4e0924c5de5f3964e4ca81f923d877dbb59fad0a

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add an env var for forcing window system MSAA. | Eric Anholt | 2014-03-24 | 2 | -0/+17
  Sometimes it would be nice to benchmark some app with MSAA versus not, but it doesn't offer the controls you want. Just provide a handy knob to force the issue.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Eliminate dead writes to the flag register. | Matt Turner | 2014-03-24 | 1 | -18/+48
  For each write, search previous instructions for unread writes to the flag register and remove them. Note that this will not eliminate the last unread write.

  total instructions in shared programs: 788074 -> 788004 (-0.01%)
  instructions in affected programs: 4930 -> 4860 (-1.42%)

  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Eliminate writes that are never read. | Matt Turner | 2014-03-24 | 1 | -0/+46
  With an awful O(n^2) algorithm that searches previous instructions for dead writes.

  total instructions in shared programs: 805582 -> 788074 (-2.17%)
  instructions in affected programs: 144561 -> 127053 (-12.11%)

  Reviewed-by: Eric Anholt <[email protected]>
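A backward search of this kind can be sketched on a toy instruction list. Everything here is an assumption for illustration: `Inst` is a hypothetical record with one destination register and a list of sources, the code is straight-line with whole-register writes only, and the function is not the pass added by the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy instruction: one destination register, any number of sources.
struct Inst {
    int dst;
    std::vector<int> srcs;
};

static bool reads(const Inst &inst, int reg)
{
    for (int s : inst.srcs)
        if (s == reg)
            return true;
    return false;
}

// For each write, walk backwards looking for an earlier write to the same
// register that nothing read in between; such a write is dead. Quadratic in
// the number of instructions. Returns one dead/alive flag per instruction.
std::vector<bool> find_unread_writes(const std::vector<Inst> &insts)
{
    std::vector<bool> dead(insts.size(), false);
    for (std::size_t i = 0; i < insts.size(); i++) {
        const int reg = insts[i].dst;
        if (reads(insts[i], reg))
            continue;  // this instruction consumes the previous value itself
        for (std::size_t j = i; j-- > 0;) {
            if (reads(insts[j], reg))
                break;  // the earlier value is used, so stop searching
            if (insts[j].dst == reg) {
                dead[j] = true;  // overwritten before anyone read it
                break;
            }
        }
    }
    return dead;
}
```

As with the flag-register variant listed above, a search driven by later writes cannot remove the final unread write to a register, because no subsequent write exists to trigger it.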
* i965/vec4: Factor code out of DCE into a separate function. | Matt Turner | 2014-03-24 | 1 | -34/+39
  Will be reused in the next commit.

  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Let dead code eliminate trim dead channels. | Matt Turner | 2014-03-24 | 1 | -3/+26
  That is, modify

      mad dst, a, b, c

  to be

      mad dst.xyz, a, b, c

  if dst.w is never read.

  total instructions in shared programs: 811869 -> 805582 (-0.77%)
  instructions in affected programs: 168287 -> 162000 (-3.74%)

  Reviewed-by: Eric Anholt <[email protected]>
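Trimming a writemask amounts to intersecting it with the set of channels that are read later. The sketch below assumes a 4-bit mask with one bit per channel; the `WRITEMASK_*` constants and `trim_writemask` are defined locally for the example and are not taken from the patch.

```cpp
#include <cassert>

// One bit per destination channel (values chosen for this sketch).
enum : unsigned {
    WRITEMASK_X = 1u << 0,
    WRITEMASK_Y = 1u << 1,
    WRITEMASK_Z = 1u << 2,
    WRITEMASK_W = 1u << 3,
    WRITEMASK_XYZW = 0xFu
};

// Keep only the destination channels that some later instruction reads.
// A result of 0 means no written channel is live, so the whole
// instruction is a candidate for removal.
unsigned trim_writemask(unsigned writemask, unsigned live_channels)
{
    assert(writemask <= WRITEMASK_XYZW && live_channels <= WRITEMASK_XYZW);
    return writemask & live_channels;
}
```

In the `mad` example above, a full xyzw write with only x, y and z live becomes an xyz write. Other entries in this log note that the trimming is skipped for Gen6 math and for texture instructions.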
* i965/vec4: Track live ranges per-channel, not per vgrf. | Matt Turner | 2014-03-24 | 2 | -14/+41
  Will be squashed with the next patch.

  Reviewed-by: Eric Anholt <[email protected]>
* i965/vec4: Don't dead code eliminate instructions writing the flag. | Matt Turner | 2014-03-24 | 1 | -1/+5
  A future patch adds support for removing dead writes to the flag register. This patch simplifies the logic until then.

  total instructions in shared programs: 811813 -> 811869 (0.01%)
  instructions in affected programs: 3378 -> 3434 (1.66%)

  Reviewed-by: Eric Anholt <[email protected]>