summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* vbo: s/GLuint/GLbitfield/ for state bitmasksBrian Paul2016-01-064-4/+4
| | | | Reviewed-by: José Fonseca <[email protected]>
* st/mesa: use GLbitfield in st_state_flags, add commentsBrian Paul2016-01-061-2/+2
| | | | | | Use GLbitfield instead of GLuint to be consistent with other variables. Reviewed-by: José Fonseca <[email protected]>
* s/GLuint/GLbitfield/ for st_invalidate_state() parameterBrian Paul2016-01-062-2/+2
| | | | | | To match dd_function_table::UpdateState(). Reviewed-by: José Fonseca <[email protected]>
* st/mesa: be more careful about state validation in st_Bitmap()Brian Paul2016-01-061-1/+8
| | | | | | | | | | | If the only dirty state is mesa's _NEW_PROGRAM_CONSTANTS flag, we can skip state validation before drawing a bitmap since that state doesn't effect bitmap rendering. This further increases the performance of the ipers demo on llvmpipe to about what it was before commit 36c93a6fae27561. Reviewed-by: José Fonseca <[email protected]>
* st/mesa: move bitmap cache flushing out of state validationBrian Paul2016-01-066-4/+17
| | | | | | Just do it where needed (before drawing, clearing, etc). Reviewed-by: José Fonseca <[email protected]>
* st/mesa: check state->mesa in early return check in st_validate_state()Brian Paul2016-01-061-1/+1
| | | | | | | | | | | | | | | We were checking the dirty->st flags but not the dirty->mesa flags. When we took the early return, we didn't clear the dirty->mesa flags so the next time we called st_validate_state() we'd often flush the glBitmap cache. And since st_validate_state() is called from st_Bitmap(), it meant we flushed the bitmap cache for every glBitmap() call. This change seems to recover most of the performance loss observed with the ipers demo on llvmpipe since commit commit 36c93a6fae27561. Cc: [email protected] Reviewed-by: José Fonseca <[email protected]>
* st/mesa: protect debug printf() with a conditional instead of commentBrian Paul2016-01-061-5/+5
|
* st/mesa: fix comment indentation in st_flush_bitmap_cache()Brian Paul2016-01-061-2/+2
|
* glsl: fix varying slot allocation for blocks and structs with explicit locationsTimothy Arceri2016-01-071-4/+5
| | | | | | | | | Previously each member was being counted as using a single slot, count_attribute_slots() fixes the count for array and struct members. Also don't assign a negitive to the unsigned expl_location variable. Reviewed-by: Ilia Mirkin <[email protected]>
* glsl: don't try adding built-ins to explicit locations bitmaskTimothy Arceri2016-01-071-1/+3
| | | | | Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: fix overlapping of varying locations for arrays and structsTimothy Arceri2016-01-071-12/+67
| | | | | | | | | | | | | | | | | | | | | Previously we were only reserving a single location for arrays and structs. We also didn't take into account implicit locations clashing with explicit locations when assigning locations for their arrays or structs. This patch fixes both issues. V5: fix regression for patch inputs/outputs in tessellation shaders V4: just use count_attribute_slots() to get the number of slots, also calculate the correct number of slots to reserve for gs and tess stages by making use of the new get_varying_type() helper. V3: handle arrays of structs V2: also fix for arrays of arrays and structs. Acked-by: Anuj Phogat <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: create helper to remove outer vertex index array used by some stagesTimothy Arceri2016-01-071-10/+26
| | | | | | | | This will be used in the following patch for calculating array sizes correctly when reserving explicit varying locations. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: remove unused varyings before packing themTimothy Arceri2016-01-073-48/+54
| | | | | | | | | | | | | | | Previously we would pack varyings before trying to remove them, this relied on the packing pass not packing varyings with a location of -1 to avoid packing varyings that should be removed. However this meant unused varyings with an explicit location would be packed before they could be removed when we enable packing of them in a later patch. V2: fix regression in V1 removing unused varyings in multi-stage SSO, fix regression with single stage programs. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium/r600: Replace ALIGN_DIVUP with DIV_ROUND_UPKrzysztof Sobiecki2016-01-063-3/+2
| | | | | | | | ALIGN_DIVUP is a driver specific(r600g) macro that duplicates DIV_ROUND_UP functionality. Replacing it with DIV_ROUND_UP eliminates this problems. Signed-off-by: Krzysztof A. Sobiecki <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* vc4: Fix driver build from last minute rebase fix.Eric Anholt2016-01-061-21/+20
| | | | | | | I had the driver all tested for the last series, and in my last build I noticed that get_swizzled_channel was unused now, and removed it... apparently without testing to find that I removed the wrong channel swizzle function.
* vc4: Optimize out a comparison for bcsel based on an ALU comparisonEric Anholt2016-01-061-14/+59
| | | | | | | | | | | | | | | | | We routinely have code like: vec1 ssa_220 = fge ssa_104, ssa_61 vec1 ssa_199 = bcsel ssa_220, ssa_106, ssa_105 and we would compare fge's args and choose between ~0 and 0 to generate ssa_220, then compare ssa_220 to 0 and choose between bcsel's args. Instead, try to notice the pattern and compare between fge's args to select between bcsel's args. total instructions in shared programs: 88019 -> 87574 (-0.51%) instructions in affected programs: 9985 -> 9540 (-4.46%) total estimated cycles in shared programs: 245752 -> 245237 (-0.21%) estimated cycles in affected programs: 17232 -> 16717 (-2.99%)
* vc4: Add missing sRGB decode to texel fetches.Eric Anholt2016-01-061-0/+5
| | | | | We only see txf on MSAA textures, currently, and apparently this didn't impact any of our piglit tests.
* vc4: Add support for GL_ARB_texture_swizzle.Eric Anholt2016-01-061-1/+1
| | | | | We already had the code supporting it, since it's needed for the depth mode when doing shadow comparisons.
* vc4: Use NIR texture lowering for texture swizzling.Eric Anholt2016-01-062-57/+63
| | | | | | | | | | | | | | We can't use its other features currently (mostly because we don't want Newton-Raphson on rcps for texture coordinates), but it gets us started. This eliminates some comparisons with constants in GLB2.7 and ETQW traces at the QIR level by moving the comparisons into NIR, where they get constant-folded out. instructions in affected programs: 165 -> 156 (-5.45%) total uniforms in shared programs: 32087 -> 32085 (-0.01%) total estimated cycles in shared programs: 245762 -> 245752 (-0.00%) estimated cycles in affected programs: 461 -> 451 (-2.17%)
* vc4: Replace the SSA-style SEL operators with conditional MOVs.Eric Anholt2016-01-066-201/+128
| | | | | | | | | | | | | I'm moving away from QIR being SSA (since NIR is doing lots of SSA optimization for us now) and instead having QIR just be QPU operations with virtual registers. By making our SELs be composed of two MOVs, we could potentially coalesce the registers for the MOV's src and dst and eliminate the MOV. total instructions in shared programs: 88448 -> 88028 (-0.47%) instructions in affected programs: 39845 -> 39425 (-1.05%) total estimated cycles in shared programs: 246306 -> 245762 (-0.22%) estimated cycles in affected programs: 162887 -> 162343 (-0.33%)
* vc4: Don't try the SF coalescing unless it's on a def.Eric Anholt2016-01-061-3/+3
| | | | | | If you want the SF of the value of a register produced from a series of packing MOVs or conditional MOVs, we can't just SF on the last MOV into the register.
* gallium/drivers/svga: Use unsigned for loop indexEdward O'Callaghan2016-01-062-7/+7
| | | | | | | | Fix a 's/unsigned int/unsigned/' consistency case while here. Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium/drivers/r600: Use unsigned for loop indexEdward O'Callaghan2016-01-061-9/+9
| | | | | | Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium/drivers/ilo: Use unsigned for loop indexEdward O'Callaghan2016-01-064-16/+16
| | | | | | Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: Use unsigned for loop indexEdward O'Callaghan2016-01-061-3/+3
| | | | | | Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium/drivers: Remove unnecessary semicolonsEdward O'Callaghan2016-01-0610-11/+11
| | | | | | Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: Remove unnecessary semicolonsEdward O'Callaghan2016-01-068-8/+9
| | | | | | | | | Fix silly issue with MSVC case fall-though support to need a extra 'break;' Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: Optimize lp_rast_triangle_32_3_16 for POWER8Oded Gabbay2016-01-061-1/+141
| | | | | | | | | | | | | | | | | | | | | This patch converts the SSE-optimized lp_rast_triangle_32_3_16() to VMX/VSX. I measured the results on POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score Name Before After Delta ------------------------------------------------ openarena 16.35 16.7 2.14% xonotic 4.707 4.97 5.57% glmark2 didn't show a significant (more than 1%) difference. v2: Make sure code is build only on POWER8 LE machine Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Optimize BUILD_MASK(_LINEAR) for POWER8Oded Gabbay2016-01-061-40/+110
| | | | | | | | | | | | | | | | | | | | | This patch converts the SSE-optimized build_mask_32() and build_mask_linear_32() to VMX/VSX. I measured the results on POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score Name Before After Delta ------------------------------------------------ glmark2 (score) 139.8 142.7 2.07% openarena and xonotic didn't show a significant (more than 1%) difference. v2: Make sure code is build only on POWER8 LE machine Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Optimize do_triangle_ccw for POWER8Oded Gabbay2016-01-061-0/+100
| | | | | | | | | | | | | | | | | | | | | | | This patch converts the SSE optimization done in do_triangle_ccw to VMX/VSX. I measured the results on POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score Name Before After Delta ------------------------------------------------ glmark2 (score) 136.6 139.8 2.34% openarena 16.14 16.35 1.30% xonotic 4.655 4.707 1.11% v2: - Convert loads to use aligned loads - Make sure code is build only on POWER8 LE machine Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add POWER8 portability file - u_pwr8.hOded Gabbay2016-01-061-0/+310
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This file provides a portability layer that will make it easier to convert SSE-based functions to VMX/VSX-based functions. All the functions implemented in this file are prefixed using "vec_". Therefore, when converting from SSE-based function, one needs to simply replace the "_mm_" prefix of the SSE function being called to "vec_". Having said that, not all functions could be converted as such, due to the differences between the architectures. So, when doing such conversion hurt the performance, I preferred to implement a more ad-hoc solution. For example, converting the _mm_shuffle_epi32 needed to be done using ad-hoc masks instead of a generic function. All the functions in this file support both little-endian and big-endian but currently the file is build only on POWER8 LE machine. All of the functions are implemented using the Altivec/VMX intrinsics, except one where I needed to use inline assembly (due to missing intrinsic). v2: - Use vec_vgbbd instead of __builtin_vec_vgbbd - Add an aligned load function - Don't use typeof() - Make file build only on POWER8 LE machine Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* configure.ac: Detect if running on POWER8 archOded Gabbay2016-01-061-0/+55
| | | | | | | | | | | | | | | | | To determine if we could use special POWER8 assembly directives, we first need to detect whether we are running on POWER8 architecture. This patch adds this detection to configure.ac and adds the necessary compilation flags accordingly. v2: - Add option to disable POWER8 instructions generation - Detect whether building on BE or LE machine and build with -mpower8-vector only on LE machine - Make the printed messages more standard Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* nir: Add a lower_fdiv option, turn fdiv into fmul/frcp.Kenneth Graunke2016-01-053-0/+3
| | | | | | | | | | | | | | | | The nir_opt_algebraic rule (('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))), can produce new fdiv operations, which need to be lowered on i965, as we don't actually implement fdiv. (Normally, we handle this in GLSL IR's lower_instructions pass, but in the above case we introduce an fdiv after that point. So, make NIR do it for us.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: [email protected]
* i965: Only turn on ARB_compute_shader if we can write registers.Kenneth Graunke2016-01-051-2/+3
| | | | | | | | | | | | | | Compute shaders require reconfiguring the L3 for shared local memory support. We have to be able to write the L3 registers to do that. This effectively turns off compute shaders prior to Kernel 4.2. (Previously, the extension enable was in an API_OPENGL_CORE conditional. However, that isn't necessary - core Mesa extension handling already restricts it properly. I've moved it out in this patch.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: Use rcp in brw_lower_texture_gradients rather than 1.0 / x.Kenneth Graunke2016-01-051-1/+1
| | | | | | | | That's what it's for. Plus, we actually implement rcp. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: fix GL_MAX_NAME_LENGTH query for tessellation shadersTimothy Arceri2016-01-061-2/+6
| | | | | | | | | This fixes some piglit subtests for ARB_program_interface_query. V3: remove some of the unnecessary parentheses V2: fix alignment Reviewed-by: Marek Olšák <[email protected]>
* glsl: don't change the varying type in validation codeTimothy Arceri2016-01-061-5/+0
| | | | | | | | There is a function dedicated to demoting unused varyings lets trust it to do its job. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: move lowering after matching validationTimothy Arceri2016-01-061-11/+11
| | | | | | | | After lowering the matching flag is_unmatched_generic_inout is lost so we need to move this validation before lowering. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: only add outward facing varyings to resourse list for SSOTimothy Arceri2016-01-061-7/+10
| | | | | | | | | An SSO program can have multiple stages and we only want to add the externally facing varyings. The current code was adding both the packed inputs and outputs for the first and last stage of each program. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965/gen9: Modify the conditions to use blitter on skl+Anuj Phogat2016-01-051-3/+9
| | | | | | | | | Conditions modified allow skl+ to use blitter: - for all tiling formats - to write data to YF/YS tiled surfaces Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/gen9: Return false in place of assert in intelEmitCopyBlit()Anuj Phogat2016-01-051-3/+4
| | | | | | | This allows the fallback paths to handle it correctly. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/gen9: Remove regions overlap check in fast copy blitAnuj Phogat2016-01-051-5/+0
| | | | | | | | Overlapping blits are anyway undefined in OpenGL. So no need of overlap check here. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/gen9: Don't use fast copy blit in case of non power of 2 cppAnuj Phogat2016-01-051-2/+4
| | | | | | | Fast copy blit is currently enabled for use only with Yf/Ys tiling. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i915/i965: Fix typo in perf_debug messageIan Romanick2016-01-052-2/+2
| | | | | | Trivial Signed-off-by: Ian Romanick <[email protected]>
* st/mesa: minor indentation fixesBrian Paul2016-01-051-4/+4
|
* draw: minor indentation fixBrian Paul2016-01-051-1/+1
|
* mesa: minor clean-up of some memcpy/sizeof() calls in m_matrix.cBrian Paul2016-01-051-7/+7
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* util: add debug_dump_ubyte_rgba_bmp()Brian Paul2016-01-052-0/+63
| | | | | | Like debug_dump_float_rgba_bmp() but takes ubyte values. Reviewed-by: Charmaine Lee <[email protected]>
* mesa: check for z=0 in _mesa_Vertex3dv()Brian Paul2016-01-051-1/+4
| | | | | | | | | | | It's very rare that a GL app calls glVertex3dv(), but one in particular calls it lot, always with Z = 0. Check for that condition and convert the call into glVertex2f. This reduces VBO memory used and reduces the number of times we have to switch between float[2] and float[3] vertex formats in the svga driver. This results in a small but measurable performance improvement. Reviewed-by: Charmaine Lee <[email protected]>
* svga: fix test for SVGA_NEW_STIPPLEBrian Paul2016-01-052-4/+8
| | | | | | | | | | | | | We only want to set the SVGA_NEW_STIPPLE dirty flag when the polygon stipple state changes. Before, we only set the flag when we were enabling stipple, but not disabling. We don't really have to add SVGA_NEW_STIPPLE to the dirty FS state set since it's a subset of SVGA_NEW_RAST, but let's be explicit. This doesn't fix any known bugs. Reviewed-by: Charmaine Lee <[email protected]>