path: root/src/mesa
Commit message | Author | Age | Files | Lines
* mesa: Add KBL PCI IDs and platform information. | Sarah Sharp | 2016-01-06 | 1 | -0/+60
|
| Add PCI IDs for the Intel Kabylake platforms. The IDs are taken directly from the Linux kernel patches, which are under review:
|
| http://lists.freedesktop.org/archives/intel-gfx/2015-October/078967.html
| http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=kbl-upstream-v2
|
| The Kabylake PCI IDs taken from the kernel are rearranged to be in order of GT type, then PCI ID.
|
| Please note that if this patch is backported, the following fixes will need to be added before this patch:
|
| commit 28ed1e08e8ba98e "i965/skl: Remove early platform support"
| commit c1e38ad37042b0e "i965/skl: Use larger URB size where available."
|
| Thanks to Ben for fixing a bug around setting urb.size, and being patient with my questions about what the various fields mean.
|
| Signed-off-by: Sarah Sharp <[email protected]>
| Suggested-by: Ben Widawsky <[email protected]>
| Tested-by: Rodrigo Vivi <[email protected]> (KBL-GT2)
| Cc: "11.1" <[email protected]>
* st/mesa: minor clean-ups in st_atom.c | Brian Paul | 2016-01-06 | 1 | -14/+10
|
| Remove useless comment. Reformat code.
* st/mesa: replace bitmap size checks with assertion | Brian Paul | 2016-01-06 | 1 | -2/+2
|
| The _mesa_Bitmap() caller already checks for zero-sized bitmaps.
* st/mesa: check texture target in allocate_full_mipmap() | Brian Paul | 2016-01-06 | 1 | -0/+14
|
| Some kinds of textures never have mipmaps. 3D textures seldom have mipmaps.
|
| Reviewed-by: José Fonseca <[email protected]>
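
The target check described above could look roughly like the following. This is a minimal sketch, not the code from the commit: the enum and the function name `guess_full_mipmap` are hypothetical stand-ins, and the decision for 3D textures is an assumption drawn from the "seldom have mipmaps" remark.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GL texture targets (not the real GLenum values). */
enum tex_target {
   TEX_1D,
   TEX_2D,
   TEX_3D,
   TEX_CUBE,
   TEX_RECTANGLE,
   TEX_BUFFER,
   TEX_2D_MULTISAMPLE
};

/* Return true if allocating the whole mipmap chain up front seems worthwhile. */
static bool
guess_full_mipmap(enum tex_target target)
{
   switch (target) {
   case TEX_RECTANGLE:
   case TEX_BUFFER:
   case TEX_2D_MULTISAMPLE:
      /* These kinds of textures never have mipmaps. */
      return false;
   case TEX_3D:
      /* Assumption: 3D textures seldom have mipmaps, so do not allocate eagerly. */
      return false;
   default:
      return true;
   }
}
```

Keeping the decision in one small function means a new target only needs a new `case`.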
* st/mesa: move mipmap allocation check logic into a function | Brian Paul | 2016-01-06 | 1 | -12/+42
|
| Better readability and easier to extend.
|
| Reviewed-by: José Fonseca <[email protected]>
* main: s/GLuint/GLbitfield/ for state bitmasks | Brian Paul | 2016-01-06 | 2 | -3/+3
|
| Reviewed-by: José Fonseca <[email protected]>
* vbo: s/GLuint/GLbitfield/ for state bitmasks | Brian Paul | 2016-01-06 | 4 | -4/+4
|
| Reviewed-by: José Fonseca <[email protected]>
* st/mesa: use GLbitfield in st_state_flags, add comments | Brian Paul | 2016-01-06 | 1 | -2/+2
|
| Use GLbitfield instead of GLuint to be consistent with other variables.
|
| Reviewed-by: José Fonseca <[email protected]>
* s/GLuint/GLbitfield/ for st_invalidate_state() parameter | Brian Paul | 2016-01-06 | 2 | -2/+2
|
| To match dd_function_table::UpdateState().
|
| Reviewed-by: José Fonseca <[email protected]>
* st/mesa: be more careful about state validation in st_Bitmap() | Brian Paul | 2016-01-06 | 1 | -1/+8
|
| If the only dirty state is mesa's _NEW_PROGRAM_CONSTANTS flag, we can skip state validation before drawing a bitmap since that state doesn't affect bitmap rendering.
|
| This further increases the performance of the ipers demo on llvmpipe to about what it was before commit 36c93a6fae27561.
|
| Reviewed-by: José Fonseca <[email protected]>
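
One way to express the "only _NEW_PROGRAM_CONSTANTS is dirty" test is to mask that bit out before checking for remaining dirty state. The sketch below is an illustration under assumptions: the bit values and the helper name `bitmap_needs_validation` are hypothetical, and the real flags are defined in Mesa's own headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define SKETCH_NEW_PROGRAM_CONSTANTS (1u << 0)
#define SKETCH_NEW_TEXTURE           (1u << 1)

/* Decide whether state must be validated before drawing a bitmap. */
static bool
bitmap_needs_validation(uint32_t mesa_dirty, uint32_t st_dirty)
{
   /* Program constants do not influence bitmap rendering, so ignore that bit. */
   const uint32_t relevant = mesa_dirty & ~SKETCH_NEW_PROGRAM_CONSTANTS;

   return relevant != 0 || st_dirty != 0;
}
```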
* st/mesa: move bitmap cache flushing out of state validation | Brian Paul | 2016-01-06 | 6 | -4/+17
|
| Just do it where needed (before drawing, clearing, etc).
|
| Reviewed-by: José Fonseca <[email protected]>
* st/mesa: check state->mesa in early return check in st_validate_state() | Brian Paul | 2016-01-06 | 1 | -1/+1
|
| We were checking the dirty->st flags but not the dirty->mesa flags. When we took the early return, we didn't clear the dirty->mesa flags so the next time we called st_validate_state() we'd often flush the glBitmap cache. And since st_validate_state() is called from st_Bitmap(), it meant we flushed the bitmap cache for every glBitmap() call.
|
| This change seems to recover most of the performance loss observed with the ipers demo on llvmpipe since commit 36c93a6fae27561.
|
| Cc: [email protected]
| Reviewed-by: José Fonseca <[email protected]>
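
The bug pattern described above (an early return that inspects one flag set while leaving the other one dirty) can be modeled in a few lines. This is a simplified sketch with a hypothetical `struct dirty_flags` and `validate_state`, not the state tracker's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two dirty-flag sets named in the commit message. */
struct dirty_flags {
   uint32_t mesa; /* core Mesa dirty bits */
   uint32_t st;   /* state tracker dirty bits */
};

/* Returns true when validation work was actually performed. */
static bool
validate_state(struct dirty_flags *dirty)
{
   /* Take the early return only when both sets are clean. Testing dirty->st
    * alone would skip the clearing below and leave stale dirty->mesa bits
    * for the next call to trip over. */
   if (dirty->st == 0 && dirty->mesa == 0)
      return false;

   /* ... update derived state here ... */

   dirty->mesa = 0;
   dirty->st = 0;
   return true;
}
```

With both sets checked, a second call right after a validation pass returns immediately, so nothing downstream is flushed needlessly.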
* st/mesa: protect debug printf() with a conditional instead of comment | Brian Paul | 2016-01-06 | 1 | -5/+5
|
* st/mesa: fix comment indentation in st_flush_bitmap_cache() | Brian Paul | 2016-01-06 | 1 | -2/+2
|
* nir: Add a lower_fdiv option, turn fdiv into fmul/frcp. | Kenneth Graunke | 2016-01-05 | 1 | -0/+1
|
| The nir_opt_algebraic rule
|
| (('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))),
|
| can produce new fdiv operations, which need to be lowered on i965, as we don't actually implement fdiv. (Normally, we handle this in GLSL IR's lower_instructions pass, but in the above case we introduce an fdiv after that point. So, make NIR do it for us.)
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Cc: [email protected]
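
The lowering itself is the identity a / b = a * (1 / b). The sketch below shows it on plain floats; `frcp` here is a hypothetical software stand-in for a hardware reciprocal instruction, and the function names are not NIR API.

```c
/* Stand-in for a hardware reciprocal instruction (assumption: exact 1/x). */
static float
frcp(float x)
{
   return 1.0f / x;
}

/* fdiv(a, b) rewritten as fmul(a, frcp(b)). */
static float
lowered_fdiv(float a, float b)
{
   return a * frcp(b);
}
```

Note that the two-step form rounds twice, so for general inputs the result may differ from a true division in the last bit.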
* i965: Only turn on ARB_compute_shader if we can write registers. | Kenneth Graunke | 2016-01-05 | 1 | -2/+3
|
| Compute shaders require reconfiguring the L3 for shared local memory support. We have to be able to write the L3 registers to do that. This effectively turns off compute shaders prior to Kernel 4.2.
|
| (Previously, the extension enable was in an API_OPENGL_CORE conditional. However, that isn't necessary - core Mesa extension handling already restricts it properly. I've moved it out in this patch.)
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Francisco Jerez <[email protected]>
* i965: Use rcp in brw_lower_texture_gradients rather than 1.0 / x. | Kenneth Graunke | 2016-01-05 | 1 | -1/+1
|
| That's what it's for. Plus, we actually implement rcp.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Topi Pohjolainen <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* mesa: fix GL_MAX_NAME_LENGTH query for tessellation shaders | Timothy Arceri | 2016-01-06 | 1 | -2/+6
|
| This fixes some piglit subtests for ARB_program_interface_query.
|
| V3: remove some of the unnecessary parentheses
| V2: fix alignment
|
| Reviewed-by: Marek Olšák <[email protected]>
* i965/gen9: Modify the conditions to use blitter on skl+ | Anuj Phogat | 2016-01-05 | 1 | -3/+9
|
| The modified conditions allow skl+ to use the blitter:
| - for all tiling formats
| - to write data to YF/YS tiled surfaces
|
| Signed-off-by: Anuj Phogat <[email protected]>
| Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/gen9: Return false in place of assert in intelEmitCopyBlit() | Anuj Phogat | 2016-01-05 | 1 | -3/+4
|
| This allows the fallback paths to handle it correctly.
|
| Signed-off-by: Anuj Phogat <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
* i965/gen9: Remove regions overlap check in fast copy blit | Anuj Phogat | 2016-01-05 | 1 | -5/+0
|
| Overlapping blits are undefined in OpenGL anyway, so there is no need for an overlap check here.
|
| Signed-off-by: Anuj Phogat <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
* i965/gen9: Don't use fast copy blit in case of non power of 2 cpp | Anuj Phogat | 2016-01-05 | 1 | -2/+4
|
| Fast copy blit is currently enabled for use only with Yf/Ys tiling.
|
| Signed-off-by: Anuj Phogat <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
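
A power-of-two test on cpp is a one-line bit trick. The sketch below combines it with the Yf/Ys restriction mentioned in the message; the function names and the boolean tiling parameter are hypothetical, and cpp is assumed to mean bytes per pixel.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for 1, 2, 4, 8, ...; false for 0 and for values such as 3 or 6. */
static bool
is_power_of_two(uint32_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical eligibility check for the fast copy blit path. */
static bool
can_use_fast_copy_blit(uint32_t cpp, bool yf_or_ys_tiled)
{
   if (!yf_or_ys_tiled)
      return false;

   return is_power_of_two(cpp);
}
```

Returning false (rather than asserting) lets a caller fall back to another copy path.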
* i915/i965: Fix typo in perf_debug message | Ian Romanick | 2016-01-05 | 2 | -2/+2
|
| Trivial
|
| Signed-off-by: Ian Romanick <[email protected]>
* st/mesa: minor indentation fixes | Brian Paul | 2016-01-05 | 1 | -4/+4
|
* mesa: minor clean-up of some memcpy/sizeof() calls in m_matrix.c | Brian Paul | 2016-01-05 | 1 | -7/+7
|
| Reviewed-by: Charmaine Lee <[email protected]>
* mesa: check for z=0 in _mesa_Vertex3dv() | Brian Paul | 2016-01-05 | 1 | -1/+4
|
| It's very rare that a GL app calls glVertex3dv(), but one in particular calls it a lot, always with Z = 0. Check for that condition and convert the call into glVertex2f. This reduces VBO memory used and reduces the number of times we have to switch between float[2] and float[3] vertex formats in the svga driver. This results in a small but measurable performance improvement.
|
| Reviewed-by: Charmaine Lee <[email protected]>
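
The dispatch described above amounts to a single comparison before choosing the vertex entry point. The following is a sketch only: `struct vertex_sink` and the `emit_*` helpers are hypothetical stand-ins for the real dispatch, used here so the behavior can be observed.

```c
/* Hypothetical sink that records which entry point was used. */
struct vertex_sink {
   int calls_2f;
   int calls_3f;
};

static void
emit_vertex2f(struct vertex_sink *s, float x, float y)
{
   (void) x;
   (void) y;
   s->calls_2f++;
}

static void
emit_vertex3f(struct vertex_sink *s, float x, float y, float z)
{
   (void) x;
   (void) y;
   (void) z;
   s->calls_3f++;
}

/* Double-precision 3-component entry point with the z == 0 shortcut. */
static void
vertex3dv(struct vertex_sink *s, const double v[3])
{
   if (v[2] == 0.0)
      emit_vertex2f(s, (float) v[0], (float) v[1]);
   else
      emit_vertex3f(s, (float) v[0], (float) v[1], (float) v[2]);
}
```

When every call has z == 0, the stream stays in the two-component format and no format switch occurs.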
* i965: quieten compiler warning about out-of-bounds access | Ilia Mirkin | 2016-01-05 | 1 | -0/+1
|
| gcc 4.9.3 shows the following warning:
|
| brw_vue_map.c:260:20: warning: array subscript is above array bounds [-Warray-bounds]
|     return brw_names[slot - VARYING_SLOT_MAX];
|
| This is because BRW_VARYING_SLOT_COUNT is a valid value for the enum type. Adding an assert will generate no additional code but will teach the compiler to not complain.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
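
The situation can be reproduced in miniature: an enum whose trailing COUNT member is a legal value of the type but one past the end of a name table. The enum, table, and function below are hypothetical and only mirror the shape of the problem; whether a given compiler warns without the assert depends on its version and flags.

```c
#include <assert.h>

/* Hypothetical enum: the last two real slots extend a base range. */
enum slot {
   SLOT_BASE_MAX = 4,
   SLOT_EXTRA_A = SLOT_BASE_MAX,
   SLOT_EXTRA_B,
   SLOT_COUNT /* valid enum value, but not a valid table index */
};

static const char *
extra_slot_name(enum slot slot)
{
   static const char *const names[] = { "EXTRA_A", "EXTRA_B" };

   /* Documents the valid range and lets the compiler see that the index
    * below stays inside the table. Compiles to nothing when NDEBUG is set. */
   assert(slot >= SLOT_BASE_MAX && slot < SLOT_COUNT);

   return names[slot - SLOT_BASE_MAX];
}
```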
* i965/wm: use binding size for ubo/ssbo when automatic size is unset | Ilia Mirkin | 2016-01-05 | 1 | -4/+10
|
| This fixes the same tests that commit 8cf2e892f was attempting to fix:
|
| ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
| ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize
|
| as confirmed by Samuel.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Cc: Samuel Iglesias Gonsálvez <[email protected]>
| Cc: Marta Lofstedt <[email protected]>
| Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* Revert "i965/wm: use proper API buffer size for the surfaces." | Ilia Mirkin | 2016-01-05 | 4 | -13/+5
|
| This reverts commit 8cf2e892fca20c4776b4a07c39918343cb2d4e0e.
|
| It's entirely bogus to attempt to store anything about the binding in the buffer object itself, which might be bound any number of times.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Cc: Samuel Iglesias Gonsálvez <[email protected]>
| Cc: Marta Lofstedt <[email protected]>
| Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* st/mesa: make KHR_debug output independent of context creation flags (v2) | Nicolai Hähnle | 2016-01-04 | 4 | -57/+98
|
| Instead, keep track of GL_DEBUG_OUTPUT and (un)install the pipe_debug_callback accordingly. Hardware drivers can still use the absence of the callback to skip more expensive operations in the normal case, and users can no longer be surprised by the need to set the debug flag at context creation time.
|
| v2:
| - re-add the proper initialization of debug contexts (Ilia Mirkin)
| - silence a potential warning (Ilia Mirkin)
|
| Reviewed-by: Ilia Mirkin <[email protected]>
* i965/wm: use proper API buffer size for the surfaces. | Samuel Iglesias Gonsálvez | 2016-01-04 | 4 | -5/+13
|
| Commit 5bb5eeea fixes a bug indicating that the surfaces should have the API buffer size. However, it picked the wrong value.
|
| This patch adds a new variable, which takes into account glBindBufferRange() values.
|
| This patch fixes the following CTS regressions:
|
| ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
| ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize
|
| Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
| Reviewed-by: Marta Lofstedt <[email protected]>
* st/mesa: use PK2H/UP2H when supported | Ilia Mirkin | 2016-01-03 | 3 | -5/+14
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Edward O'Callaghan <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
* st/mesa: fix parameter names for tesseval/tessctrl prototypes | Samuel Pitoiset | 2016-01-03 | 1 | -4/+4
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: fix double-const qualifier | Ilia Mirkin | 2016-01-03 | 2 | -2/+2
|
| Reported by Tom^ on IRC. The original intent was to mark the pointer constant as well as the data being pointed to, so move the *.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
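
The position of the `*` decides what each `const` applies to. The declarations below are an illustrative sketch with hypothetical names; they are not taken from the nouveau sources.

```c
static const char greeting[] = "hello";

/* `const char const *p` would apply const to the pointed-to char twice and
 * leave the pointer itself mutable. Moving the `*` between the qualifiers
 * gives the intended meaning: constant data and a constant pointer. */
static const char *const fixed = greeting;

/* Neither the parameter nor the characters it points to can be modified. */
static char
first_char(const char *const s)
{
   return s[0];
}
```

With this form, an assignment such as `fixed = other;` is rejected at compile time.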
* nir: extract out helper macros for running passes | Rob Clark | 2016-01-03 | 1 | -36/+9
|
| Note these are a bit uglier, due to avoidance of GNU C extensions. But drivers which do not need to be built with compilers that don't support the extension can wrap these macros with their own.
|
| Signed-off-by: Rob Clark <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
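
A pass-running macro that avoids GNU statement expressions cannot yield a value, so the usual workaround is to report progress through a variable named by the caller. The sketch below illustrates that shape only; `RUN_PASS`, `struct shader`, and `remove_one` are hypothetical and are not the macros added by this commit.

```c
#include <stdbool.h>

/* Hypothetical shader with a toy "optimization" that removes one
 * instruction per run until three remain. */
struct shader {
   int instr_count;
};

static bool
remove_one(struct shader *s)
{
   if (s->instr_count > 3) {
      s->instr_count--;
      return true;
   }
   return false;
}

/* Portable C: a do/while(0) statement macro that sets `progress` instead of
 * evaluating to a value the way a ({ ... }) statement expression would. */
#define RUN_PASS(progress, sh, pass) \
   do {                              \
      if (pass(sh))                  \
         (progress) = true;          \
   } while (0)

/* Run the pass until it stops making progress; return the number of rounds. */
static int
optimize(struct shader *s)
{
   int rounds = 0;
   bool progress;

   do {
      progress = false;
      RUN_PASS(progress, s, remove_one);
      rounds++;
   } while (progress);

   return rounds;
}
```

A driver that only targets compilers with the extension could wrap such a macro in its own value-returning form.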
* i965: Make TCS precompile use the TES primitive mode when available. | Kenneth Graunke | 2016-01-02 | 1 | -1/+3
|
| If there's a linked TES program, we should just use the actual primitive mode. If not, just guess triangles (as we did before).
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
* i965: Push most TES inputs in SIMD8 mode. | Kenneth Graunke | 2016-01-02 | 1 | -12/+30
|
| Using the push model for inputs is much more efficient than pulling inputs - the hardware can simply copy a large chunk into URB registers at thread creation time, rather than having the thread send messages to request data from the L3 cache. Unfortunately, it's possible to have more TES inputs than fit in registers, so we have to fall back to the pull model in some cases.
|
| However, it turns out that most tessellation evaluation shaders are fairly simple, and don't use many inputs. An arbitrary cut-off of 32 vec4 slots (16 registers) is more than sufficient to ensure that 100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven, GPUTest/TessMark, and SynMark.
|
| Note that unlike most SIMD8 stages, this actually reads packed vec4 data, since that is what our vec4 TCS programs write.
|
| Improves performance in GPUTest's tessmark_x64 microbenchmark by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768.
|
| Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark by 22.74% +/- 0.309394% (n = 5).
|
| Improves performance in Shadow of Mordor at low settings with tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4).
|
| shader-db statistics for files containing tessellation shaders:
|
| total instructions in shared programs: 184358 -> 181181 (-1.72%)
| instructions in affected programs: 27971 -> 24794 (-11.36%)
| helped: 226
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965: Use LOAD_PAYLOAD for SIMD8 TES input loads, not MOV. | Kenneth Graunke | 2016-01-02 | 1 | -1/+4
|
| We need a MOV to replicate g0.0<0,1,0> to all 8 channels. Since the message payload is a single register, MOV seemed more sensible than LOAD_PAYLOAD. However, MOV cannot be CSE'd, while LOAD_PAYLOAD can.
|
| All input loads can use the same header - we don't need to re-expand g0 every time. CSE accomplishes this, saving instructions.
|
| shader-db statistics for files containing tessellation shaders:
|
| total instructions in shared programs: 186923 -> 184358 (-1.37%)
| instructions in affected programs: 30536 -> 27971 (-8.40%)
| helped: 226
| HURT: 0
|
| total cycles in shared programs: 1009850 -> 1005356 (-0.45%)
| cycles in affected programs: 168206 -> 163712 (-2.67%)
| helped: 226
| HURT: 0
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965: Move 3-src subnr swizzle handling into the vec4 backend. | Kenneth Graunke | 2016-01-02 | 2 | -6/+18
|
| While most align16 instructions only support a SubRegNum of 0 or 4 (using swizzling to control the other channels), 3-src instructions actually support arbitrary SubRegNums. When the RepCtrl bit is set, we believe it ignores the swizzle and uses the equivalent of a <0,1,0> region from the subnr.
|
| In the past, we adopted a vec4-centric approach of specifying subnr of 0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper SubRegNum. This isn't a great fit for the scalar backend, where we don't set swizzles at all, and happily set subnrs in the range [0, 7].
|
| This patch changes brw_eu_emit.c to use subnr and swizzle directly, relying on the higher levels to set them sensibly.
|
| This should fix problems where scalar sources get copy propagated into 3-src instructions in the FS backend. I've only observed this with TES push model inputs, but I suppose it could happen in other cases.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* u_upload_mgr: allow specifying PIPE_USAGE_* for the upload buffer | Marek Olšák | 2016-01-02 | 1 | -3/+6
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: remove alignment parameter from u_upload_create | Marek Olšák | 2016-01-02 | 1 | -8/+4
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: pass alignment to u_upload_data manually | Marek Olšák | 2016-01-02 | 3 | -3/+5
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: pass alignment to u_upload_alloc manually | Marek Olšák | 2016-01-02 | 4 | -4/+4
|
| The fixed alignment of u_upload_mgr will go away. This is the first step.
|
| The motivation is that one u_upload_mgr can have multiple users, each allocating from the same buffer, but requiring a different alignment.
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
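
Passing the alignment per allocation, rather than fixing it when the allocator is created, can be sketched with a simple bump allocator. Everything below (`struct upload_buf`, `upload_alloc`, `align_up`) is hypothetical and much simpler than u_upload_mgr; it assumes power-of-two alignments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared buffer: only the bookkeeping is modeled here. */
struct upload_buf {
   size_t size;   /* total capacity in bytes */
   size_t offset; /* first unused byte */
};

/* Round v up to the next multiple of a (a must be a power of two). */
static size_t
align_up(size_t v, size_t a)
{
   return (v + a - 1) & ~(a - 1);
}

/* Each caller states its own alignment requirement for this allocation. */
static bool
upload_alloc(struct upload_buf *buf, size_t size, size_t alignment,
             size_t *out_offset)
{
   size_t start;

   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

   start = align_up(buf->offset, alignment);
   if (start + size > buf->size)
      return false;

   *out_offset = start;
   buf->offset = start + size;
   return true;
}
```

Two users with different requirements, for example 4 bytes and 256 bytes, can then share one buffer without the stricter alignment being imposed on every allocation.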
* st/mesa: fix GLSL uniform updates for glBitmap & glDrawPixels (v2) | Marek Olšák | 2016-01-02 | 5 | -19/+25
|
| Spotted by luck. The GLSL uniform storage is only associated once in LinkShader and can't be reallocated afterwards, because that would break the association.
|
| v2: don't remove st_upload_constants calls, clarify why they're needed
|
| Cc: 11.0 11.1 <[email protected]>
* program: add _mesa_reserve_parameter_storage | Marek Olšák | 2016-01-02 | 2 | -15/+36
|
| The next commit will use this.
|
| Reviewed-by: Brian Paul <[email protected]>
| Cc: 11.0 11.1 <[email protected]>
* mesa: Fix warning with MESA_VERBOSE=api for BindBufferRange | Jordan Justen | 2016-01-01 | 1 | -1/+1
|
| Reported-by: Dieter Nützel <[email protected]>
| Signed-off-by: Jordan Justen <[email protected]>
* st/mesa: sort extensions enablement array | Ilia Mirkin | 2016-01-01 | 1 | -11/+11
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* mesa: Add MESA_VERBOSE=api for GL_ARB_program_interface_query | Jordan Justen | 2016-01-01 | 1 | -0/+39
|
| v2:
| * Add braces '{}' when the _mesa_debug call spans multiple lines (Ken)
|
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Add MESA_VERBOSE=api for several indexed BindBuffer variants | Jordan Justen | 2016-01-01 | 1 | -2/+25
|
| v2:
| * Add braces '{}' when the _mesa_debug call spans multiple lines (Ken)
|
| Signed-off-by: Jordan Justen <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* st/glsl_to_tgsi: fix block movs for doubles | Dave Airlie | 2016-01-01 | 1 | -1/+14
|
| While playing with fp64, I disabled varying packing to debug something else, and noticed we never emitted half the output movs for double matrix arrays.
|
| We should be moving the left index two slots for dual source doubles, and the right index two slots for non-vs input doubles.
|
| Signed-off-by: Dave Airlie <[email protected]>