summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* i965: Add support for swizzling arbitrary immediates to (brw_)swizzle().Francisco Jerez2016-03-063-3/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scalar immediates used to be handled correctly by swizzle() (as the identity) but since commit 58fa9d47b536403c4e3ca5d6a2495691338388fd it will corrupt the contents of the immediate. Vector immediates were never handled correctly, but we had ad-hoc code to swizzle VF immediates in the vec4 copy propagation pass. This takes care of swizzling V and UV in addition. v2: Don't implement swizzling of V/UV immediates (Matt). If you need to swizzle an integer vector immediate in the future apply the following diff to go back to v1: --- a/src/mesa/drivers/dri/i965/brw_eu.c +++ b/src/mesa/drivers/dri/i965/brw_eu.c @@ -119,11 +119,10 @@ brw_swap_cmod(uint32_t cmod) static unsigned imm_shift(enum brw_reg_type type, unsigned i) { - assert(type != BRW_REGISTER_TYPE_UV && type != BRW_REGISTER_TYPE_V && - "Not implemented."); - if (type == BRW_REGISTER_TYPE_VF) return 8 * (i & 3); + else if (type == BRW_REGISTER_TYPE_UV || type == BRW_REGISTER_TYPE_V) + return 4 * (i & 7); else return 0; } Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: Pass symbolic swizzle to brw_swizzle() as a single argument.Francisco Jerez2016-03-064-20/+16
| | | | | | | | And replace brw_swizzle1() with brw_swizzle(). Seems slightly cleaner and will allow reusing brw_swizzle() in the vec4 back-end more easily. Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nvc0: reset TFB bufctx when we no longer hold a reference to the buffersIlia Mirkin2016-03-062-2/+3
| | | | | | | | | | This fixes some use-after-free situations in dEQP when an xfb state is removed, and then a clear is triggered, which only does a partial validation. It would attempt to read the no-longer-valid buffers, resulting in crashes. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.1 11.2" <[email protected]>
* nv50/ir: using sampleid/pos shouldn't force per-sample interpolationIlia Mirkin2016-03-052-6/+1
| | | | | | See https://www.khronos.org/bugzilla/show_bug.cgi?id=1462 Signed-off-by: Ilia Mirkin <[email protected]>
* st/mesa: don't force per-sample interp if only sampleid/pos are usedIlia Mirkin2016-03-052-12/+6
| | | | | | | | | | | The OES extensions clarify this behaviour to differentiate between per-sample invocation and per-sample interpolation. Using sampleid/pos will force per-sample invocation but not per-sample interpolation. See https://www.khronos.org/bugzilla/show_bug.cgi?id=1462 Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* swrast: fix GL_ANY_SAMPLES_PASSED values in ResultIlia Mirkin2016-03-051-0/+5
| | | | | | | | | | | | | Since commit 922be4eab, the expectation is that the query result contains the correct value. Unfortunately swrast does not distinguish between GL_SAMPLES_PASSED and GL_ANY_SAMPLES_PASSED. As a result, we must fix up the query result in a post-draw fixup. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94274 Signed-off-by: Ilia Mirkin <[email protected]> Tested-by: Vinson Lee <[email protected]> Reviewed-by: Brian Paul <[email protected]> Cc: "11.2" <[email protected]>
* st/mesa: 78-column wrapping in st_extensions.cBrian Paul2016-03-051-68/+107
| | | | Reviewed-by: Eduardo Lima Mitev <[email protected]>
* gallium/util: add new comments, assertions in u_debug_refcnt.cBrian Paul2016-03-051-0/+20
| | | | Reviewed-by: Eduardo Lima Mitev <[email protected]>
* gallium/util: update comments and URL in u_debug_refcnt.cBrian Paul2016-03-051-4/+10
| | | | Reviewed-by: Eduardo Lima Mitev <[email protected]>
* gallium/util: make stream variable static in u_debug_refcnt.cBrian Paul2016-03-051-1/+1
| | | | Reviewed-by: Eduardo Lima Mitev <[email protected]>
* gallium/util: re-indent u_debug_refcnt.[ch]Brian Paul2016-03-052-50/+65
| | | | | | Wrap comments to 78 columns, etc. Reviewed-by: Eduardo Lima Mitev <[email protected]>
* gallium/tests: silence warning in compute.cBrian Paul2016-03-051-1/+1
| | | | | | | | | | | compute.c: In function ‘launch_grid’: compute.c:435:20: warning: assignment discards ‘const’ qualifier from pointer target type [enabled by default] info.input = input; ^ Maybe the pipe_grid_info::input field should be const void *? Reviewed-by: Samuel Pitoiset <[email protected]>
* glsl: replace remaining tabs in link_varyings.cppTimothy Arceri2016-03-051-9/+9
| | | | Reviewed-by: Thomas Helland <[email protected]>
* glsl: replace remaining tabs in link_uniforms.cppTimothy Arceri2016-03-051-69/+69
| | | | Reviewed-by: Thomas Helland <[email protected]>
* docs: mark align layout qualifier as DONETimothy Arceri2016-03-051-1/+1
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* glsl: apply align layout qualifier rules to block offsetsTimothy Arceri2016-03-051-3/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers) of the OpenGL 4.50 spec: "The align qualifier makes the start of each block member have a minimum byte alignment. It does not affect the internal layout within each member, which will still follow the std140 or std430 rules. The specified alignment must be a power of 2, or a compile-time error results. The actual alignment of a member will be the greater of the specified align alignment and the standard (e.g., std140) base alignment for the member's type. The actual offset of a member is computed as follows: If offset was declared, start with that offset, otherwise start with the next available offset. If the resulting offset is not a multiple of the actual alignment, increase it to the first offset that is a multiple of the actual alignment. This results in the actual offset the member will have. When align is applied to an array, it affects only the start of the array, not the array's internal stride. Both an offset and an align qualifier can be specified on a declaration. The align qualifier, when used on a block, has the same effect as qualifying each member with the same align value as declared on the block, and gets the same compile-time results and errors as if this had been done. As described in general earlier, an individual member can specify its own align, which overrides the block-level align, but just for that member. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* glsl: parse align layout qualifierTimothy Arceri2016-03-053-0/+26
| | | | | Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* docs: mark explicit byte offsets as DONETimothy Arceri2016-03-051-1/+1
| | | | Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: use explicit offset when lowering buffer accessTimothy Arceri2016-03-051-0/+4
| | | | Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: copy explicit offset to uniform storageTimothy Arceri2016-03-053-0/+20
| | | | Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: update comment on offset fieldTimothy Arceri2016-03-051-1/+1
| | | | | | | The old comment was for the location not the offset, we now use the field for block members so mention that also. Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: add offset to glsl interface typeTimothy Arceri2016-03-054-0/+18
| | | | | | | | | | | | | | | | In this patch we also copy the offset value from the ast and implement offset linking rules by adding it to the record_compare() function. From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers) of the GLSL 4.50 spec: "Two blocks linked together in the same program with the same block name must have the exact same set of members qualified with offset and their integral-constant-expression values must be the same, or a link-time error results." Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: apply compile-time rules for the offset layout qualifierTimothy Arceri2016-03-051-0/+49
| | | | | | | | | | | | | | | | | | | | | | | This implements the rules for the offset qualifier on block members. From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers) of the GLSL 4.50 spec: "The offset qualifier can only be used on block members of blocks declared with std140 or std430 layouts." ... "It is a compile-time error to specify an offset that is smaller than the offset of the previous member in the block or that lies within the previous member of the block." ... "The specified offset must be a multiple of the base alignment of the type of the block member it qualifies, or a compile-time error results." Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: enable offset layout qualifier for ARB_enhanced_layoutsTimothy Arceri2016-03-051-1/+2
| | | | Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: reject invalid input layout qualifiersTimothy Arceri2016-03-051-0/+29
| | | | | | | | | | | Global in validation is already handled, this will do the validation for variables, blocks and block members. This fixes some CTS tests for the new enhanced layouts transform feedback qualifiers. V2: add some more valid input flags Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* glsl: only apply default stream to output blocksTimothy Arceri2016-03-051-1/+2
| | | | | | This is needed to allow invalid qualifier checks on inputs. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* glsl: rework parsing of blocksTimothy Arceri2016-03-052-32/+24
| | | | | | | | | | | Previously interface blocks were giving the global default flags of uniform blocks. This meant we could not check for invalid qualifiers on interface blocks because they always contained invalid flags. This changes parsing so that interface blocks now get an empty set of layouts. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* glsl: don't apply uniform/buffer layouts to interface blocksTimothy Arceri2016-03-051-6/+7
| | | | | | | | If the following patch we will stop setting these layouts by default on interface blocks, so we need to do this to avoid hitting the assert. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965: Only magnify depth for 3D textures, not array textures.Kenneth Graunke2016-03-041-1/+1
| | | | | | | | | | | | | | | | | When BaseLevel > 0, we magnify the dimensions to fill out the size of miplevels [0..BaseLevel). In particular, this was magnifying depth, thinking that the depth doubles at each level. This is perfectly reasonable for 3D textures, but dead wrong for array textures. Changing the depth != 1 condition to a target == GL_TEXTURE_3D check should make this only happen in the appropriate cases. Fixes about 32 dEQP tests: - dEQP-GLES31.functional.texture.gather.*.level_{1,2} Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: [email protected]
* i965/vec4: add opportunistic behaviour to opt_vector_float()Juan A. Suarez Romero2016-03-042-21/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | opt_vector_float() transforms several scalar MOV operations to a single vectorial MOV. This is done when those MOV covers all the components of the destination register. So something like: mov vgrf3.0.xy:D, 0D mov vgrf3.0.w:D, 1065353216D mov vgrf3.0.z:D, 0D is transformed in: mov vgrf3.0:F, [0F, 0F, 0F, 1F] But there are cases where not all the components are written. For example, in: mov vgrf2.0.x:D, 1073741824D mov vgrf3.0.xy:D, 0D mov vgrf3.0.w:D, 1065353216D mov vgrf4.0.xy:D, 1065353216D mov vgrf4.0.w:D, 0D mov vgrf6.0:UD, u4.xyzw:UD Nor vgrf3 nor vgrf4 .z components are written, so the optimization is not applied. But it could be applied anyway with the components covered, using a writemask to select the ones written. So we could transform it in: mov vgrf2.0.x:D, 1073741824D mov vgrf3.0.xyw:F, [0F, 0F, 0F, 1F] mov vgrf4.0.xyw:F, [1F, 1F, 0F, 0F] mov vgrf6.0:UD, u4.xyzw:UD This commit does precisely that: opportunistically apply opt_vector_float() when possible. total instructions in shared programs: 7124660 -> 7114784 (-0.14%) instructions in affected programs: 443078 -> 433202 (-2.23%) helped: 4998 HURT: 0 total cycles in shared programs: 64757760 -> 64728016 (-0.05%) cycles in affected programs: 1401686 -> 1371942 (-2.12%) helped: 3243 HURT: 38 v2: change vectorize_mov() signature (Matt). v3: take in account predicates (Juan). v4 [mattst88]: Update shader-db numbers. Fix some whitespace issues. Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]>
* st/xlib: Don't destroy screen on XCloseDisplay()George Kyriazis2016-03-041-3/+7
| | | | | | | | | screen may still be used by other resources that are not yet freed. To correctly fix this there will be a need to account for resources differently, but this quick fix is not any worse than the original code that leaked screens anyway. Reviewed-by: Brian Paul <[email protected]>
* i965/fs: Optimize float conversions of byte/word extract.Matt Turner2016-03-042-0/+48
| | | | | | | | | | | | | | | | | | instructions in affected programs: 31535 -> 29966 (-4.98%) helped: 23 cycles in affected programs: 272648 -> 266022 (-2.43%) helped: 14 HURT: 1 The patch decreases the number of instructions in the two Unigine programs by: #1721: 4374 -> 4155 instructions (-5.01%) #1706: 3582 -> 3363 instructions (-6.11%) Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir: Recognize open-coded extract_u16.Matt Turner2016-03-041-0/+5
| | | | | | | | No shader-db changes, but does recognize some extract_u16 which enables the next patch to optimize some code. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir: Recognize open-coded extract_u8.Matt Turner2016-03-041-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack three bytes from an integer and convert each into a float: float((val >> 16u) & 0xffu) float((val >> 8u) & 0xffu) float((val >> 0u) & 0xffu) Instead of shifting, masking, and type converting like this: shr(8) g15<1>UD g25<8,8,1>UD 0x00000010UD and(8) g16<1>UD g15<8,8,1>UD 0x000000ffUD mov(8) g17<1>F g16<8,8,1>UD shr(8) g18<1>UD g25<8,8,1>UD 0x00000008UD and(8) g19<1>UD g18<8,8,1>UD 0x000000ffUD mov(8) g20<1>F g19<8,8,1>UD and(8) g21<1>UD g25<8,8,1>UD 0x000000ffUD mov(8) g22<1>F g21<8,8,1>UD i965 can simply extract a byte and convert to float in a single instruction: mov(8) g17<1>F g25.2<32,8,4>UB mov(8) g20<1>F g25.1<32,8,4>UB mov(8) g22<1>F g25.0<32,8,4>UB This patch implements the first step: recognizing byte extraction. A later patch will optimize out the conversion to float. instructions in affected programs: 28568 -> 27450 (-3.91%) helped: 7 cycles in affected programs: 210076 -> 203144 (-3.30%) helped: 7 This patch decreases the number of instructions in the two Unigine programs by: #1721: 4520 -> 4374 instructions (-3.23%) #1706: 3752 -> 3582 instructions (-4.53%) Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* st/xlib: Hang off screen destructor off main XCloseDisplay() callback.George Kyriazis2016-03-043-35/+27
| | | | | | | | This resolves some order dependencies between the already existing callback the newly created one. Tested-by: Brian Paul <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* st/xlib: Support unlimited number of display connectionsGeorge Kyriazis2016-03-041-19/+103
| | | | | | | | | | | | | | | There is a limit of 10 display connections, which was a problem for apps/tests that were continuously opening/closing display connections. This fix uses XAddExtension() and XESetCloseDisplay() to keep track of the status of the display connections from the X server, freeing mesa-related data as X displays get destroyed by the X server. Poster child is the VTK "TimingTests" Tested-by: Brian Paul <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* svga: add new command-buffer-size HUD queryBrian Paul2016-03-044-7/+23
| | | | | | To plot a graph of the command buffer size. Reviewed-by: Charmaine Lee <[email protected]>
* svga: add new svga_winsys_context::get_command_buffer_size()Brian Paul2016-03-042-0/+14
| | | | | | | To ask how large the current command buffer is. Will be used for a new GALLIUM_HUD graph. Reviewed-by: Charmaine Lee <[email protected]>
* svga: reorder SVGA_QUERY_ switch cases to match declaration orderBrian Paul2016-03-041-9/+9
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* svga: Force an RGBA view creation for an RGBA resourceSinclair Yeh2016-03-041-1/+10
| | | | | | | | | | | glXCreatePixmap() may specify a GLX_TEXTURE_FORMAT_RGB_EXT format for an RGBA resource, causing us to create an RGBX view for an RGBA resource, a combination vgpu10 does not support. When this is detected, change the request to create an RGBA view instead. Reviewed-by: Brian Paul <[email protected]>
* svga: fix an error in svga_texture_generate_mipmapCharmaine Lee2016-03-041-1/+6
| | | | | | | With this patch, make sure the shader resource view is properly created before referencing it in the generate mipmap command. Reviewed-by: Brian Paul <[email protected]>
* winsys/svga: Increase the fence timeoutThomas Hellstrom2016-03-041-1/+2
| | | | | | | | | | | If running with a software renderer backend, the timeout may be insufficient, and we don't want to release busy buffers too early. In practice, SVGA gpu lockups are extremely rare. Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Brian Paul <[email protected]> Cc: "11.0 11.1" <[email protected]>
* winsys/svga: Fix an uninitialized return valueThomas Hellstrom2016-03-041-0/+2
| | | | | | | Reported-by: Brian Paul <[email protected]> Signed-off-by: Thomas Hellstrom <[email protected]> Reviwed-by: Brian Paul <[email protected]> Cc: "11.0 11.1" <[email protected]>
* i965: Set MaxFramebufferWidth/Height to 16384, not viewport.Kenneth Graunke2016-03-031-2/+2
| | | | | | | | | | | | | | | | | | dEQP-GLES31.functional.fbo.no_attachments.maximums.{all,height,size,width} started hitting assertion failures when emitting SURFACE_STATE, after commit e8fd60e7891c7 where Samuel increased the maximum viewport size to 32768, from 16384. MaxFramebufferWidth/Height were being set to the maximum viewport size, but are actually limited by the SURFACE_STATE width/height field range, which is 16384 on Gen7+ (where ARB_framebuffer_no_attachments is exposed). So, reduce these to 16384 explicitly. Fixes assert fails in the above mentioned dEQP tests. (Those tests still fail, however.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* glsl: Improve the accuracy of the acos() approximation.Francisco Jerez2016-03-031-1/+1
| | | | | | | | | | | | | | | | The adjusted polynomial coefficients come from the numerical minimization of the L2 norm of the relative error. The old coefficients would give a maximum relative error of about 15000 ULP in the neighborhood around acos(x) = 0, the new ones give a relative error bounded by less than 2000 ULP in the same neighborhood. Fixes four dEQP subtests: dEQP-GLES31.functional.shaders.builtin_functions.precision.acos. highp_compute.{scalar,vec2,vec3,vec4} Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* glsl: Parameterize asin_expr() on the fit coefficients.Kenneth Graunke2016-03-031-6/+6
| | | | | | | | | | | | This will allow us to share the implementation while using different polynomials for asin() and acos(). Francisco Jerez did this in the SPIR-V front-end; I'm merely porting his idea to the GLSL world. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* mesa: Allow Get*() of several forgotten IsEnabled() pnames.Kenneth Graunke2016-03-032-0/+5
| | | | | | | | | | | | | | | | | | | | From section 6.2 ("State Tables") of the GL 2.1 specification (the text also appears in the GL 3.0 and ES 3.1 specifications): "However, state variables for which IsEnabled is listed as the query command can also be obtained using GetBooleanv, GetIntegerv, GetFloatv, and GetDoublev." GL_DEBUG_OUTPUT, GL_DEBUG_OUTPUT_SYNCHRONOUS, and GL_FRAGMENT_SHADER_ATI were missing from the glGet*() functions. All other IsEnabled() pnames look to be present, as far as I can tell. Fixes 8 dEQP-GLES31.functional.debug.state_query subtests: debug_output[_synchronous]_get{boolean,float,integer,integer64}. Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* mesa: Make glGet queries initialize ctx->Debug when necessary.Kenneth Graunke2016-03-031-14/+6
| | | | | | | | | | | | | | | | | | | | | | | dEQP-GLES31.functional.debug.state_query.debug_group_stack_depth_* tries to call glGet on GL_DEBUG_GROUP_STACK_DEPTH right away, before doing any other debug setup. This should return 1. However, because ctx->Debug wasn't allocated, we bailed and returned 0. This patch removes the open-coded locking and switches the two glGet functions to use _mesa_lock_debug_state(), which takes care of allocating and initializing that state on the first time. It also conveniently takes care of unlocking on failure for us, so we don't need to handle that in every caller. Fixes dEQP-GLES31.functional.debug.state_query.debug_group_stack_depth_ {getboolean,getfloat,getinteger,getinteger64}. Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* Update docs to advertise new support for ARB_internalformat_query2Eduardo Lima Mitev2016-03-032-1/+2
| | | | | | | | Support in Mesa main and i965 has just been added. v2: Include note in 'New Features' of docs/relnotes/11.3.0.html. Reviewed-by: Ilia Mirkin <[email protected]>
* i965: Enable the ARB_internalformat_query2 extensionAntia Puentes2016-03-031-0/+1
| | | | Reviewed-by: Dave Airlie <[email protected]>