Commit message | Author | Age | Files | Lines
* i965: Let the caller of brw_set_dp_write/read_message control the target cache. | Francisco Jerez | 2016-12-14 | 3 | -42/+43
| | | | | | | | | | | | | | | | | brw_set_dp_read_message already had a target_cache argument, but its interpretation was rather convoluted (on Gen6 the render cache was used if the caller asked for it, otherwise it was ignored using the sampler cache instead), and the constant cache wasn't representable at all. brw_set_dp_write_message used the data cache on Gen7+ except for RENDER_TARGET_WRITE messages, in which case it would use the render cache. On Gen6 the render cache was always used. Instead of the above, provide the shared unit SFID that the caller expects will be used. Makes no functional changes. v3: Non-trivial rebase. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen6+: Invalidate constant cache on brw_emit_mi_flush(). | Francisco Jerez | 2016-12-14 | 1 | -0/+1
| | | | | | | In order to make sure that the constant cache is coherent with previous rendering when we start using it for pull constant loads. Reviewed-by: Kenneth Graunke <[email protected]>
* genxml: Make Gen8 3DSTATE_DS SIMD8 enable work like Gen9+. | Kenneth Graunke | 2016-12-14 | 1 | -1/+4
| | | | | | | This will let us avoid ifdefs. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* genxml: Rename "DS Function Enable" to "Function Enable". | Kenneth Graunke | 2016-12-14 | 2 | -2/+2
| | | | | | | This makes Gen7/7.5 match Gen8-9. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Reject VkMemoryAllocateInfo::allocationSize == 0 | Chad Versace | 2016-12-14 | 1 | -5/+2
| | | | | | The Vulkan 1.0.33 spec says "allocationSize must be greater than 0". Reviewed-by: Nanley Chery <[email protected]>
* egl: Fix crashes in eglCreate*Surface() | Chad Versace | 2016-12-14 | 1 | -2/+2
    Don't dereference a null EGLDisplay.
    Fixes tests
        dEQP-EGL.functional.negative_api.create_pbuffer_surface
        dEQP-EGL.functional.negative_api.create_pixmap_surface
    Reviewed-by: Mark Janes <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
    Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=99038
    Cc: "13.0" <[email protected]>
* i965/miptree: Use intel_miptree_copy for maps | Jason Ekstrand | 2016-12-13 | 1 | -12/+8
| | | | | | | | | | What we're really doing is copying a texture not blitting it in the sense of glBlitFramebuffers. Also, the intel_miptree_copy function is capable of properly handling compressed textures which intel_miptree_blit is not. Reviewed-by: Topi Pohjolainen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97473 Cc: "13.0" <[email protected]>
* i965/blit: Fix the src dimension sanity check in miptree_copy | Jason Ekstrand | 2016-12-13 | 1 | -2/+10
| | | | | Reviewed-by: Topi Pohjolainen <[email protected]> Cc: "13.0" <[email protected]>
* docs: add INTEL_conservative_rasterization to relaese notes for 13.1.0 | Lionel Landwerlin | 2016-12-13 | 1 | -0/+1
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* main: add INTEL_conservative_rasterization enum query support | Lionel Landwerlin | 2016-12-13 | 2 | -0/+8
| | | | | | | v2: add extra parameter (Ilia) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* glapi: add missing INTEL_conservative_rasterization | Lionel Landwerlin | 2016-12-13 | 1 | -0/+4
| | | | | | | v2: put enum directly in gl_API.xml (Ilia) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* extensions: update INTEL_conservative_rasterization dependencies | Lionel Landwerlin | 2016-12-13 | 1 | -1/+1
| | | | | | | Suggested by Ilia. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* main: don't error when enabling conservative rasterization on gles | Lionel Landwerlin | 2016-12-13 | 1 | -1/+1
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* main: use new driver flag for conservative rasterization state | Lionel Landwerlin | 2016-12-13 | 6 | -7/+19
| | | | | | | | | | | Suggested by Marek. v2: Use new driver flag (Marek) v3: Fix i965 comments (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nir/lower_tex: lower gradients on shadow cube maps if lower_txd_shadow is set | Iago Toral Quiroga | 2016-12-13 | 1 | -2/+4
| | | | | | | Even if lower_txd_cube_map isn't. Suggested by Ken to make the flag more consistent with its name. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: remove brw_lower_texture_gradients | Iago Toral Quiroga | 2016-12-13 | 5 | -358/+1
| | | | | | | This has been ported to NIR now, so we don't need to keep the GLSL IR lowering any more. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: enable lowering of texture gradient for shadow samplers | Iago Toral Quiroga | 2016-12-13 | 1 | -0/+3
| | | | | | | This gets the lowering on the Vulkan driver too, which is required for hardware that does not have the sample_l_d message (up to IvyBridge). Reviewed-by: Kenneth Graunke <[email protected]>
* nir/lower_tex: add lowering for texture gradient on shadow samplers | Iago Toral Quiroga | 2016-12-13 | 2 | -0/+67
    This is ported from the Intel lowering pass that we use with GLSL IR. This takes care of lowering texture gradients on shadow samplers other than cube maps. Intel hardware requires this for gen < 8.
    v2 (Ken):
     - Use the helper function to retrieve ddx/ddy
     - Swizzle away size components we are not interested in
    v3:
     - Get rid of the ddx/ddy helper and use nir_tex_instr_src_index instead (Ken, Eric)
    v4:
     - Add a 'continue' statement if the lowering makes progress because it replaces the original texture instruction
    Reviewed-by: Kenneth Graunke <[email protected]> (v3)
* i965/nir: enable lowering of texture gradient for cube maps | Iago Toral Quiroga | 2016-12-13 | 1 | -0/+1
| | | | | | | | | This gets the lowering on the Vulkan driver too. Fixes Vulkan CTS cube map texture gradient tests in: dEQP-VK.glsl.texture_functions.texturegrad.* Reviewed-by: Kenneth Graunke <[email protected]>
* nir/lower_tex: add lowering for texture gradient on cube maps | Iago Toral Quiroga | 2016-12-13 | 2 | -0/+213
    This is ported from the Intel lowering pass that we use with GLSL IR. The NIR pass only handles cube maps, not shadow samplers, which are also lowered for gen < 8 on Intel hardware. We will add support for that in a later patch, at which point we should be able to remove the GLSL IR lowering pass.
    v2:
     - added a helper to retrieve ddx/ddy parameters (Ken)
     - No need to make size.z=1.0, we are only using component x anyway (Iago)
    v3:
     - Get rid of the ddx/ddy helper and use nir_tex_instr_src_index instead (Ken, Eric)
    v4:
     - When emitting the textureLod operation, copy all texture parameters from the original textureGrad() (except for ddx/ddy) using a loop
     - Add a 'continue' statement if the lowering makes progress because it replaces the original texture instruction
    Reviewed-by: Kenneth Graunke <[email protected]> (v3)
* nir/lower_tex: generalize get_texture_size() | Iago Toral Quiroga | 2016-12-13 | 1 | -5/+10
| | | | | | | This was written specifically for RECT samplers. Make it more generic so we can call this from the gradient lowerings too. Reviewed-by: Kenneth Graunke <[email protected]>
* treewide: s/comparitor/comparator/ | Ilia Mirkin | 2016-12-12 | 31 | -85/+85
    git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g'
    Just happened to notice this in a patch that was sent and included one of the tokens in question.
    Signed-off-by: Ilia Mirkin <[email protected]>
    Acked-by: Nicolai Hähnle <[email protected]>
* nir: Only float and double types can be matrices | Ian Romanick | 2016-12-12 | 2 | -19/+24
| | | | | | | | | | | | | | In 19a541f (nir: Get rid of nir_constant_data) a number of places that operated on nir_constant::values were mechanically converted to operate on the whole array without regard for the base type. Only GLSL_TYPE_FLOAT and GLSL_TYPE_DOUBLE can be matrices, so only those types can have data in the non-0 array element. See also b870394. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Cc: Iago Toral Quiroga <[email protected]>
* swr: [rasterizer core/memory] StoreTile: AVX512 progress | Tim Rowley | 2016-12-12 | 2 | -222/+138
| | | | | | Fixes to 128-bit formats. Reviewed-by: Bruce Cherniak <[email protected]>
* nir: Move fsat outside of fmin/fmax if second arg is 0 to 1. | Matt Turner | 2016-12-12 | 2 | -0/+25
    instructions in affected programs: 550 -> 544 (-1.09%)
    helped: 6
    cycles in affected programs: 6952 -> 6850 (-1.47%)
    helped: 6
    Reviewed-by: Jason Ekstrand <[email protected]>
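The rewrite named in the title relies on saturation commuting with fmin/fmax when the other operand already lies in [0, 1]. A minimal sketch of that identity, assuming the pattern is fmin(fsat(a), b) -> fsat(fmin(a, b)) and likewise for fmax; the helper names are hypothetical (not Mesa's) and NaN handling is not modeled:

```python
def fsat(x):
    """Clamp x to [0.0, 1.0], like a saturate modifier (NaN not modeled)."""
    return min(max(x, 0.0), 1.0)


def inner_form(op, a, b):
    # fsat applied to the first operand, before the min/max.
    return op(fsat(a), b)


def outer_form(op, a, b):
    # fsat moved outside the min/max.
    return fsat(op(a, b))


def forms_agree(b, samples):
    """True if both forms match for every sample a, for both min and max."""
    return all(
        inner_form(op, a, b) == outer_form(op, a, b)
        for op in (min, max)
        for a in samples
    )
```

With b inside [0, 1] the two forms agree for any a; with b outside that range they can differ, which is why the condition on the second argument matters.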
* i965/fs: Reject copy propagation into SEL if not min/max. | Matt Turner | 2016-12-12 | 2 | -1/+12
    We shouldn't ever see a SEL with conditional mod other than GE (for max) or L (for min), but we might see one with predication and no conditional mod.
    total instructions in shared programs: 8241806 -> 8241902 (0.00%)
    instructions in affected programs: 13284 -> 13380 (0.72%)
    HURT: 62
    total cycles in shared programs: 84165104 -> 84166244 (0.00%)
    cycles in affected programs: 75364 -> 76504 (1.51%)
    helped: 10
    HURT: 34
    Fixes generated code in at least Sanctum 2, Borderlands 2, Goat Simulator, XCOM: Enemy Unknown, and Shogun 2.
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92234
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Add unit tests for copy propagation pass. | Matt Turner | 2016-12-12 | 2 | -0/+211
| | | | | | Pretty basic, but it's a start. Acked-by: Jason Ekstrand <[email protected]>
* i965/fs: Rename opt_copy_propagate -> opt_copy_propagation. | Matt Turner | 2016-12-12 | 3 | -15/+16
| | | | | | Matches the vec4 backend, cmod propagation, and saturate propagation. Reviewed-by: Jason Ekstrand <[email protected]>
* radeonsi: shrink the GSVS ring to account for the reduced item sizes | Nicolai Hähnle | 2016-12-12 | 1 | -1/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: shrink each vertex stream to the actually required size | Nicolai Hähnle | 2016-12-12 | 2 | -25/+40
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use a single descriptor for the GSVS ring | Nicolai Hähnle | 2016-12-12 | 4 | -50/+67
| | | | | | | | | We can hardcode all of the fields for swizzling in the geometry shader. The advantage is that we use fewer descriptor slots and we no longer have to update any of the (ring) descriptors when the geometry shader changes. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pack GS output components for each vertex stream contiguously | Nicolai Hähnle | 2016-12-12 | 1 | -3/+8
    Note that the memory layout of one vertex stream inside one "item" (= memory written by one GS wave) on the GSVS ring is:
      t0v0c0 ... t15v0c0 t0v1c0 ... t15v1c0 ... t0vLc0 ... t15vLc0
      t0v0c1 ... t15v0c1 t0v1c1 ... t15v1c1 ... t0vLc1 ... t15vLc1
      ...
      t0v0cL ... t15v0cL t0v1cL ... t15v1cL ... t0vLcL ... t15vLcL
      t16v0c0 ... t31v0c0 t16v1c0 ... t31v1c0 ... t16vLc0 ... t31vLc0
      t16v0c1 ... t31v0c1 t16v1c1 ... t31v1c1 ... t16vLc1 ... t31vLc1
      ...
      t16v0cL ... t31v0cL t16v1cL ... t31v1cL ... t16vLcL ... t31vLcL
      ...
      t48v0c0 ... t63v0c0 t48v1c0 ... t63v1c0 ... t48vLc0 ... t63vLc0
      t48v0c1 ... t63v0c1 t48v1c1 ... t63v1c1 ... t48vLc1 ... t63vLc1
      ...
      t48v0cL ... t63v0cL t48v1cL ... t63v1cL ... t48vLcL ... t63vLcL
    where tNN indicates the thread number, vNN the vertex number (in the order of EMIT_VERTEX), and cNN the output component (vL and cL are the last vertex and component, respectively).
    The vertex streams are laid out sequentially.
    The swizzling by 16 threads is hard-coded in the way the VGT generates the offset passed into the GS copy shader, and the jump every 16 threads is calculated from VGT_GSVS_RING_OFFSET_n and VGT_GSVS_RING_ITEMSIZE in a way that makes it difficult to deviate from this layout (at least that's what I've experimentally confirmed on VI after first trying to go the simpler route of just interleaving the vertex streams).
    Reviewed-by: Marek Olšák <[email protected]>
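The layout described above can be read as an index computation: thread lane varies fastest, then vertex, then component, then the group of 16 threads. The following is only a sketch of that reading. The function name is hypothetical, the index counts output values rather than bytes, and it covers a single vertex stream within one item; it is not the driver's actual address calculation.

```python
THREADS_PER_GROUP = 16  # swizzle width stated in the commit message


def gsvs_element_index(thread, vertex, component, num_vertices, num_components):
    """Index (in output values) of tNvNcN within one vertex stream of one item.

    Hypothetical helper: ordering is lane, then vertex, then component,
    then 16-thread group, as in the layout listed in the commit message.
    """
    group, lane = divmod(thread, THREADS_PER_GROUP)
    group_size = THREADS_PER_GROUP * num_vertices * num_components
    return (
        group * group_size
        + component * num_vertices * THREADS_PER_GROUP
        + vertex * THREADS_PER_GROUP
        + lane
    )
```

For example, with 3 vertices and 2 components, `gsvs_element_index(0, 1, 0, 3, 2)` lands directly after the 16 lanes of vertex 0, and thread 16 starts a new block after all components of threads 0 to 15.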
* radeonsi: do not write non-existent components through the GSVS ring | Nicolai Hähnle | 2016-12-12 | 1 | -2/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only write values belonging to the stream when emitting GS vertex | Nicolai Hähnle | 2016-12-12 | 1 | -0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: generate an explicit switch instruction over vertex streams | Nicolai Hähnle | 2016-12-12 | 1 | -8/+13
| | | | | | | | | | | | SimplifyCFG generates a switch instruction anyway when all four streams are present, but is simultaneously not smart enough to eliminate some redundant jumps that it generates. The generated assembly is still a bit silly, probably because the control flow annotation doesn't know how to handle a switch with uniform condition. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fetch only outputs of current vertex stream from the GSVS ring | Nicolai Hähnle | 2016-12-12 | 1 | -16/+25
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: only export from GS copy shader for vertex stream 0 | Nicolai Hähnle | 2016-12-12 | 1 | -12/+19
| | | | | | | | When running the copy shader for vertex streams != 0, the SX does not need any data from us (there is no rasterization for the higher vertex streams, only streamout). Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not export VS outputs from vertex streams != 0 | Nicolai Hähnle | 2016-12-12 | 1 | -0/+6
| | | | | | | | | | This affects GS copy shaders. When an output is meant for vertex stream != 0, then we don't have to make it available to the pixel shader. There is a minor inefficiency here because the GLSL varying packing pass does not group varyings of the same vertex stream together, but it shouldn't be important in practice. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pull iteration over vertex streams into GS copy shader logic | Nicolai Hähnle | 2016-12-12 | 1 | -25/+37
| | | | | | The iteration is not needed for normal vertex shaders. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: group streamout writes by vertex stream | Nicolai Hähnle | 2016-12-12 | 1 | -10/+22
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: load the streamout buf descriptors closer to their use | Nicolai Hähnle | 2016-12-12 | 1 | -14/+11
| | | | | | LLVM can still decide to hoist the loads since they're marked invariant. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: extract writing of a single streamout output | Nicolai Hähnle | 2016-12-12 | 1 | -39/+52
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: separate the call to si_llvm_emit_streamout from exports | Nicolai Hähnle | 2016-12-12 | 1 | -4/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: plumb the output vertex_stream through to si_shader_output_values | Nicolai Hähnle | 2016-12-12 | 1 | -1/+9
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: rename members of si_shader_output_values | Nicolai Hähnle | 2016-12-12 | 1 | -8/+8
| | | | | | Be a bit more verbose and avoid confusion in future patches. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix an off-by-one error in the bounds check for max_vertices | Nicolai Hähnle | 2016-12-12 | 1 | -1/+1
| | | | | | | | | | | The spec actually says that calling EmitStreamVertex is undefined when you exceed max_vertices. But we do need to avoid trampling over memory outside the GSVS ring. Cc: [email protected] Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: do not kill GS with memory writes | Nicolai Hähnle | 2016-12-12 | 1 | -8/+22
| | | | | | | | | | | Vertex emits beyond the specified maximum number of vertices are supposed to have no effect, which is why we used to always kill GS that reached the limit. However, if the GS also writes to memory (SSBO, atomics, shader images), then we must keep going and only skip the vertex emit itself. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
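The decision described above can be modeled as a small function. This is a simplified sketch, not the shader code the driver generates: the function name and the string actions are hypothetical, and it assumes the kill happens at the first emit past the limit.

```python
def on_emit_vertex(emitted, max_vertices, writes_memory):
    """Model one EMIT_VERTEX; returns (action, new_emitted_count).

    Hypothetical model: "emit" writes the vertex, "skip" drops only the
    vertex emit and lets the invocation continue, "kill" ends the invocation.
    """
    if emitted < max_vertices:
        return "emit", emitted + 1
    if writes_memory:
        # Later SSBO, atomic, or image writes must still happen.
        return "skip", emitted
    # No side effects left to preserve, so the invocation can stop.
    return "kill", emitted
```

The point of the model is the middle branch: a shader with memory side effects keeps running past the limit, while the excess vertex emits themselves still have no effect.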
* radeonsi: update all GSVS ring descriptors for new buffer allocations | Nicolai Hähnle | 2016-12-12 | 1 | -1/+6
| | | | | | | | Fixes GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced. Cc: [email protected] Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/glsl_to_tgsi: plumb the GS output stream qualifier through to TGSI | Nicolai Hähnle | 2016-12-12 | 3 | -1/+31
| | | | | | Allow drivers to emit GS outputs in a smarter way. Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: collect information about output usagemasks | Nicolai Hähnle | 2016-12-12 | 2 | -0/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>