summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* nir: fix example in opt_peel_loop_initial_if descriptionCaio Marcelo de Oliveira Filho2019-02-121-3/+3
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* nir/opt_if: don't mark progress if nothing changesKarol Herbst2019-02-131-0/+7
| | | | | | | | | | | | | | | | | | | | if we have something like this: loop { ... if x { break; } else { continue; } } opt_if_loop_last_continue returns true marking progress allthough nothing changes. Fixes: 5921a19d4b0c6 "nir: add if opt opt_if_loop_last_continue()" Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radeonsi: Fix guardband computation for large render targetsOscar Blumberg2019-02-121-2/+28
| | | | | | | | | Stop using 12.12 quantization for viewports that are not contained in the lower 4k corner of the render target as the hardware needs to keep both absolute and relative coordinates representable. Signed-off-by: Marek Olšák <[email protected]> Cc: 18.3 19.0 <[email protected]>
* egl: fix KHR_partial_update without EXT_buffer_ageChia-I Wu2019-02-121-1/+6
| | | | | | | | EGL_BUFFER_AGE_EXT can be queried without EXT_buffer_age. Signed-off-by: Chia-I Wu <[email protected]> Acked-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* mesa: Advertise EXT_float_blend in ES 3.0+ contexts.Kenneth Graunke2019-02-121-0/+1
| | | | | | | | | | | | | | | | | | | | This extension simply drops a draw time restriction: "Furthermore, an INVALID_OPERATION error is generated by DrawArrays and the other drawing commands defined in section 2.8.3 (10.5 in ES 3.1) if blending is enabled (see below) and any draw buffer has 32-bit floating-point format components." We never correctly enforced this restriction anyway, so we were basically already implementing it. We just need to advertise it for our behavior to be correct. The extension requires EXT_color_buffer_float, but we already enable that via dummy_true. So we can dummy_true this one as well. Found while debugging WebGL conformance tests. Does not fix any. Reviewed-by: Tapani Pälli <[email protected]>
* gallium/swr: Param defaults for unhandled PIPE_CAPsAlok Hota2019-02-121-4/+3
| | | | | | | Without using this function, we fail the -Wswitch flag when compiling the default debugoptimized mode in Meson Reviewed-by: Bruce Cherniak <[email protected]>
* anv/cmd_buffer: check for NULL framebufferJuan A. Suarez Romero2019-02-121-5/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This can happen when we record a VkCmdDraw in a secondary buffer that was created inheriting from the primary buffer, but with the framebuffer set to NULL in the VkCommandBufferInheritanceInfo. Vulkan 1.1.81 spec says that "the application must ensure (using scissor if neccesary) that all rendering is contained in the render area [...] [which] must be contained within the framebuffer dimesions". While this should be done by the application, commit 465e5a86 added the clamp to the framebuffer size, in case of application does not do it. But this requires to know the framebuffer dimensions. If we do not have a framebuffer at that moment, the best compromise we can do is to just apply the scissor as it is, and let the application to ensure the rendering is contained in the render area. v2: do not clamp to framebuffer if there isn't a framebuffer v3 (Jason): - clamp earlier in the conditional - clamp to render area if command buffer is primary v4: clamp also x and y to render area (Jason) v5: rename used variables (Jason) Fixes: 465e5a86 ("anv: Clamp scissors to the framebuffer boundary") CC: Jason Ekstrand <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radeonsi: use MEM instead of MEM_GRBM in COPY_DATA.DST_SELMarek Olšák2019-02-121-3/+3
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add AMD_DEBUG env var as an alternative to R600_DEBUGMarek Olšák2019-02-121-1/+3
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: fix using LOAD_CONTEXT_REG with old GFX ME firmwares on GFX8Samuel Pitoiset2019-02-123-3/+10
| | | | | | | | | This fixes a critical issue. Cc: <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109575 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add support for push constants inlining when possibleSamuel Pitoiset2019-02-125-28/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | | This removes some scalar loads from shaders, but it increases the number of SET_SH_REG packets. This is currently basic but it could be improved if needed. Inlining dynamic offsets might also help. Original idea from Dave Airlie. 29077 shaders in 15096 tests Totals: SGPRS: 1321325 -> 1357101 (2.71 %) VGPRS: 936000 -> 932576 (-0.37 %) Spilled SGPRs: 24804 -> 24791 (-0.05 %) Code Size: 49827960 -> 49642232 (-0.37 %) bytes Max Waves: 242007 -> 242700 (0.29 %) Totals from affected shaders: SGPRS: 290989 -> 326765 (12.29 %) VGPRS: 244680 -> 241256 (-1.40 %) Spilled SGPRs: 1442 -> 1429 (-0.90 %) Code Size: 8126688 -> 7940960 (-2.29 %) bytes Max Waves: 80952 -> 81645 (0.86 %) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: keep track of the number of remaining user SGPRsSamuel Pitoiset2019-02-121-0/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: gather if shaders load dynamic offsets separatelySamuel Pitoiset2019-02-122-0/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: gather more info about push constantsSamuel Pitoiset2019-02-124-1/+44
| | | | | | | | This is needed in order to inline some push constants when possible. This also adds a new helper for initializing the pass. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix compiler issues with GCC 9Samuel Pitoiset2019-02-121-42/+48
| | | | | | | | | | | | | | | | | "The C standard says that compound literals which occur inside of the body of a function have automatic storage duration associated with the enclosing block. Older GCC releases were putting such compound literals into the scope of the whole function, so their lifetime actually ended at the end of containing function. This has been fixed in GCC 9. Code that relied on this extended lifetime needs to be fixed, move the compound literals to whatever scope they need to accessible in." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109543 Cc: <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Gustaw Smolarczyk <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* i965: add P0x formats and propagate required scaling factorsTapani Pälli2019-02-123-0/+17
| | | | | | Signed-off-by: Tapani Pälli <[email protected]> Signed-off-by: Lin Johnson <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: add scale_factors to sampler_prog_key_dataTapani Pälli2019-02-123-0/+8
| | | | | | | | Patch propagates given scale_factors to lowering options. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* dri: add P010, P012, P016 for 10bit/12bit/16bit YUV420 formatsTapani Pälli2019-02-121-0/+17
| | | | | | Signed-off-by: Tapani Pälli <[email protected]> Signed-off-by: Lin Johnson <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* nir: add option to use scaling factor when sampling planes YUV loweringTapani Pälli2019-02-122-21/+35
| | | | | | | | | Patch adds nir_lower_tex_options as parameter to sample_plane so that we don't need to extend nir_tex_instr for this. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Use info->textures_used instead of prog->SamplersUsed.Kenneth Graunke2019-02-112-7/+7
| | | | | | | | | | prog->SamplersUsed is set by the linker when validating resource limits, while info->textures_used is gathered after NIR optimizations, which may have eliminated some unused surfaces. This may let us skip some work. Reviewed-by: Eric Anholt <[email protected]>
* i965: Drop unnecessary 'and' with prog->SamplerUnitsKenneth Graunke2019-02-111-1/+1
| | | | | | | textures_used_by_txf is a subset of textures_used which is a subset of prog->SamplerUnits. This should do nothing. Reviewed-by: Eric Anholt <[email protected]>
* nir: Gather texture bitmasks in gl_nir_lower_samplers_as_deref.Kenneth Graunke2019-02-117-11/+40
| | | | | | | | | | | | | | | | | | | | | | | | Eric and I would like a bitmask of which samplers are used, similar to prog->SamplersUsed, but available in NIR. The linker uses SamplersUsed for resource limit checking, but later optimizations may eliminate more samplers. So instead of propagating it through, we gather a new one. While there, we also gather the existing textures_used_by_txf bitmask. Gathering these bitfields in nir_shader_gather_info is awkward at best. The main reason is that it introduces an ordering dependency between the two passes. If gathering runs before lower_samplers_as_deref, it can't look at var->data.binding. If the driver doesn't use the full lowering to texture_index/texture_array_size (like radeonsi), then the gathering can't use those fields. Gathering might be run early /and/ late, first to get varying info, and later to update it after variant lowering. At this point, should gathering work on pre-lowered or post-lowered code? Pre-lowered is also harder due to the presence of structure types. Just doing the gathering when we do the lowering alleviates these ordering problems. This fixes ordering issues in i965 and makes the txf info gathering work for radeonsi (though they don't use it). Reviewed-by: Eric Anholt <[email protected]>
* nir: Use sampler derefs in drawpixels and bitmap lowering.Kenneth Graunke2019-02-112-13/+34
| | | | Reviewed-by: Eric Anholt <[email protected]>
* program: Make prog_to_nir create texture/sampler derefs.Kenneth Graunke2019-02-111-5/+16
| | | | | | | | | | | | | Until now, prog_to_nir has been setting texture_index and sampler_index directly. This is different than GLSL shaders, which create variable dereferences and rely on lowering passes to reach this final form. radeonsi uses variable dereferences for samplers rather than texture_index and sampler_index, so it doesn't even make sense to set them there. By moving to derefs, we ensure that both GLSL and ARB programs produce the same final form that the driver desires. Reviewed-by: Eric Anholt <[email protected]>
* st/nir: Use sampler derefs in built-in shaders.Kenneth Graunke2019-02-112-8/+24
| | | | Reviewed-by: Eric Anholt <[email protected]>
* st/nir: Lower sampler derefs for builtin shaders.Kenneth Graunke2019-02-111-0/+2
| | | | Reviewed-by: Eric Anholt <[email protected]>
* st/nir: Pull sampler lowering into a helper function.Kenneth Graunke2019-02-112-4/+14
| | | | | | This will make it easier to reuse across GLSL / ARB / built-ins. Reviewed-by: Eric Anholt <[email protected]>
* i965: Call nir_lower_samplers for ARB programs.Kenneth Graunke2019-02-111-0/+2
| | | | | | | | | An upcoming patch will start building derefs in prog_to_nir, at which point we'll need to lower them to indexes. This gets both GLSL and non-GLSL shaders using the same paths. Reviewed-by: Eric Anholt <[email protected]>
* glsl: Don't look at sampler uniform storage for internal varsKenneth Graunke2019-02-111-3/+5
| | | | | | | | | | Passes like nir_lower_drawpixels add additional sampler variables, and set an explicit binding which never changes. These extra samplers don't have proper uniform storage associated with them, and there is no way to update bindings via the API. So, for any 'hidden' variables, just trust that there's an explicit binding set. Reviewed-by: Eric Anholt <[email protected]>
* glsl: Allow gl_nir_lower_samplers*() without a gl_shader_programKenneth Graunke2019-02-111-3/+11
| | | | | | | | | | | | | | | | I would like to be able to run gl_nir_lower_samplers() to turn texture and sampler variable dereferences into indexes and offsets, even for ARB programs, and built-in shaders. This would make sampler handling more consistent across the various types of shaders. For GLSL programs, the gl_nir_lower_samplers_as_deref() pass looks up the variable bindings in the shader program's uniform storage. But ARB programs and built-in shaders don't have a gl_shader_program, and uniform storage doesn't exist. In this case, we simply skip that lookup, and trust var->data.binding to be set correctly by whoever created the shader. Reviewed-by: Eric Anholt <[email protected]>
* st/mesa: Limit GL_MAX_[NATIVE_]PROGRAM_PARAMETERS_ARB to 2048Kenneth Graunke2019-02-111-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Piglit's vp-max-array test creates a vertex program containing a uniform array sized to the value of GL_MAX_NATIVE_PROGRAM_PARAMETERS_ARB. Mesa will then add additional state-var parameters for things like the MVP matrix. radeonsi currently exposes a value of 4096, derived from constant buffer upload size. This means the array will have 4096 elements, and the extra MVP state-vars would get a prog_src_register::Index of over 4096. Unfortunately, prog_src_register::Index is a signed 13-bit integer, so values beyond 4096 end up turning into negative numbers. Negative source indexes are only valid for relative addressing, so this ends up generating illegal IR. In prog_to_nir, this would cause an out of bounds array access. st_mesa_to_tgsi checks for a negative value, assumes it's bogus, and remaps it to parameter 0 in order to get something in-range. This isn't right - instead of reading the MVP matrix, it would read the first element of the vertex program's large array. But the test only checks that the program compiles, so we never noticed that it was broken. This patch limits the size of the program limits, with the understanding that we may need to generate additional state-vars internally. i965 has exposed 1024 for this limit for years, so I don't expect lowering it to 2048 will cause any practical problems for radeonsi or other drivers. Fixes vp-max-array with prog_to_nir.c. Cc: "19.0" <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* intel/dump_gpu: Disambiguate between BOs from different GEM handle spaces.Francisco Jerez2019-02-111-18/+23
| | | | | | | | | | | | | | | | This fixes a rather astonishing problem that came up while debugging an issue in the Vulkan CTS. Apparently the Vulkan CTS framework has the tendency to create multiple VkDevices, each one with a separate DRM device FD and therefore a disjoint GEM buffer object handle space. Because the intel_dump_gpu tool wasn't making any distinction between buffers from the different handle spaces, it was confusing the instruction state pools from both devices, which happened to have the exact same GEM handle and PPGTT virtual address, but completely different shader contents. This was causing the simulator to believe that the vertex pipeline was executing a fragment shader, which didn't end up well. Reviewed-by: Lionel Landwerlin <[email protected]>
* freedreno/a6xx: Fall back to masked RGBA blits for depth/stencilKristian H. Kristensen2019-02-111-5/+44
| | | | | | | | | | | | | | | | | | | The blitter doesn't seem to have a write mask, so for depth only and stencil only blits to Z24S8 we cast the Z24S8 buffer to an RGBA UNORM8 buffer and fall back to pipeline blits with corresponding write mask. Fixes dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_stencil_only dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_depth dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_depth dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_depth dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_depth dEQP-GLES3.functional.fbo.msaa.2_samples.stencil_index8 dEQP-GLES3.functional.fbo.msaa.4_samples.stencil_index8 Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Add format argument to fd6_tex_swiz()Kristian H. Kristensen2019-02-114-8/+10
| | | | | | | | We need to allow overriding the format with that of the image or sampler view, so we can't take it from the resource in fd6_tex_swiz(). Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Support y-inverted blitsKristian H. Kristensen2019-02-111-5/+2
| | | | | | | | | | | | | | | | | The src coordinates are s24.8. For an inverted blit that ends at y=0 we need to program -1 for sy2, so we need to handle negative values correctly. Fixes dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_mag_reverse_dst_y dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_min_reverse_dst_y dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_min_reverse_src_y dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_color dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_color Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Support some depth/stencil blits on blitterKristian H. Kristensen2019-02-111-1/+84
| | | | | | | | | | | | | | | | | | We can rewrite almost all depth stencil blits to various red-only blits. The exception is depth-only or stencil-only blits into z24s8 combined depth stencil buffer. We can fall back for depth-only, but stencil-only remains broken. Fixes dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_basic dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_scale dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_basic dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_scale dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_stencil_only Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Move blit check so as to restore commentKristian H. Kristensen2019-02-111-4/+4
| | | | | | | | | | | | | | The explanation for the compressed format check is broken across two comments: /* We can blit if both or neither formats are compressed formats... */ /* ... but only if they're the same compression format. */ but the ok_format() checks were inserted between, breaking up the flow of the sentence. Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno: Don't tell the blitter what it can't doKristian H. Kristensen2019-02-111-2/+4
| | | | | | | | Call ctx->blit() and let it reject blits it can't do instead of giving up on stencil blits and blits u_blitter can't do. Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno: Consolidate u_blitter functions in freedreno_blitter.cKristian H. Kristensen2019-02-115-149/+153
| | | | | Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Combine emit_blit and fd6_blitKristian H. Kristensen2019-02-111-12/+5
| | | | | Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Use the right resource for separate stencil strideKristian H. Kristensen2019-02-111-1/+1
| | | | | Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno: Log number of draw for sysmem passesKristian H. Kristensen2019-02-111-2/+3
| | | | | Reviewed-by: Rob Clark <[email protected]> Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Drop render condition check in blitterKristian H. Kristensen2019-02-111-5/+16
| | | | | | | | | | | | | | | | | | | | | | | | We already check earlier in the call chain in fd_blit(). glBlitFramebuffer always sets render_condition_enable and thus we would never try the blitter path for that. Now that we get all of dEQP-GLES3.functional.fbo.blit.conversion.* down this path, it turs out that the fail_if(info->mask != util_format_get_mask(info->src.format)); fail_if(info->mask != util_format_get_mask(info->dst.format)); conditions weren't accurate. util_format_get_mask() returns PIPE_MASK_RGBA for any format with any color channels, while info->mask is the exact set of channels to blit. So we reject things we could blit - for example, PIPE_FORMAT_R16G16_FLOAT where info->mask is RG while util_format_get_mask() returns RGBA - and accept things we can't. It turns out that the blitter is happy to blit different number of channels, but fails to blit formats with different numerical formats and srgb formats. Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno/a6xx: regen headersKristian H. Kristensen2019-02-111-25/+56
| | | | | | | | Update for a6xx.xml.h to incorporate a few new bits and changes to blit src rect coordinate types. Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* st/va/vp9: set max reference as default of VP9 reference numberLeo Liu2019-02-111-1/+6
| | | | | | | | If there is no information about number of render targets Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Boyuan Zhang <[email protected]> Cc: 19.0 <[email protected]>
* st/va: fix the incorrect max profiles reportLeo Liu2019-02-112-2/+3
| | | | | | | | | | | Add "PIPE_VIDEO_PROFILE_MAX" to enum, so it will make sure here will be correct when adding more profiles in the future. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109107 Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Boyuan Zhang <[email protected]> Cc: 19.0 <[email protected]>
* st/va:Add support for indirect manner by returning ↵Guttula, Suresh2019-02-111-2/+5
| | | | | | | | | | | | | | | | | | VA_STATUS_ERROR_OPERATION_FAILED Based on VA Spec,DeriveImage() returns VA_STATUS_ERROR_OPERATION_FAILED if driver dont have support for internal surface formats.Currently vaDeriveImage() failed for non-contiguous planes and operation failed error string is required to support indirect manner i.e. vaCreateImage()+vaPutImage() incase vaDeriveImage() failed with VA_STATUS_ERROR_OPERATION_FAILED. This patch will notify to the client as operation failed with proper error sting,so that client will fallback to vaCreateImage()+vaPutImage(). v2: updated commit message based on VA spec. Signed-off-by: suresh guttula <[email protected]> Reviewed-by: Leo Liu <[email protected]>
* winsys/amdgpu: cs_check_space sets the minimum IB size for future IBsMarek Olšák2019-02-112-2/+23
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: clean up IB buffer size computationMarek Olšák2019-02-111-8/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: remove occurence of INDIRECT_BUFFER_CONSTMarek Olšák2019-02-111-2/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>