Commit message | Author | Age | Files | Lines
* main: memcpy larger chunks in _mesa_propagate_uniforms_to_driver_storage | Nils Wallménius | 2016-07-25 | 1 | -6/+23
| When possible, do the memcpy on larger blocks. This reduces cycles spent in _mesa_propagate_uniforms_to_driver_storage from 1.51% to 0.62% according to perf during the Unigine Heaven benchmark. It did not affect the framerate of the benchmark.
|
| The system used for testing was an i5 6600K with a Radeon R9 380. Piglit hangs randomly on this system both with and without the patch, so I could not make a comparison.
|
| v2: fixed whitespace
|
| Signed-off-by: Nils Wallménius <[email protected]>
| Reviewed-by: Nicolai Hähnle <[email protected]>
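The idea of copying in larger blocks can be sketched as follows. This is a minimal illustration, not the code from the patch: `copy_vectors` and its parameters are hypothetical, and it assumes a tightly packed source and a destination whose stride may include padding. When the destination stride equals the vector size, the whole range is contiguous and one memcpy replaces the per-vector loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Mesa function): copy `count` vectors of
 * `vec_bytes` bytes each from a tightly packed source into a destination
 * whose consecutive vectors are `dst_stride` bytes apart. */
static void copy_vectors(unsigned char *dst, size_t dst_stride,
                         const unsigned char *src, size_t vec_bytes,
                         size_t count)
{
   assert(dst_stride >= vec_bytes);

   if (dst_stride == vec_bytes) {
      /* No padding between vectors: one large copy covers everything. */
      memcpy(dst, src, vec_bytes * count);
      return;
   }

   /* Padded destination: fall back to one small copy per vector. */
   for (size_t i = 0; i < count; i++) {
      memcpy(dst, src, vec_bytes);
      dst += dst_stride;
      src += vec_bytes;
   }
}
```

Both paths produce the same bytes in the destination; the fast path only reduces the number of memcpy calls.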
* st/va: enable h264 VAAPI encode | Boyuan Zhang | 2016-07-25 | 1 | -5/+1
| Enable H.264 VAAPI encoding through config. Currently only H.264 baseline is supported. Encode entrypoint is not accepted by driver.
|
| Signed-off-by: Boyuan Zhang <[email protected]>
* st/va: add function to handle misc param type frame rate | Boyuan Zhang | 2016-07-25 | 1 | -5/+19
| Frame rate can be passed to the driver either through VAEncSequenceParameterBufferType or VAEncMiscParameterTypeFrameRate. Previous code only implemented the former, which is used by Gstreamer-Vaapi. This patch adds an implementation for VAEncMiscParameterTypeFrameRate. It also sets a default frame rate of 30 in case the application never provides frame rate information to the driver.
|
| Signed-off-by: Boyuan Zhang <[email protected]>
* st/va: add environmental variable to disable interlace | Boyuan Zhang | 2016-07-25 | 1 | -0/+4
| Add environmental variable to disable interlace mode. At the VAAPI decoding stage, the driver cannot distinguish between the pure decoding case and the transcoding case. Since interlaced encoding is not supported, we have to disable interlace for the transcoding case. The temporary solution is to use an environmental variable to disable interlace mode.
|
| Signed-off-by: Boyuan Zhang <[email protected]>
* st/va: add preset values for VAAPI encode | Boyuan Zhang | 2016-07-25 | 1 | -0/+27
| Add some hardcoded values that the hardware needs, mainly for rate control purposes. With the values previously hardcoded for OMX, the rate control result is not correct. This change fixes the rate control result by setting correct values for Vaapi.
|
| Signed-off-by: Boyuan Zhang <[email protected]>
* st/va: add functions for VAAPI encode | Boyuan Zhang | 2016-07-25 | 3 | -2/+178
| Add necessary functions/changes for VAAPI encoding to buffer and picture. These changes will allow the driver to handle all Vaapi encode related operations. This patch doesn't change the Vaapi decode behaviour.
|
| Signed-off-by: Boyuan Zhang <[email protected]>
* st/va: get rate control method from configattrib v2 | Boyuan Zhang | 2016-07-25 | 3 | -0/+15
| The rate control method is passed from the app to the driver through the config attrib list. That is why we need to store this rate control method in the config. Later on, we will pass this value to context->desc.h264enc.rate_ctrl.rate_ctrl_method.
|
| v2 (chk): fix broken build and commit message
|
| Signed-off-by: Boyuan Zhang <[email protected]>
| Signed-off-by: Christian König <[email protected]>
* st/va: add conversion for yv12 to nv12 in putimage v2 | Boyuan Zhang | 2016-07-25 | 1 | -7/+27
| For a putimage call, if the image format is yv12 (or IYUV with the U and V fields swapped) and the surface format is nv12, then we need to convert yv12 to nv12 and then copy the converted data from image to surface. We can't use the existing logic where the surface is destroyed and re-created with yv12 format.
|
| v2 (chk): fix some compiler warnings and commit message
|
| Signed-off-by: Boyuan Zhang <[email protected]>
| Signed-off-by: Christian König <[email protected]>
* vl/util: add copy func for yv12image to nv12surface v2 | Boyuan Zhang | 2016-07-25 | 1 | -0/+37
| Add function to copy from yv12 image to nv12 surface for the VAAPI putimage call. We need this function in the VaPutImage call when copying from a yv12 image to an nv12 surface for encoding. The existing function can't be used because it only works for copying from a yv12 surface to an nv12 image in Vaapi.
|
| v2: cleanup variable types and commit message
|
| Signed-off-by: Boyuan Zhang <[email protected]>
| Signed-off-by: Christian König <[email protected]>
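The layout change behind such a copy can be sketched as follows. YV12 stores a full-resolution Y plane followed by a quarter-size V plane and then a quarter-size U plane, while NV12 stores the Y plane followed by a single plane of interleaved U,V pairs. The function below is a hypothetical illustration only: it assumes tightly packed CPU buffers with even width and height, whereas the function added by the patch operates on Vaapi images and surfaces and has to respect their pitches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the vl/util function): convert a tightly
 * packed YV12 buffer to a tightly packed NV12 buffer of the same size,
 * width * height * 3 / 2 bytes. */
static void yv12_to_nv12(uint8_t *dst, const uint8_t *src,
                         unsigned width, unsigned height)
{
   assert(width % 2 == 0 && height % 2 == 0);

   const size_t luma = (size_t)width * height;
   const size_t chroma = luma / 4;          /* samples per chroma plane */
   const uint8_t *v = src + luma;           /* YV12: V plane comes first */
   const uint8_t *u = v + chroma;           /* ...then the U plane */
   uint8_t *uv = dst + luma;                /* NV12: interleaved U,V */

   /* The luma plane is identical in both formats. */
   memcpy(dst, src, luma);

   /* Interleave the two planar chroma planes, U first. */
   for (size_t i = 0; i < chroma; i++) {
      uv[2 * i] = u[i];
      uv[2 * i + 1] = v[i];
   }
}
```

For a 2x2 image the source holds four Y bytes, one V byte, and one U byte; the result holds the same four Y bytes followed by U and then V.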
* st/va: add encode entrypoint v2 | Boyuan Zhang | 2016-07-25 | 4 | -39/+150
| VAAPI passes PIPE_VIDEO_ENTRYPOINT_ENCODE as the entry point for the encoding case. We will save this encode entry point in config. config_id was used as profile previously. Now, config has both profile and entrypoint fields, and config_id is used to get the config object. Later on, we pass this entrypoint to context->templat.entrypoint instead of always hardcoding PIPE_VIDEO_ENTRYPOINT_BITSTREAM, as was done previously for the decoding case. The encode entrypoint is not accepted by the driver until we enable Vaapi encode in a later patch.
|
| v2 (chk): fix commit message to match 80 chars, use switch instead of ifs, fix memory leaks in the error path, implement vlVaQueryConfigEntrypoints as well, drop VAEntrypointEncPicture (only used for JPEG).
|
| Signed-off-by: Boyuan Zhang <[email protected]>
| Signed-off-by: Christian König <[email protected]>
* nvc0: upload sample locations on GM20x | Samuel Pitoiset | 2016-07-24 | 3 | -5/+31
| This fixes a bunch of multisample piglit tests on GM206, like
| bin/arb_texture_multisample-texelfetch 2 -auto -fbo
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
* freedreno/a4xx: time-elapsed query should be active for clears | Rob Clark | 2016-07-24 | 1 | -1/+1
| Signed-off-by: Rob Clark <[email protected]>
* nvc0/ir: fix up an assertion in emitUADD() | Samuel Pitoiset | 2016-07-24 | 1 | -4/+3
| It's illegal to have neg modifiers on both sources for OP_ADD, and it's illegal to have OP_SUB with just src0 neg.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix wrong indentation in nvc0_validate_fb() | Samuel Pitoiset | 2016-07-23 | 1 | -141/+141
| Trivial.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
* glsl: reuse main extension table to appropriately restrict extensions | Ilia Mirkin | 2016-07-23 | 14 | -354/+268
| Previously we were only restricting based on ES/non-ES-ness and whether the overall enable bit had been flipped on. However we have been adding more fine-grained restrictions, such as based on compat profiles, as well as specific ES versions. Most of the time this doesn't matter, but it can create awkward situations and duplication of logic.
|
| Here we separate the main extension table into a separate object file, linked to the glsl compiler, which makes use of it with a custom function which takes the ES-ness of the shader into account (thus allowing desktop shaders to properly use ES extensions that would otherwise have been disallowed.)
|
| We can also now use this logic to generate #define's for all supported extensions automatically, removing the duplicate (and often inaccurate) list in glcpp.
|
| The effect of this change should be nil in most cases. However in some situations, extensions like GL_ARB_gpu_shader5 which were formerly available in compat contexts on the GLSL side of things will now become inaccessible.
|
| This regresses two ES CTS tests:
|   ES3-CTS.shaders.shader_integer_mix.define
|   ES31-CTS.shader_integer_mix.define
|
| however that is due to them using #version 100 instead of 300 es. As the extension is only defined for ES3, I believe this is the correct behavior.
|
| Signed-off-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Eric Engestrom <[email protected]> (v2)
| v2 -> v3: integrate glcpp defines into the same mechanism
* freedreno/a4xx: timestamp queries | Rob Clark | 2016-07-23 | 3 | -1/+34
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: hw timestamp support | Rob Clark | 2016-07-23 | 3 | -3/+16
| If the kernel supports it, use hw counter for timestamps.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: prep work for timestamp queries | Rob Clark | 2016-07-23 | 3 | -6/+10
| We need "NULL" state to be a valid bit in the bitmask, because timestamp queries are not restricted to draw/etc stages (i.e. the only commands to submit may just be to read the timestamp). And the absence of draws is not a reason to skip the flush and return zero.
|
| Signed-off-by: Rob Clark <[email protected]>
* radeonsi: ensure sample locations are set for line and polygon smoothing | Nicolai Hähnle | 2016-07-23 | 1 | -2/+1
| Since commit d938b8c, the sample locations are no longer set unconditionally, so we need to set the atom to dirty on all chips, not just Polaris.
|
| Cc: 12.0 <[email protected]>
* radeonsi: fix Polaris MSAA regression | Nicolai Hähnle | 2016-07-23 | 2 | -15/+20
| The regression was introduced by commit d938b8c.
|
| The problem here is that in order to use the small primitive filter, we need to explicitly set the sample locations to 0. But the DB doesn't properly process the change of sample locations without a flush, and so we can end up with incorrect Z values.
|
| Instead of doing a flush, just disable the small primitive filter when MSAA is force-disabled.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96908
| Cc: 12.0 <[email protected]>
* freedreno/ir3: Add missing braces in initializer | [email protected] | 2016-07-23 | 1 | -1/+1
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a2xx: silence missing case 'SHADER_COMPUTE' warning (v2) | [email protected] | 2016-07-23 | 1 | -0/+2
| v2: no need for break after an unreachable (Matt Turner)
|
| Signed-off-by: Francesco Ansanelli <[email protected]>
| Signed-off-by: Rob Clark <[email protected]>
* radeonsi: implement buffer_subdata without indirect calls | Marek Olšák | 2016-07-23 | 5 | -5/+41
| There is less noise in CPU profile data now.
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/util: don't modify usage in pipe_buffer_write | Marek Olšák | 2016-07-23 | 2 | -9/+7
| All drivers were already doing it except virgl.
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: split transfer_inline_write into buffer and texture callbacks | Marek Olšák | 2016-07-23 | 57 | -389/+383
| to reduce the call indirections with u_resource_vtbl.
|
| The worst call tree you could get was:
|   - u_transfer_inline_write_vtbl
|   - u_default_transfer_inline_write
|   - u_transfer_map_vtbl
|   - driver_transfer_map
|   - u_transfer_unmap_vtbl
|   - driver_transfer_unmap
|
| That's 6 indirect calls. Some drivers only had 5. The goal is to have 1 indirect call for drivers that care. The resource type can be determined statically at most call sites.
|
| The new interface is:
|   pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
|   pipe_context::texture_subdata(ctx, resource, level, usage, box, data, stride, layer_stride)
|
| v2: fix whitespace, correct ilo's behavior
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
| Acked-by: Roland Scheidegger <[email protected]>
* nir: Lower interp_var_at_* like a normal load_var for flat inputs. | Kenneth Graunke | 2016-07-22 | 1 | -0/+4
| "flat centroid" and "flat sample" both just mean "flat", so we should ignore interpolateAtCentroid/Sample and just return the flat value.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97032
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Anuj Phogat <[email protected]>
* mesa: Don't call GenerateMipmap if Width or Height == 0. | Kenneth Graunke | 2016-07-22 | 1 | -0/+5
| One of the WebGL 2.0 conformance tests is trying to call glGenerateMipmap with a width and height of 0. With the meta implementation, this generates a "framebuffer attachment incomplete" status, and falls back to the CPU path, calling MapTextureImage. Except that there's no actual texture to map, and we assert fail.
|
| There's no work to do in this case. The test expects it to succeed, so just return early with no error and avoid hassling the driver.
|
| Cc: [email protected]
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96911
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Anuj Phogat <[email protected]>
* anv/pipeline: Set up point coord enables | Jason Ekstrand | 2016-07-22 | 1 | -0/+5
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Tested-by: Lionel Landwerlin <[email protected]>
| Cc: "12.0" <[email protected]>
* spirv/nir: Add support for ImageQuerySamples | Jason Ekstrand | 2016-07-22 | 1 | -0/+3
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* spirv/nir: Handle texture projectors | Jason Ekstrand | 2016-07-22 | 1 | -0/+15
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* nir/spirv: Refactor coordinate handling in handle_texture | Jason Ekstrand | 2016-07-22 | 1 | -29/+28
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* spirv/nir: Refactor type handling in handle_texture | Jason Ekstrand | 2016-07-22 | 1 | -5/+8
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* spirv/nir: Move opcode selection higher up in handle_texture | Jason Ekstrand | 2016-07-22 | 1 | -48/+48
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* anv/image: Assert that the image format is actually supported | Jason Ekstrand | 2016-07-22 | 1 | -2/+5
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* spirv/nir: Don't increment coord_components for array lod queries | Jason Ekstrand | 2016-07-22 | 1 | -1/+1
| For lod query instructions, we really don't care whether or not the sampler is an array type because that doesn't factor into the LOD.
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* i965: Get rid of the do_lower_unnormalized_offsets pass | Jason Ekstrand | 2016-07-22 | 4 | -109/+0
| We can do this in NIR now. No need to keep a GLSL pass lying around for it.
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* i965/nir: Enable NIR lowering of txf and rect offsets | Jason Ekstrand | 2016-07-22 | 1 | -0/+2
| This fixes the following piglit tests on gen6+:
|   tex-miplevel-selection textureProjGradOffset 2DRect
|   tex-miplevel-selection textureGradOffset 2DRect
|   tex-miplevel-selection textureGradOffset 2DRectShadow
|   tex-miplevel-selection textureProjGradOffset 2DRect_ProjVec4
|   tex-miplevel-selection textureProjGradOffset 2DRectShadow
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* nir/lower_tex: Add support for lowering coordinate offsets | Jason Ekstrand | 2016-07-22 | 2 | -0/+64
| On i965, we can't support coordinate offsets for texelFetch or rectangle textures. Previously, we were doing this with a GLSL pass but we need to do it in NIR if we want those workarounds for SPIR-V.
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* nir/lower_tex: Add some helpers for working with tex sources | Jason Ekstrand | 2016-07-22 | 1 | -16/+30
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* nir: Add a helper for determining the type of a texture source | Jason Ekstrand | 2016-07-22 | 1 | -0/+44
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* anv/pipeline: Set binding_table.gather_texture_start | Jason Ekstrand | 2016-07-22 | 1 | -0/+1
| This should get texture gather working on gen8+ and mostly working on gen7.
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* spirv/nir: Properly handle gather components | Jason Ekstrand | 2016-07-22 | 1 | -1/+11
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* spirv/nir: Add support for shadow samplers that return vec4 | Jason Ekstrand | 2016-07-22 | 1 | -1/+2
| While SPIR-V technically doesn't support "old style" shadow, the shadow-compare gather instruction does return a vec4 so we need to be able to set the old_style_shadow bit in NIR.
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* spirv/nir: Fix some texture opcode asserts | Jason Ekstrand | 2016-07-22 | 1 | -2/+2
| We can't get an lod with txf_ms and SPIR-V considers textureGrad to be an explicit-LOD texturing instruction.
|
| Signed-off-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Cc: "12.0" <[email protected]>
* nv50/ir: allow to swap sources for OP_SUB | Samuel Pitoiset | 2016-07-22 | 1 | -1/+6
| This allows the load-propagation pass to swap the sources in presence of immediate values.
|
| Maxwell (GM107):
| total instructions in shared programs :1928187 -> 1927634 (-0.03%)
| total gprs used in shared programs    :330741 -> 330154 (-0.18%)
| total local used in shared programs   :28032 -> 28032 (0.00%)
|
|          local   gpr   inst   bytes
| helped   0       271   425    425
| hurt     0       0     194    194
|
| Fermi (GF114):
| total instructions in shared programs :2334474 -> 2333829 (-0.03%)
| total gprs used in shared programs    :380934 -> 380215 (-0.19%)
| total local used in shared programs   :33304 -> 33264 (-0.12%)
|
|          local   gpr   inst   bytes
| helped   5       314   521    521
| hurt     0       4     195    195
|
| No regressions on GM107 and GF114 with full piglit.
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
* gallium/radeon: make deferred flushes asynchronous | Marek Olšák | 2016-07-22 | 1 | -0/+2
| Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium: add PIPE_FLUSH_DEFERRED | Marek Olšák | 2016-07-22 | 3 | -2/+13
| There are 2 uses:
| - Asynchronous flushing for multithreaded drivers.
| - Return a fence without flushing (mid-command-buffer fence). The driver can defer flushing until fence_finish is called.
|
| This is required to make Bioshock Infinite faster, which creates 1000 fences (flushes) per frame.
|
| Reviewed-by: Edward O'Callaghan <[email protected]>
| Reviewed-by: Rob Clark <[email protected]>
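The second use, a fence handed out without flushing, can be sketched with a toy model. Everything below is hypothetical: the `sketch_*` names and `SKETCH_FLUSH_DEFERRED` are stand-ins invented for illustration and are not the Gallium interface. The model only counts submissions, so it shows why many deferred fences cost nothing until one of them is actually waited on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag, standing in for a deferred-flush request. */
#define SKETCH_FLUSH_DEFERRED (1u << 0)

struct sketch_context {
   unsigned submits;          /* command-buffer submissions so far */
};

struct sketch_fence {
   struct sketch_context *ctx;
   unsigned needs_submit;     /* covered once ctx->submits reaches this */
};

static void sketch_submit(struct sketch_context *ctx)
{
   ctx->submits++;            /* a real driver would hand work to the GPU */
}

/* Create a fence; only submit right away when the flush is not deferred. */
static void sketch_flush(struct sketch_context *ctx,
                         struct sketch_fence *fence, unsigned flags)
{
   fence->ctx = ctx;
   if (flags & SKETCH_FLUSH_DEFERRED) {
      fence->needs_submit = ctx->submits + 1;
   } else {
      sketch_submit(ctx);
      fence->needs_submit = ctx->submits;
   }
}

/* Waiting on a fence performs the deferred submission if still pending. */
static void sketch_fence_finish(struct sketch_fence *fence)
{
   assert(fence->ctx != NULL);
   if (fence->ctx->submits < fence->needs_submit)
      sketch_submit(fence->ctx);
   /* a real driver would now wait for the GPU to signal the fence */
}
```

In this model, creating 1000 deferred fences performs no submission at all, and finishing any one of them triggers a single submission that also covers the others.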
* gallium/os: use CLOCK_MONOTONIC for sleeps (v2) | Marek Olšák | 2016-07-22 | 2 | -6/+14
| v2: handle EINTR, remove backslashes
|
| Reviewed-by: Eric Engestrom <[email protected]>
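A monotonic sleep that handles EINTR can be sketched with POSIX clock_nanosleep. This is one possible approach under stated assumptions, not necessarily what the patch does: `sleep_usec_monotonic` is a hypothetical name, an absolute deadline is used so that a restart after a signal does not lengthen the sleep, and the platform is assumed to provide clock_nanosleep (Linux does). Note that clock_nanosleep returns the error number directly instead of setting errno.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: sleep for `usecs` microseconds on CLOCK_MONOTONIC.
 * Returns 0 on success or an error number. */
static int sleep_usec_monotonic(int64_t usecs)
{
   struct timespec deadline;

   assert(usecs >= 0);
   if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
      return errno;

   /* Turn the relative delay into an absolute deadline. */
   deadline.tv_sec += (time_t)(usecs / 1000000);
   deadline.tv_nsec += (long)(usecs % 1000000) * 1000L;
   if (deadline.tv_nsec >= 1000000000L) {
      deadline.tv_sec += 1;
      deadline.tv_nsec -= 1000000000L;
   }

   /* Restart the sleep when a signal interrupts it. */
   int ret;
   do {
      ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
   } while (ret == EINTR);

   return ret;
}
```

Because the clock is monotonic, changes to the wall-clock time during the sleep do not affect how long it lasts.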
* mapi: fix typo in macro name | Eric Engestrom | 2016-07-22 | 3 | -3/+3
| Fixes: 5ec140c17b54c2592009 ("mapi: Massage code to allow clang to compile.")
| Reported-by: Alexandre Demers <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Signed-off-by: Eric Engestrom <[email protected]>
* docs: Put swr back on the GL_ARB_texture_buffer_object_rgb32 list. | Kenneth Graunke | 2016-07-22 | 1 | -1/+1
| Looks like this was lost when resolving merge conflicts in commit d1fbd4cdb1bdb8041362a8e5f05833c43a39c9a6.