summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Rewrite T image handling based on calling the LT handler.Eric Anholt2017-01-051-34/+75
| | | | | | | | | | | | | The T images are composed of effectively swizzled-around blocks of LT (4x4 utile) images, so we can reduce the t_utile_address() calls by 16x by calling into the simpler LT loop. This also adds support for calling down with non-utile-aligned coordinates, which will be part of lifting the utile alignment requirement on our callers and avoiding the RMW on non-utile-aligned stores. Improves 1024x1024 TexSubImage by 2.55014% +/- 1.18584% (n=46) Improves 1024x1024 GetTexImage by 2.242% +/- 0.880954% (n=32)
* vc4: Move the utile_width/height functions to header inlines.Eric Anholt2017-01-052-37/+36
| | | | | | I want these inlined in the callers, particularly with the tiling changes coming up, but we're not building with lto so some caller would suffer.
* vc4: Make the load/store utile functions static.Eric Anholt2017-01-052-4/+2
| | | | | They don't have any other callers outside of this file, and I'm hoping they get inlined soon.
* vc4: Simplify the load/store utile functions.Eric Anholt2017-01-051-10/+22
| | | | | | | | | They now have less of a dependency on the cpp, and don't have to do a divide. Hacking up mesa-demos teximage to do only one subtest and not draw points, I saw 1024x1024 glTexSubImage2D() improve by 4.86939% +/- 1.40408% (n=30) and glGetTexImage() by 2.18978% +/- 0.140268% (n=5).
* vc4: Reuse a list function to simplify bufmgr code.Eric Anholt2017-01-051-11/+2
|
* vc4: Flush the job early if we're referencing too many BOs.Eric Anholt2017-01-053-0/+16
| | | | | | | | | | | If we get up toward 256MB (or whatever the CMA area size is), VC4_GEM_CREATE will start throwing errors. Even if we don't trigger that, when we flush the kernel's BO allocation for the CLs or bin memory may end up throwing an error, at which point our job won't get rendered at all. Just flush early (half of maximum CMA size) so that hopefully we never get to that point.
* st/mesa/glsl: move SamplerTargets to gl_programTimothy Arceri2017-01-066-13/+16
| | | | | | | | This will help allow us to simplify the handling of samplers by storing them in a single location rather than duplicating them in both gl_linked_shader and gl_program. Reviewed-by: Eric Anholt <[email protected]>
* st/mesa/glsl: set SamplersUsed directly in gl_programTimothy Arceri2017-01-065-6/+3
| | | | Reviewed-by: Eric Anholt <[email protected]>
* mesa/glsl: set sampler units directly in gl_programTimothy Arceri2017-01-064-17/+7
| | | | | | | Now that we create gl_program earlier there is no need to mess about copying things to gl_linked_shader then to gl_program. Reviewed-by: Eric Anholt <[email protected]>
* mesa: simplify sampler setting codeTimothy Arceri2017-01-061-22/+11
| | | | | | | | There is no need to loop over active samplers the code above this would have already exited if the sampler was inactive, or errored if the count was larger than the uniforms array size. Reviewed-by: Eric Anholt <[email protected]>
* mesa/glsl: set num_textures per stage directly in shader_infoTimothy Arceri2017-01-066-6/+5
| | | | Reviewed-by: Eric Anholt <[email protected]>
* mesa: make _CurrentFragmentProgram a gl_program struct pointerTimothy Arceri2017-01-066-30/+22
| | | | | | | | Making this point to a gl_program struct rather than a gl_shader_program struct will allow use to later also make the CurrentProgram array hold gl_program structs which in turn will allow for code simpilifcation. Reviewed-by: Eric Anholt <[email protected]>
* i965: stop passing gl_shader_program to the precompile and codegen functionsTimothy Arceri2017-01-0612-87/+31
| | | | | | | | We no longer need it. While we are at it we mark the vs, gs, and wm codegen functions as static. Reviewed-by: Eric Anholt <[email protected]>
* mesa/glsl: remove hack to reset sampler units to zeroTimothy Arceri2017-01-063-20/+16
| | | | | | | | | | Now that we have the is_arb_asm flag we can just skip the initialisation. V2: remove hack from standalone compiler where it was never needed since it only compiles glsl shaders. Reviewed-by: Eric Anholt <[email protected]>
* i965: make use of new is_arb_asm flagTimothy Arceri2017-01-062-13/+11
| | | | Reviewed-by: Eric Anholt <[email protected]>
* st/mesa/glsl: add new is_arb_asm flag in gl_programTimothy Arceri2017-01-0613-34/+45
| | | | | | | | | | | | | | | | Set the flag via the _mesa_init_gl_program() and NewProgram() helpers. In i965 we currently check for the existance of gl_shader_program to decide if this is an ARB assembly style program or not. Adding a flag makes the code clearer and will help removes a dependency on gl_shader_program in the i965 codegen functions. Also this will allow use to skip initialising sampler units for linked shaders, we currently memset it to zero again during linking. Reviewed-by: Eric Anholt <[email protected]>
* i965: pass gl_program directly to brw_compile_tes()Timothy Arceri2017-01-063-6/+4
| | | | | | This is the only thing we use from gl_shader_program so pass it directly. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: stop passing gl_shader_program to brw_nir_setup_glsl_uniforms()Timothy Arceri2017-01-068-18/+13
| | | | | | | We can now just get the data needed from the gl_shader_program_data pointer in gl_program. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: pass gl_program to brw_upload_ubo_surfaces()Timothy Arceri2017-01-066-22/+20
| | | | | | There is no need to pass gl_linked_shader anymore. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: stop passing gl_shader_program to ↵Timothy Arceri2017-01-068-32/+13
| | | | | | | | | brw_assign_common_binding_table_offsets() We now get everything we need directly from gl_program so there is no need for this. Reviewed-by: Lionel Landwerlin <[email protected]>
* st/mesa/glsl/i965: move ShaderStorageBlocks to gl_programTimothy Arceri2017-01-066-10/+11
| | | | | | | | | | | | Having it here rather than in gl_linked_shader allows us to simplify the code. Also it is error prone to depend on the gl_linked_shader for programs in current use because a failed linking attempt will free infomation about the current program. In i965 we could be trying to recompile a shader variant but may have lost some required fields. Reviewed-by: Lionel Landwerlin <[email protected]>
* st/mesa/glsl/i965: set num_ssbos directly in shader_infoTimothy Arceri2017-01-069-23/+24
| | | | | | | Here we also remove the duplicate field in gl_linked_shader and always get the value from shader_info instead. Reviewed-by: Lionel Landwerlin <[email protected]>
* st/mesa/glsl/i965: move per stage UniformBlocks to gl_programTimothy Arceri2017-01-067-21/+19
| | | | | | | This will help allow us to store pointers to gl_program structs in the CurrentProgram array resulting in a bunch of code simplifications. Reviewed-by: Lionel Landwerlin <[email protected]>
* st/mesa/glsl/i965: set num_ubos directly in shader_infoTimothy Arceri2017-01-069-21/+17
| | | | | | | This also removes the duplicate field in gl_linked_shader, and gets num_ubos from shader_info instead. Reviewed-by: Lionel Landwerlin <[email protected]>
* st/mesa/glsl/i965: move ImageUnits and ImageAccess fields to gl_programTimothy Arceri2017-01-0612-67/+50
| | | | | | | | | | | | | | | Having it here rather than in gl_linked_shader allows us to simplify the code. Also it is error prone to depend on the gl_linked_shader for programs in current use because a failed linking attempt will free infomation about the current program. In i965 we could be trying to recompile a shader variant but may have lost some required fields. We drop the memset on ImageUnits because gl_program is already created using rzalloc(). Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: get InfoLog and LinkStatus via the pointer in gl_programTimothy Arceri2017-01-061-4/+4
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: get shared_size from shader_info rather than gl_shader_programTimothy Arceri2017-01-061-2/+2
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: stop depending on gl_shader_program for brw_compute_vue_map() paramsTimothy Arceri2017-01-061-1/+1
| | | | | | | | This removes another dependency on gl_shader_program from the codegen functions, this will help allow us to use gl_program for the CurrentProgram array rather than gl_shader_program. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: pass gl_program to the brw_*_debug_recompile() functionsTimothy Arceri2017-01-067-138/+125
| | | | | | | | | | | | | | Rather then passing gl_shader_program. The only field use was Name which is the same as the Id field in gl_program. For wm and vs we also make the functions static and move them before the codegen functions. This change reduces the codegen functions dependency on gl_shader_program. Reviewed-by: Lionel Landwerlin <[email protected]>
* gallivm: (trivial) fix typo bug with small AoS format unpackingRoland Scheidegger2017-01-061-1/+1
| | | | | | | Fix typo using wrong (uninitialized) build context introduced by 4634cb5921b985f04f2daf00cda2d28036143bd3. (This only affects very rare small packed formats which have a PIPE_SWIZZLE_0 channel, such as r4a4, which is never used by mesa/st. Nevertheless it broke lp_test_format.)
* gallivm: implement aos unpack (to unorm8) for small unorm formatsRoland Scheidegger2017-01-052-17/+155
| | | | | | | | | | | | | | | | | | | Using bit replication. This path now resembles something which might make sense. (The logic was mostly copied from llvmpipe fs backend.) I am not convinced though it is actually faster than SoA sampling (actually I'm quite certain it's always a loss with AVX). With SoA it's just shift/mask/cvt/mul for getting the colors, whereas there's still roughly 3 shifts, 3 or/and per channel for AoS (i.e. for SoA it's exactly the same as it would be for a rgba8 format, whereas the extra effort for AoS is significant). The filtering might still be faster (albeit with FMA the instruction count gets down quite a bit there on the SoA float filtering path on new cpus). And those small unorm formats often don't have an alpha channel (which makes things worse relatively for AoS path). (This also fixes a trivial bug in the llvmpipe fs code this was derived from, albeit it was only relevant for 4-bit channels.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: optimize lp_build_unpack_arith_rgba_aos slightlyRoland Scheidegger2017-01-051-19/+97
| | | | | | | | | | | | | | | | | This code uses a vector shift which has to be emulated on x86 unless there's AVX2. Luckily in some cases we can actually avoid the shift altogether, so do that. Also make sure we hit the fast lp_build_conv() path when applicable, albeit that's quite the hack... That said, this path is taken for AoS sampling for small unorm (smaller than rgba8) formats, and it is completely hopeless even with those changes, with or without AVX. (Probably should have some code similar to the one in the llvmpipe fs backend code, using bit replication to extend to rgba8888 - rounding is not quite 100% accurate but if it's good enough there it should be here as well.) Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_autoRoland Scheidegger2017-01-051-2/+19
| | | | | | | | | | | | | | | | | If we only feed one source vector at a time, we cannot use pack intrinsics (as we only have a 64bit destination dst vector). lp_bld_conv_auto is specifically designed to alter the length and number of destination vectors, so this works just fine (if we use single source vectors at a time, afterwards we immediately reassemble the vectors). For AVX though this isn't really possible, since we expect 128bit output already for a single 256bit input. (One day we should handle AVX2 which again would need multiple inputs, however there's the problem that we get different ordered output there and we don't want to reorder, so would need to be able to tell build_conv to handle upper and lower halfs independently.) A similar strategy would probably work for 32->8bit too (if it doesn't hit the special case) but I'm going to try something different for that... Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: (trivial) minimally simplify mask constructionRoland Scheidegger2017-01-053-17/+53
| | | | | | | | | | | | | | | | simd instruction sets usually have comparisons for equal, not unequal. So use a different comparison against the mask itself - which also means we don't need a all-zero as well as a all-one (for the pxor) reg. Also add code to avoid scalar expansion of i1 values which we definitely shouldn't do. There's problems with this though with llvm select interaction, so it's disabled (basically using llvm select instead of intrinsics may still produce atrocious code, even in cases where we figured it should not, albeit I think this could probably be fixed with some better selection of optimization passes, but I have zero idea there really). Reviewed-by: Jose Fonseca <[email protected]>
* anv: fix multiple creation with internal failureLionel Landwerlin2017-01-051-18/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The specification section 9.4 says : When an application attempts to create many pipelines in a single command, it is possible that some subset may fail creation. In that case, the corresponding entries in the pPipelines output array will be filled with VK_NULL_HANDLE values. If any pipeline fails creation (for example, due to out of memory errors), the vkCreate*Pipelines commands will return an error code. The implementation will attempt to create all pipelines, and only return VK_NULL_HANDLE values for those that actually failed. Fixes : dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline v2: C is hard let's go shopping (Lionel) v3: Remove unnecessary condition in for loops (Lionel) v4: Document why we return on first failure (Eduardo) Move i declaration inside for() (Eduardo) v5: Move array cleanup out of loop (Jason) Signed-off-by: Lionel Landwerlin <[email protected]>
* swr: [rasterizer core/common/jitter] gl_double supportTim Rowley2017-01-059-33/+341
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99214 Reviewed-by: Bruce Cherniak <[email protected]>
* dri3: Fix MakeCurrent without a default framebufferFredrik Höglund2017-01-051-4/+10
| | | | | | | | | | In OpenGL 3.0 and later it is legal to make a context current without a default framebuffer. This has been broken since DRI3 support was introduced. Cc: "13.0 12.0" <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: turn SDMA IBs into de-facto preambles of GFX IBsMarek Olšák2017-01-052-6/+18
| | | | | | | | | | Draw calls no longer flush SDMA IBs. r600_need_dma_space is responsible for synchronizing execution between both IBs. Initial buffer clears and fast clears will stay unflushed in the SDMA IB (up to 64 MB) as long as the GFX IB isn't flushed either. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement SDMA-based buffer clearing for SIMarek Olšák2017-01-052-1/+41
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do all math in bytes in SI DMA codeMarek Olšák2017-01-052-19/+21
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: prevent SDMA stalls by detecting RAW hazards in need_dma_spaceMarek Olšák2017-01-058-36/+27
| | | | | | | | Call r600_dma_emit_wait_idle only when there is a possibility of a read-after-write hazard. Buffers not yet used by the SDMA IB don't have to wait. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: move unrelated code from dma_emit_wait_idle to need_dma_spaceMarek Olšák2017-01-051-18/+15
| | | | | | | | r600_dma_emit_wait_idle is going away in its current form. The only difference is that the moved code is executed before DMA calls instead of after them. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: inline cik_sdma_do_copy_bufferMarek Olšák2017-01-051-28/+16
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: also wait for SDMA in the clear_buffer CPU fallbackMarek Olšák2017-01-051-3/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: simplify r600_resource typecasts in si_clear_bufferMarek Olšák2017-01-051-5/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always use SDMA for big buffer clears and first buffer usesMarek Olšák2017-01-051-0/+20
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use SDMA in rvid_buffer_clear on CIK-VIMarek Olšák2017-01-051-2/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use SDMA for initial clearing of DCC/CMASK/HTILE on CIK-VIMarek Olšák2017-01-053-8/+6
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement SDMA-based buffer clearing for CIK-VIMarek Olšák2017-01-053-0/+54
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/hud: increase the vertex buffer size for textMarek Olšák2017-01-051-1/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>