summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Remove qir_inst4().Eric Anholt2016-11-292-25/+0
| | | | | This was used originally for unorm4x8 packs, but we now represent those as a series of packed movs.
* swr: [rasterizer memory] only clear up to the LOD sizeIlia Mirkin2016-11-281-2/+8
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer memory] hook up stencil clears for ClearTileIlia Mirkin2016-11-281-5/+8
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer memory] add support for clearing Z32F_X32 and Z16Ilia Mirkin2016-11-281-0/+2
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: don't clear all dirty bits when changing so targetsIlia Mirkin2016-11-281-1/+1
| | | | | | | | | | Among other things, blits would clear existing SO targets which would cause a bunch of updates from u_blitter to be missed. Fixes fbo-scissor-blit fbo, probably among many others. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] fix typo in scissor tile-alignment logicIlia Mirkin2016-11-281-1/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* st/omx/dec/h264: consider POC as signed instead of unsignedChandu Babu Namburu2016-11-281-3/+3
| | | | | | picture order count can be a negative value Reviewed-by: Christian König <[email protected]>
* freedreno: fix slice size for imported buffersRob Clark2016-11-271-0/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: make _emit_const() staticRob Clark2016-11-272-5/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: make _emit_const() staticRob Clark2016-11-273-6/+2
| | | | Signed-off-by: Rob Clark <[email protected]>
* gm107/ir: optimize 32-bit CONST load to movSamuel Pitoiset2016-11-262-0/+17
| | | | | | | | | This is not allowed for indirect accesses because the source GPR might be erased by a subsequent instruction (WaR hazard) if we don't emit a read dep bar. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gm107/ir: do not combine CONST loadsSamuel Pitoiset2016-11-261-2/+7
| | | | | | | | | | | | | | This will allow to use MOV instead of LD. The main advantage is that MOV doesn't require a read dependency barrier while LD does, and so this will both reduce barriers pressure and the number of stall counts needed to read data from constant memory. This is currently only for user uniform accesses. I should do something similar when loading from the driver constant buffer but it seems like a bit tricky to handle for now. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* clover: Restore support for LLVM <= 3.9.Vedran Miletić2016-11-242-6/+21
| | | | | | | | | | | | | | | | | | The commit 8e430ff8b060b4e8e922bae24b3c57837da6ea77 broke support for LLVM 3.9 and older versions in Clover. This patch restores it and refactors the support using Clover compatibility layer for LLVM. v2: merged #ifdef blocks v3: added support for LLVM 3.6-3.8 v4: add missing #ifdef around <memory> v5: simplify using templates and lambda Signed-off-by: Vedran Miletić <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98740 Tested-by[v4]: Pierre Moreau <[email protected]> Tested-by: Vinson Lee <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Jan Vesely <[email protected]>
* scons: Recognize LLVM_CONFIG environment variable.Vinson Lee2016-11-241-1/+2
| | | | | | Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* util: fix memory leak from the fragment shaders for SINT<->UINT blitsCharmaine Lee2016-11-231-1/+1
| | | | | | This patch deletes those fragment shaders in util_blitter_destroy(). Reviewed-by: Brian Paul <[email protected]>
* swr: clear every layer of the attached surfacesIlia Mirkin2016-11-231-6/+29
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] pipe renderTargetArrayIndex through to clearsIlia Mirkin2016-11-237-20/+35
| | | | | | | | | | Currently clears only operate on the 0th array index (ignoring surface layout parameters). Instead normalize to take a RTAI like all the load/store tile logic does, and use ComputeSurfaceAddress to properly take the surface state's lod/array index into account. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] clear data now comes in as floatIlia Mirkin2016-11-231-10/+4
| | | | | | | | The non-fast-clear path was never updated after clear colors were passed in as floats. Remove the now-harmful conversion from unorm8. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] actually perform clear before store in GetHotTileIlia Mirkin2016-11-231-0/+12
| | | | | | | | | When switching render target array indexes (as might happen in a GS, or in a future change, with layered clears), if the previous state is HOTTILE_CLEAR, we should actually clear the tile before saving it off. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* radeonsi: print new opt flags in si_dump_shader_keyMarek Olšák2016-11-231-0/+9
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a debug flag that disables optimized shader variantsMarek Olšák2016-11-233-0/+7
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* swr: [rasterizer core] fix cast for stencil clear valueTim Rowley2016-11-221-3/+2
| | | | | | | Bad type cast for stencil clear value was picking up structure padding bytes. Reviewed-by: Ilia Mirkin <[email protected]>
* swr: color interpolation is also supposed to get perspective divisionIlia Mirkin2016-11-221-2/+4
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: add sprite coord enable mask to fs keyIlia Mirkin2016-11-222-1/+3
| | | | | | | This fixes gl-coord-replace-doesnt-eliminate-frag-tex-coords Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: rework vert <-> frag shader linkage logicIlia Mirkin2016-11-221-43/+50
| | | | | | | | | | | | Fixes a few things: - sprite coords only apply to generic varyings, and are a bitmask - back color only applies in 2-sided lighting mode - handle some odd situations between only some front/back colors being there. This is only semi-legal in GL, but we shouldn't start crashing. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: flatshading makes color outputs flat, it doesn't affect othersIlia Mirkin2016-11-221-4/+2
| | | | | | | | We were previously not marking the "regular" flat outputs as flat when flatshading was enabled. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: only broadcast color0 value, not all color valuesIlia Mirkin2016-11-221-1/+2
| | | | | | | | | | | | | The way that dual-source blending is described for GLES2 is very odd, and we end up with a shader that both has this property set *and* has a color1 value to be used as the second source. While changing the state tracker is an option, it seems more reliable to verify that the broadcast is only done on color0. Fixes arb_blend_func_extended-fbo-extended-blend-pattern_gles2 Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: report a reasonable max lod biasIlia Mirkin2016-11-221-1/+1
| | | | | | | | | This is the same value that llvmpipe uses. Since swr uses the same sampler logic, makes sense for this value to also be the same. Most applications don't care. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: avoid using exceptions for expected condition handlingIlia Mirkin2016-11-221-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I was getting a weird segfault from GCC 4.9.3: 0x00007ffff54f27aa in strlen () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff54f27aa in strlen () from /lib64/libc.so.6 #1 0x00007ffff4f128e5 in get_cie_encoding (cie=cie@entry=0x7ffff6e09813) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:272 #2 0x00007ffff4f1318e in classify_object_over_fdes (ob=ob@entry=0xd7bb90, this_fde=0x7ffff7f11010) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:628 #3 0x00007ffff4f135ba in init_object (ob=0xd7bb90) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:749 #4 search_object (ob=ob@entry=0xd7bb90, pc=pc@entry=0x7ffff4f11f4d <_Unwind_RaiseException+61>) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:961 #5 0x00007ffff4f13e62 in _Unwind_Find_registered_FDE (bases=0x7fffffffd358, pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>) at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:1025 #6 _Unwind_Find_FDE (pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>, bases=bases@entry=0x7fffffffd358) at /gcc-4.9.3/libgcc/unwind-dw2-fde-dip.c:450 #7 0x00007ffff4f11197 in uw_frame_state_for (context=context@entry=0x7fffffffd2b0, fs=fs@entry=0x7fffffffd100) at /gcc-4.9.3/libgcc/unwind-dw2.c:1245 #8 0x00007ffff4f11b15 in uw_init_context_1 (context=context@entry=0x7fffffffd2b0, outer_cfa=outer_cfa@entry=0x7fffffffd660, outer_ra=0x7ffff518d23b <__cxa_throw+91>) at /gcc-4.9.3/libgcc/unwind-dw2.c:1566 #9 0x00007ffff4f11f4e in _Unwind_RaiseException (exc=0xd7c250) at /gcc-4.9.3/libgcc/unwind.inc:88 #10 0x00007ffff518d23b in __cxa_throw () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6 #11 0x00007ffff51ed556 in std::__throw_out_of_range(char const*) () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6 #12 0x00007fffea778be0 in std::map<pipe_format, SWR_FORMAT, std::less<pipe_format>, std::allocator<std::pair<pipe_format const, SWR_FORMAT> > >::at ( this=0x7fffebeb4c40 <mesa_to_swr_format(pipe_format)::mesa2swr>, __k=@0x7fffffffd73c: PIPE_FORMAT_RGTC1_UNORM) at /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/include/g++-v4/bits/stl_map.h:549 #13 0x00007fffea776aee in mesa_to_swr_format (format=PIPE_FORMAT_RGTC1_UNORM) at swr_screen.cpp:597 We can just void this whole issue by not using exceptions in the first place. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: remove formats from mapping table that don't have StoreTile implsIlia Mirkin2016-11-221-38/+48
| | | | | | | | | | | This table exists for the purpose of determining renderable formats. Without a StoreTile implementation, that can't happen. This basically removes rendering support to all L/LA/I formats. They can be re-added when/if StoreTile implementations are added. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: remove unnecessary -1 entries in format mapping tableIlia Mirkin2016-11-221-126/+0
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: rework resource layout and surface setupIlia Mirkin2016-11-226-160/+352
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a bit of a mega-commit, but unfortunately there's no great way to break this up since a lot of different pieces have to match up. Here we do the following: - change surface layout to match swr's Load/StoreTile expectations - fix sampler settings to respect all sampler view parameters - fix stencil sampling to read from secondary resource - respect pipe surface format, level, and layer settings - fix resource map/unmap based on the new layout logic - fix resource map/unmap to copy proper parts of stencil values in and out of the matching depth texture These fix a massive quantity of piglits, including all the tex-miplevel-selection ones. Note that the swr native miptree layout isn't extremely space-efficient, and we end up using it for all textures, not just the renderable ones. A back-of-the-envelope calculation suggests about 10%-25% increased memory usage for miptrees, depending on the number of LODs. Single-LOD textures should be unaffected. There are a handful of regressions as a result of this change: - Some textureGrad tests, these failures match llvmpipe. (There are debug settings allowing improved gallivm sampling accurancy.) - Some layered clearing tests as swr doesn't currently support that. It was getting lucky before because enough other things were broken. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* util: fix missing swizzle components in the SINT <-> UINT conversion stringCharmaine Lee2016-11-231-2/+2
| | | | | | | Fixes tgsi error introduced in commit 3817a7a. The error complains missing swizzle component in the conversion string "UMIN TEMP[0], TEMP[0], IMM[0].x". Reviewed-by: Roland Scheidegger <[email protected]>
* vc4: Don't conditionalize the src1 mov of qir_SEL().Eric Anholt2016-11-221-4/+2
| | | | | | | | | | | | | | | | | My thought in having both arguments conditionally moved was that it should theoretically save some power by not doing work in those channels. However, it ends up costing us instructions because we can't register-coalesce the first of the MOVs, and it also introduces extra scheduling dependencies. The instruction cost would swamp whatever power benefit I was hoping for. shader-db results: total instructions in shared programs: 100548 -> 99741 (-0.80%) instructions in affected programs: 42450 -> 41643 (-1.90%) With obvious outliers removed (I had an X11 emacs running over the network in the "after" case), 3DMMES Taiji showed 1.07231% +/- 0.488241% fps improvement (n=18, 30).
* vc4: Re-add R4 to the "any" register class.Eric Anholt2016-11-221-0/+2
| | | | | | | | | | I screwed this up in fdad4d24024ab7bc9b6b9cb6288f8b76ccac0d89 which was supposed to be making this code more maintainable. What's amazing is multithreaded FS showed the wins it did despite this bug. shader-db results: total instructions in shared programs: 103535 -> 100548 (-2.89%) instructions in affected programs: 83794 -> 80807 (-3.56%)
* vc4: Disable MSAA rasterization when the job binning is single-sampled.Eric Anholt2016-11-221-2/+13
| | | | | | | | Gallium core just changed to start setting MSAA enabled in the rasterizer state even with samples==1 buffers. This caused disagreements in our driver between binning and rasterization state, which the simulator threw assertion failures about. Keep the single-sampled samples==1 behavior for now.
* vc4: Make sure we don't overflow texture input/output FIFOs when threaded.Eric Anholt2016-11-221-2/+3
| | | | | | | | I dropped the first hunk of this change last minute when I decided it wasn't actually needed, and apparently failed to piglit it in simulation. The simulator threw an an assertion in gl-1.0-drawpixels-color-index, which queued up 5 coordinates (3 before a switch, two after) before loading the result.
* gallium: fix more occurences of u_hash.hMarek Olšák2016-11-225-5/+5
| | | | this fixes compile failures since 86514d84e0beec47c82da4888db12bf07f33cb83
* util: import CRC32 implementation from galliumMarek Olšák2016-11-224-179/+1
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* auxiliary/vl/dri: call get_xcb_screen() only onceEmil Velikov2016-11-221-2/+6
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* swr: calculate viewport width/height based on the scaleIlia Mirkin2016-11-211-6/+12
| | | | | | | | The former calculations were for min/max y. The width/height don't take translate into account. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: don't claim to allow setting layer/viewport from VSIlia Mirkin2016-11-211-1/+1
| | | | | | | | | | | | This may ultimately be possible to support, but for now it's not hooked up and the swr core only supports this output from GS. This normally wouldn't matter, but we lie about supporting GL 3.2, and also the blitter and st/mesa will make use of this functionality if claimed. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: allocate all scratch space in one go for vertex buffersIlia Mirkin2016-11-212-5/+31
| | | | | | | | | | | Multiple buffers may reference client arrays. When this happens, we might reach for scratch space multiple times, which could cause later arrays to invalidate the pointers allocated for the earlier ones. This fixes copyteximage 2D_ARRAY. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: call swr_update_derived unconditionally when drawing/clearingIlia Mirkin2016-11-212-4/+2
| | | | | | | | | | | | | | | | | | Currently a sequence like draw/map/draw/map will cause the second map to not wait for the second draw. This is because the first map will clear the resource business bit, and the second draw won't reset it since no state has changed. swr_update_derived does a tiny bit of extra work, including updating the SWR_BACKEND_STATE as well as waiting for prending fences. If that's a problem, we could call swr_update_resource_status directly from draw/clear handlers. Fixes clearbuffer-stencil, clearbuffer-depth, clearbuffer-depth-stencil, and clearbuffer-display-lists. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer memory] minify texture width before alignmentIlia Mirkin2016-11-211-2/+2
| | | | | | | | | The minification should happen before alignment, not after. See similar logic on ComputeLODOffsetY. The current logic requires unnecessarily large textures when there's an initial NPOT size. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer memory] minify original sizes for block formatsIlia Mirkin2016-11-211-11/+25
| | | | | | | | | There's no guarantee that mip width/height will be a multiple of the compressed block size. Doing a divide by the block size first yields different results than GL expects, so we do the divide at the end. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* radeonsi: remove all varyings for depth-only rendering or rasterization offMarek Olšák2016-11-213-1/+21
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: eliminate VS outputs that aren't used by PS at runtimeMarek Olšák2016-11-213-9/+61
| | | | | | | | | | | | | | | | | | A past commit added the ability to compile "optimized" shader variants asynchronously (not stalling the app). This commit builds upon that and adds what is basically a runtime shader linker. If a VS output isn't used by the currently-bound PS, a new VS compilation is started without that output. The new shader variant is used when it's ready. All apps using separate shader objects I've seen had unused VS outputs. Eliminating unused/useless VS outputs also eliminates the corresponding vertex attribute loads. Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: record information about all written and read varyingsMarek Olšák2016-11-213-3/+98
| | | | | | | It's just tgsi_shader_info with DEFAULT_VAL varyings removed. Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: make si_shader_io_get_unique_index stricterMarek Olšák2016-11-212-11/+14
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>