path: root/src
Commit message | Author | Age | Files | Lines
* intel/blorp_blit: Split blorp blits if they are too large | Jordan Justen | 2016-12-07 | 1 | -6/+96
  We rename do_blorp_blit() to try_blorp_blit(), and add an error return if the surface size for the blit is too large. Now, do_blorp_blit() is rewritten to try to split the blit into smaller operations if try_blorp_blit() fails.

  Note: In this commit, try_blorp_blit() will always attempt to blit and never return an error, which matches the previous behavior. We will enable the size checking and splitting in a future commit.

  The motivation for this splitting is that in some cases when we flatten an image, its dimensions grow, and this can then exceed the programmable hardware limits. An example is w-tiled+MSAA blits.

  v2:
   * Use double instead of float. (Jason)

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
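The split-on-failure idea in this commit message can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: `sketch_rect`, `SKETCH_MAX_DIM`, `sketch_try_blit()` and `sketch_do_blit()` are hypothetical names, the limit is an arbitrary value, and halving along the larger axis is only one simple splitting strategy. The real change also has to keep source and destination coordinates in step, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical size limit, chosen only for illustration. */
#define SKETCH_MAX_DIM 16384.0

/* Hypothetical rectangle; doubles follow the commit's choice of double over float. */
struct sketch_rect { double x0, y0, x1, y1; };

/* Counts the blits that were actually issued, so the effect is observable. */
int sketch_blits_issued = 0;

/* Stand-in for try_blorp_blit(): refuse rectangles that exceed the limit. */
bool sketch_try_blit(const struct sketch_rect *r)
{
   if (r->x1 - r->x0 > SKETCH_MAX_DIM || r->y1 - r->y0 > SKETCH_MAX_DIM)
      return false;
   sketch_blits_issued++;
   return true;
}

/* Stand-in for do_blorp_blit(): on failure, halve the larger axis and retry. */
void sketch_do_blit(struct sketch_rect r)
{
   assert(r.x1 > r.x0 && r.y1 > r.y0);
   if (sketch_try_blit(&r))
      return;

   struct sketch_rect a = r, b = r;
   if (r.x1 - r.x0 >= r.y1 - r.y0) {
      double mid = r.x0 + (r.x1 - r.x0) / 2.0;
      a.x1 = mid;
      b.x0 = mid;
   } else {
      double mid = r.y0 + (r.y1 - r.y0) / 2.0;
      a.y1 = mid;
      b.y0 = mid;
   }
   sketch_do_blit(a);
   sketch_do_blit(b);
}
```

Under these assumptions, a 40000-wide rectangle is halved twice and ends up as four blits, while a rectangle within the limit is issued as a single blit.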
* intel/blorp_blit: Create structure for src & dst coordinates | Jordan Justen | 2016-12-07 | 1 | -19/+56
  This will be useful for splitting blits into smaller sizes.

  We also make the coordinates of type double rather than float. Since we will be splitting and scaling the coordinates, we might require extra precision in the calculations.

  v2:
   * Use double instead of float. (Jason)

  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* vulkan: use STATIC_ASSERT instead of static_assert | Edward O'Callaghan | 2016-12-07 | 3 | -3/+3
  Following the spirit of commit 23d1799f, fixes compilation warnings on Android build due to lack of C11 features.

  Signed-off-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
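For readers unfamiliar with why a macro can replace C11's static_assert, one common pre-C11 technique is a negative array size. The sketch below is an assumption about the general approach, not a copy of the macro the tree uses; `SKETCH_STATIC_ASSERT` and `sketch_check_layout()` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical macro: if cond is false the array size becomes -1,
 * which is a compile-time error on compilers without C11 support too. */
#define SKETCH_STATIC_ASSERT(cond) \
   do { (void) sizeof(char[1 - 2 * !(cond)]); } while (0)

/* Returns 1 if it compiles at all; the check itself costs nothing at run time. */
int sketch_check_layout(void)
{
   SKETCH_STATIC_ASSERT(sizeof(uint32_t) == 4);
   SKETCH_STATIC_ASSERT(sizeof(uint64_t) == 2 * sizeof(uint32_t));
   return 1;
}
```

Because the expression is only used inside sizeof, it is never evaluated at run time.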
* i965: enable INTEL_conservative_rasterization on Gen9+ | Lionel Landwerlin | 2016-12-07 | 6 | -5/+18
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
* mesa: add support for GL_INTEL_conservative_rasterization | Lionel Landwerlin | 2016-12-07 | 13 | -7/+129
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
* i965: Add i965 plumbing for ARB_post_depth_coverage for i965 (gen9+). | Plamena Manolova | 2016-12-07 | 4 | -3/+13
  This extension allows the fragment shader to control whether values in gl_SampleMaskIn[] reflect the coverage after application of the early depth and stencil tests.

  Signed-off-by: Plamena Manolova <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: Add GL and GLSL plumbing for ARB_post_depth_coverage for i965 (gen9+). | Plamena Manolova | 2016-12-07 | 11 | -1/+53
  This extension allows the fragment shader to control whether values in gl_SampleMaskIn[] reflect the coverage after application of the early depth and stencil tests.

  Signed-off-by: Plamena Manolova <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
* radeonsi: fix isolines tess factor writes to control ring | Nicolai Hähnle | 2016-12-07 | 1 | -4/+12
  Fixes piglit arb_tessellation_shader/execution/isoline{_no_tcs}.shader_test.

  Cc: [email protected]
* i965: Drop redundant key->outputs_written initialization. | Kenneth Graunke | 2016-12-06 | 1 | -2/+0
  This was already set to the same value earlier.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Initialize "separate" flag in VUE maps. | Kenneth Graunke | 2016-12-06 | 1 | -0/+3
  This was uninitialized, which resulted in weird looking printouts where it appeared that the TCS output and TES input patch URB entries differed in SSO/non-SSO layout. There is no "separable" layout for both, as they're tied together.

  It should have no other actual effect.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* nir: In split_var_copies_block, uint, int, and bool types cannot be matrices | Ian Romanick | 2016-12-06 | 1 | -3/+5
  Noticed while adding support for 64-bit integer types.

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* radeonsi: Use amdgcn intrinsics for fs interpolation | Tom Stellard | 2016-12-07 | 1 | -54/+142
  Reviewed-by: Nicolai Hähnle <[email protected]>
* freedreno/a5xx: fix draw packet size with index buffer | Rob Clark | 2016-12-06 | 1 | -1/+1
  gpuaddr of idx buffer is now two dwords (64b).

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: gmem bypass mode | Rob Clark | 2016-12-06 | 1 | -0/+72
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix emit_string_marker() | Rob Clark | 2016-12-06 | 1 | -1/+4
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: pitch alignment should match gmem alignment | Rob Clark | 2016-12-06 | 5 | -15/+22
  Deal w/ differing gmem tile size alignment between generations, and make sure texture pitch matches.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: more formats | Rob Clark | 2016-12-06 | 1 | -41/+41
  Bunch of stuff we can at least turn on for vbo formats.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix fragface | Rob Clark | 2016-12-06 | 1 | -2/+4
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix fragcoord | Rob Clark | 2016-12-06 | 1 | -4/+11
  Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headers | Rob Clark | 2016-12-06 | 7 | -20/+129
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix alpha test | Rob Clark | 2016-12-06 | 3 | -5/+1
  GRAS_SU_DEPTH_PLANE_CNTL doesn't in fact seem to have anything to do with alpha test. This fixes xonotic and (other than some iommu faults) gets gnome-shell working.

  Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix VPC_VAR[n].DISABLE bits | Rob Clark | 2016-12-06 | 1 | -13/+13
  We don't need varying interpolators enabled for pos/psize out of the VS (despite the fact that they show up in VS_OUT map), so emit these before we append pos/psize to the linkage.

  Signed-off-by: Rob Clark <[email protected]>
* anv/TODO: Document sampling from HiZ | Nanley Chery | 2016-12-06 | 1 | -0/+1
  Acked-by: Jason Ekstrand <[email protected]>
* i965: Don't force SSO layout for VS->TCS. | Kenneth Graunke | 2016-12-06 | 2 | -4/+3
  This was a hack which worked around the VS and TCS disagreeing on their shared interface due to the lack of varying packing. In particular, it was needed by Piglit's tcs-input-read-array-interface test.

  However, that was just one case where things could go awry, so the previous commit forcibly made interfaces match. This hack is no longer necessary.

  It also seems to be broken, though I'm not sure why. Dropping it fixes Piglit regressions in spec/arb_shader_image_load_store/semantics from commit ec1f159ac81ed964415d102eed4a0a29be8e7937.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98893
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* i965: Unify shader interfaces explicitly. | Kenneth Graunke | 2016-12-06 | 1 | -0/+29
  A while ago, I made i965 start compiling shaders independently. The VUE map layouts were based entirely on each shader's input/output bitfields. Assuming the interfaces match, this works out well: both sides will compute the same layout, and outputs are correctly routed to inputs.

  At the time, I had assumed that the linker would guarantee that the interfaces match. While it usually succeeds, it unfortunately seems to fail in some cases.

  For example, Piglit's tcs-input-read-array-interface test has a VS output array with two elements, but the TCS only reads one. The linker isn't able to eliminate the unused element from the VS, which makes the interfaces not match.

  Another case is where a shader other than the last writes clip/cull distances. These should be demoted to ordinary varyings, but they currently aren't, so we think they still have some special meaning, and prevent them from being eliminated.

  Fixing the linker to guarantee this in all cases is complicated. It needs to be able to optimize out dead code. It's tied into varying packing and other messiness. While we can certainly improve it, and should, I'd rather not rely on it being correct in all cases.

  This patch ORs adjacent stages' input/output bitfields together, ensuring that their interface (and hence VUE map layout) will be compatible. This should safeguard us against linker insufficiencies.

  Fixes line rendering in Dolphin, and the Piglit test based on it: spec/glsl-1.50/execution/geometry/clip-distance-vs-gs-out.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97232
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
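The "OR the adjacent bitfields together" step described here can be illustrated with a small sketch. It is a simplification under stated assumptions: `sketch_stage` and `sketch_unify_interface()` are hypothetical names, each bit stands for one varying slot, and any special cases the actual patch handles are left out.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for one shader stage's interface masks. */
struct sketch_stage {
   uint64_t inputs_read;      /* one bit per varying slot the stage reads */
   uint64_t outputs_written;  /* one bit per varying slot the stage writes */
};

/* Give the producer's outputs and the consumer's inputs the same slot set,
 * so that a layout derived from either mask comes out identical. */
void sketch_unify_interface(struct sketch_stage *producer,
                            struct sketch_stage *consumer)
{
   uint64_t shared = producer->outputs_written | consumer->inputs_read;
   producer->outputs_written = shared;
   consumer->inputs_read = shared;
}
```

With a producer that writes slots 0 to 2 and a consumer that reads only slots 0 and 1, both masks end up covering slots 0 to 2, which mirrors the two-element output array case from the message.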
* genxml/gen9: Change the default of MI_SEMAPHORE_WAIT::RegisterPoleMode | Jason Ekstrand | 2016-12-06 | 1 | -1/+1
  We would really like it to be false as that's what you get on hardware that doesn't have RegisterPoleMode (Sky Lake for example). While we're at it, we change it to a boolean. This fixes dEQP-VK.synchronization.smoke.events on Broxton.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: "13.0" <[email protected]>
* gallivm: optimize 16bit->32bit gather path a bit | Roland Scheidegger | 2016-12-06 | 1 | -3/+39
  LLVM can't really optimize anything which crosses scalar/vector boundaries, so help a bit with some particular gather operations when the width is expanded (only do it for 16->32bit expansion for now), by doing expansion after fetch. That is probably a better solution anyway even if llvm would recognize it, makes for cleaner IR...

  Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: handle 16bit float fetches in lp_build_fetch_rgba_soa | Roland Scheidegger | 2016-12-06 | 1 | -4/+18
  Note that we really want to _never_ reach the bottom of the function, which resorts to AoS fetch.

  Half floats can be handled just like other formats which fit into 32bit vectors (so, only 1x16 and 2x16 formats, albeit with more channels things are not THAT bad), with minimal plumbing.

  I've seen code size go down nearly by a factor of 3 for a complete texture sampling function (including bilinear filtering) using R16F.

  (What we should do for everything not special cased is to do AoS gather, shuffle/shift things into SoA vectors, and then do the conversion there. Otherwise it's particularly bad with 1 or 2 channel formats - that r16f format with either 4 or 8-wide vectors was still doing one element at a time, essentially doing exactly the same work as for rgba16f. Also replacing the channels with SWIZZLE0/1 (particularly the latter) adds even more work, as it has to be done per aos vector, and not just straightforward at the end with the SoA vector.)

  Reviewed-by: Jose Fonseca <[email protected]>
* util: (trivial) ETC1 meets the criteria for fitting into unorm8 | Roland Scheidegger | 2016-12-06 | 1 | -0/+5
  Just like other similar compressed formats.

  Reviewed-by: Jose Fonseca <[email protected]>
* i965: Emit proper NOPs. | Matt Turner | 2016-12-06 | 1 | -4/+2
  The PRMs for HSW and newer say that other than the opcode and DebugCtrl bits of the instruction word, the rest must be zero.

  By zeroing the instruction word manually, we avoid using any of the state inherited through brw_codegen.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96959
  Reviewed-by: Ian Romanick <[email protected]>
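The "zero the whole word, then set only the opcode" approach can be sketched as below. Everything here is an assumption made for illustration: `sketch_inst` is a hypothetical 128-bit instruction type, and the opcode value and its position in the low bits are placeholders rather than values taken from the commit or the PRMs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit instruction word; not the driver's real type. */
struct sketch_inst { uint64_t data[2]; };

/* Assumed opcode value and field mask, for illustration only. */
#define SKETCH_OPCODE_NOP  0x7eu
#define SKETCH_OPCODE_MASK 0x7fu

void sketch_emit_nop(struct sketch_inst *inst)
{
   /* Clear everything first so no inherited default state leaks in. */
   memset(inst, 0, sizeof(*inst));
   /* Then set only the opcode field; all other bits stay zero. */
   inst->data[0] |= (uint64_t)(SKETCH_OPCODE_NOP & SKETCH_OPCODE_MASK);
}
```

Starting from a cleared word avoids having to know which defaults an instruction builder would otherwise have applied.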
* glsl: (trivial) fix type typo | Roland Scheidegger | 2016-12-06 | 1 | -1/+1
  Accidentally changed the type of a constant in df33f11b39abf313a0db7b9fefaf739b88133161 causing assertion failures.
* i965: Allocate at least some URB space even when max_vertices = 0. | Kenneth Graunke | 2016-12-05 | 1 | -1/+7
  Allocating zero URB space is a really bad idea. The hardware has to give threads a handle to their URB space, and threads have to use that to terminate the thread. Having it be an empty region just breaks a lot of assumptions. Hence we asserted that it isn't possible.

  Unfortunately, it /is/ possible prior to Gen8, if max_vertices = 0. In theory a geometry shader could do SSBO/image access and maybe still accomplish something. In reality, this is tripped up by conformance tests.

  Gen8+ already avoids this problem by placing the vertex count DWord in the URB entry header. This fixes things on earlier generations.

  Cc: [email protected]
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Tested-by: Ian Romanick <[email protected]>
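The guard this message describes amounts to clamping the computed size to a non-zero minimum. The sketch below only illustrates that idea; `sketch_urb_entry_size()`, its parameters, and the choice of 1 as the minimum are hypothetical and not taken from the driver.

```c
/* Hypothetical size computation: units are left abstract on purpose. */
unsigned sketch_urb_entry_size(unsigned max_vertices, unsigned units_per_vertex)
{
   unsigned size = max_vertices * units_per_vertex;

   /* Never report an empty region, even when no vertices can be emitted. */
   return size > 0 ? size : 1;
}
```

A shader with max_vertices = 0 then still gets a minimal allocation, while non-zero sizes pass through unchanged.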
* main: allow NEAREST_MIPMAP_NEAREST for stencil texturing | Roland Scheidegger | 2016-12-06 | 1 | -15/+8
  As per GL 4.5 rules, which fixed a spec mistake in GL_ARB_stencil_texturing. The extension spec wasn't updated, but just allow it with older GL versions as well, hoping there aren't any crazy tests which want to see an error there... (Compile tested only.)

  Reported by Józef Kucia <[email protected]>
  Acked-by: Józef Kucia <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: fix ldexp lowering if bitfield insert lowering is also requested | Roland Scheidegger | 2016-12-06 | 1 | -5/+16
  Trivial, this just resurrects the code which was there once upon a time (the code can't lower instructions generated in the lowering pass there, and even if it could it would probably be suboptimal).

  This fixes piglit mesa_shader_integer_functions fs-ldexp.shader_test and vs-ldexp.shader_test with llvmpipe.

  Reviewed-by: Matt Turner <[email protected]>
* radv: fix resource leak in radv_amdgpu_ctx_create | Nayan Deshmukh | 2016-12-06 | 1 | -0/+1
  CovID: 1396387

  V2. Fixup bad whitespace.

  Signed-off-by: Nayan Deshmukh <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* st/omx/enc Raise default encode level | Andy Furniss | 2016-12-05 | 1 | -1/+1
  Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=91281

  Signed-off-by: Andy Furniss <[email protected]>
  Reviewed-by: Christian König <[email protected]>
* radeon/vce Handle H.264 level 5.2 | Andy Furniss | 2016-12-05 | 1 | -1/+2
  Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=91281

  v2: explicitly add case 52

  Signed-off-by: Andy Furniss <[email protected]>
  Reviewed-by: Christian König <[email protected]>
* nir: Remove some unused fields from nir_variable | Jason Ekstrand | 2016-12-05 | 3 | -43/+0
  All of these are happily set from glsl_to_nir or spirv_to_nir but their values are never used for anything.

  Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir: Delete most of the constant_initializer support | Jason Ekstrand | 2016-12-05 | 5 | -146/+12
  Constant initializers have been a constant (ha!) pain for quite some time. While they're useful from a language perspective, people writing passes or backends really don't want to deal with them most of the time. This commit removes most of the constant initializer support from NIR. It is expected that you call nir_lower_constant_initializers VERY EARLY to ensure that they're gone before you do anything interesting.

  Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir: Simplify nir_lower_gs_intrinsics | Jason Ekstrand | 2016-12-05 | 1 | -21/+16
  It's only ever called on single-function shaders. At this point, there are a lot of helpers that can make it all much simpler.

  Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir/lower_returns: Stop using constant initializers | Jason Ekstrand | 2016-12-05 | 1 | -4/+5
  Reviewed-by: Iago Toral Quiroga <[email protected]>
* glsl/nir: Call nir_lower_constant_initializers | Jason Ekstrand | 2016-12-05 | 1 | -0/+2
  Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv/pipeline: Call nir_lower_constant_initializers | Jason Ekstrand | 2016-12-05 | 1 | -0/+13
  Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir: Add a pass for lowering away constant initializers | Jason Ekstrand | 2016-12-05 | 3 | -0/+115
  Reviewed-by: Iago Toral Quiroga <[email protected]>
* Revert "i965: use nir_lower_indirect_derefs() for GLSL" | Jason Ekstrand | 2016-12-05 | 3 | -10/+23
  This reverts commit 9404439a754e5640ccd98df40fa694835c0d8759.

  I didn't intend to push it and it breaks clip and cull distance.
* i965: Delete the meta-base CopyImageSubData implementation | Jason Ekstrand | 2016-12-05 | 4 | -328/+0
  When I originally implemented the ARB_copy_image extension, the fast-path was written in meta using texture views. This path only worked if both images were uncompressed color images. All of the other cases fell back to the blitter or, in the worst case, mapping and memcpy on the CPU.

  Now that we have the blorp path, it handles all copies ever and the old meta, blitter, and CPU paths are only used on gen5 and below. The primary reason why we needed the meta path (apart from having a slow blitter on later hardware) was to handle multisampling which gen5 and earlier don't support anyway. Since the blitter is reasonably fast on gen5, we can just delete the meta path and get rid of all that terrible code.

  If we decide that we're ok with just disabling ARB_copy_image on gen5 and earlier (I personally am), then we could get rid of another 300 lines or so of semi-hairy code.

  Reviewed-by: Anuj Phogat <[email protected]>
* i965/copy_image: Re-implement the blitter path with emit_miptree_blit | Jason Ekstrand | 2016-12-05 | 3 | -97/+80
  By using emit_miptree_blit which does chunking, this fixes the blitter path for the case where the image is too tall to blit normally. We also pull it into intel_blit as intel_miptree_copy. This matches the naming of the blorp blit and copy functions brw_blorp_blit and brw_blorp_copy.

  Reviewed-by: Anuj Phogat <[email protected]>
  Cc: "13.0" <[email protected]>
* i965/blit: Break the guts of intel_miptree_blit into a helper | Jason Ekstrand | 2016-12-05 | 1 | -67/+84
  Reviewed-by: Anuj Phogat <[email protected]>
  Cc: "13.0" <[email protected]>
* i965: use nir_lower_indirect_derefs() for GLSL | Timothy Arceri | 2016-12-05 | 3 | -23/+10
  This moves the nir_lower_indirect_derefs() call into brw_preprocess_nir() so that it is called by both OpenGL and Vulkan, and removes the call to the old GLSL IR pass lower_variable_index_to_cond_assign().

  We want to do this pass in nir to be able to move loop unrolling to nir.

  There is an increase of 1-3 instructions in a small number of shaders, and 2 Kerbal Space Program shaders that increase by 32 instructions.

  Shader-db results BDW:

  total instructions in shared programs: 8705873 -> 8706194 (0.00%)
  instructions in affected programs: 32515 -> 32836 (0.99%)
  helped: 3
  HURT: 79

  total cycles in shared programs: 74618120 -> 74583476 (-0.05%)
  cycles in affected programs: 528104 -> 493460 (-6.56%)
  helped: 47
  HURT: 37

  LOST: 2
  GAINED: 0
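The percentages in the shader-db summary are relative changes, (after - before) / before. A small sketch for reproducing them is shown below; `sketch_pct_change()` is a hypothetical helper and not part of shader-db.

```c
/* Relative change in percent; negative means the count went down. */
double sketch_pct_change(double before, double after)
{
   return (after - before) / before * 100.0;
}
```

Applied to the figures above, 32515 -> 32836 comes out at roughly +0.99% and 528104 -> 493460 at roughly -6.56%. The shared-program instruction total changes by the same 321 instructions, which is why it rounds to 0.00% against a base of about 8.7 million.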
* swr: mark PIPE_CAP_NATIVE_FENCE_FD unsupported | Tim Rowley | 2016-12-05 | 1 | -0/+1
  Reviewed-by: Bruce Cherniak <[email protected]>