summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: fix incorrect FMASK checking in bind_sampler_statesMarek Olšák2016-12-071-4/+4
| | | | | Cc: 12.0 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always restore sampler states when unbinding sampler viewsMarek Olšák2016-12-071-3/+8
| | | | | Cc: 12.0 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: take LDS into account for compute shader occupancy statsMarek Olšák2016-12-071-11/+18
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: round lod_bias to a multiple of 1/256Marek Olšák2016-12-071-0/+6
| | | | | | | This reduces the number of sampler states 3.6x in Batman Arkham: Origins. (from ~7200 to ~2000) Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: decrease the size of pipe_sampler_state fieldsMarek Olšák2016-12-071-3/+3
| | | | | | | | We've had unused bits. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* cso: don't release sampler states that are boundMarek Olšák2016-12-071-1/+3
| | | | | | | | | | | | | | This fixes random radeonsi GPU hangs in Batman Arkham: Origins (Wine) and probably many other games too. cso_cache deletes sampler states when the cache size is too big and doesn't check which sampler states are bound, causing use-after-free in drivers. Because of that, radeonsi uploaded garbage sampler states and the hardware went bananas. Other drivers may have experienced similar issues. Cc: 12.0 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965: Increase max texture to 16k for gen7+Jordan Justen2016-12-071-3/+10
| | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98297 Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/blorp_blit: Add split_blorp_blit_debug switchJordan Justen2016-12-071-3/+9
| | | | | | | | | Enabling this debug switch causes surface shrinking to happen by default, and lowers the surface size limit which causes blorp blits to be split. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/blorp_blit: Enable splitting large blorp blitsJordan Justen2016-12-071-1/+40
| | | | | | | | | | | | | | | | Detect when the surface sizes are too large for a blorp blit. When it is too large, the blorp blit will be split into a smaller operation and attempted again. For gen7, this fixes the cts test: ES3-CTS.gtf.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_multisampled_to_singlesampled_blit It will also enable us to increase our renderable size from 8k x 8k to 16k x 16k. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/blorp_blit: Move RGB=>R conversion to follow blit splittingJordan Justen2016-12-071-48/+65
| | | | | | | | | | | | | | In blorp_copy, when RGB surfaces are copied, we convert the destination surface to a Red only surface, but 3 times as wide. This introduces an implicit restriction of "mod 3" for the destination width. It is easier to handle the blorp split buffer offsetting with the original RGB surface, and do the RGB=>R after this. Suggested-by: Jason Ekstrand <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/blorp_blit: Adjust blorp surface parameters for split blitsJordan Justen2016-12-071-3/+94
| | | | | | | | | | | | If try_blorp_blit() previously returned that a blit was too large, shrink_surface_params() will be used to update the surface parameters for the smaller blit so the blit operation can proceed. v2: * Use double instead of float. (Jason) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/blorp_blit: Split blorp blits if they are too largeJordan Justen2016-12-071-6/+96
| | | | | | | | | | | | | | | | | | | | | We rename do_blorp_blit() to try_blorp_blit(), and add a return error if the surface size for the blit is too large. Now, do_blorp_blit() is rewritten to try to split the blit into smaller operations if try_blorp_blit() fails. Note: In this commit, try_blorp_blit() will always attempt to blit and never return an error, which matches the previous behavior. We will enable the size checking and splitting in a future commit. The motivation for this splitting is that in some cases when we flatten an image, it's dimensions grow, and this can then exceed the programmable hardware limits. An example is w-tiled+MSAA blits. v2: * Use double instead of float. (Jason) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/blorp_blit: Create structure for src & dst coordinatesJordan Justen2016-12-071-19/+56
| | | | | | | | | | | | | | This will be useful for splitting blits into smaller sizes. We also make the coordinates of type double rather than float. Since we will be splitting and scaling the coordinates, we might require extra precision in the calculations. v2: * Use double instead of float. (Jason) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* vulkan: use STATIC_ASSERT instead of static_assertEdward O'Callaghan2016-12-073-3/+3
| | | | | | | | Following the spirit of commit 23d1799f, fixes compilation warnings on Android build due to lack of C11 features. Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: enable INTEL_conservative_rasterization on Gen9+Lionel Landwerlin2016-12-076-5/+18
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* mesa: add support for GL_INTEL_conservative_rasterizationLionel Landwerlin2016-12-0713-7/+129
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Add i965 plumbing for ARB_post_depth_coverage for i965 (gen9+).Plamena Manolova2016-12-074-3/+13
| | | | | | | | | | This extension allows the fragment shader to control whether values in gl_SampleMaskIn[] reflect the coverage after application of the early depth and stencil tests. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: Add GL and GLSL plumbing for ARB_post_depth_coverage for i965 (gen9+).Plamena Manolova2016-12-0711-1/+53
| | | | | | | | | This extension allows the fragment shader to control whether values in gl_SampleMaskIn[] reflect the coverage after application of the early depth and stencil tests. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* radeonsi: fix isolines tess factor writes to control ringNicolai Hähnle2016-12-071-4/+12
| | | | | | Fixes piglit arb_tessellation_shader/execution/isoline{_no_tcs}.shader_test. Cc: [email protected]
* i965: Drop redundant key->outputs_written initialization.Kenneth Graunke2016-12-061-2/+0
| | | | | | | This was already set to the same value earlier. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Initialize "separate" flag in VUE maps.Kenneth Graunke2016-12-061-0/+3
| | | | | | | | | | | | This was uninitialized, which resulted in weird looking printouts where it appeared that the TCS output and TES input patch URB entries differed in SSO/non-SSO layout. There is no "separable" layout for both, as they're tied together. It should have no other actual effect. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir: In split_var_copies_block, uint, int, and bool types cannot be matricesIan Romanick2016-12-061-3/+5
| | | | | | | | Noticed while adding support for 64-bit integer types. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* radeonsi: Use amdgcn intrinsics for fs interpolationTom Stellard2016-12-071-54/+142
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* freedreno/a5xx: fix draw packet size with index bufferRob Clark2016-12-061-1/+1
| | | | | | gpuaddr of idx buffer is now two dwords (64b). Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: gmem bypass modeRob Clark2016-12-061-0/+72
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix emit_string_marker()Rob Clark2016-12-061-1/+4
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: pitch alignment should match gmem alignmentRob Clark2016-12-065-15/+22
| | | | | | | Deal w/ differing gmem tile size alignment between generations, and make sure texture pitch matches. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: more formatsRob Clark2016-12-061-41/+41
| | | | | | Bunch of stuff we can at least turn on for vbo formats. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix fragfaceRob Clark2016-12-061-2/+4
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix fragcoordRob Clark2016-12-061-4/+11
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2016-12-067-20/+129
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix alpha testRob Clark2016-12-063-5/+1
| | | | | | | | GRAS_SU_DEPTH_PLANE_CNTL doesn't in fact seem to be anything to do with alpha test. This fixes xonotic and (other than some iommu faults) gets gnome-shell working. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix VPC_VAR[n].DISABLE bitsRob Clark2016-12-061-13/+13
| | | | | | | | We don't need varying interpolators enabled for pos/psize out of the VS (despite the fact that they show up in VS_OUT map), so emit these before we append pos/psize to the linkage. Signed-off-by: Rob Clark <[email protected]>
* anv/TODO: Document sampling from HiZNanley Chery2016-12-061-0/+1
| | | | Acked-by: Jason Ekstrand <[email protected]>
* i965: Don't force SSO layout for VS->TCS.Kenneth Graunke2016-12-062-4/+3
| | | | | | | | | | | | | | | | | | This was a hack which worked around the VS and TCS disagreeing on their shared interface due to the lack of varying packing. In particular, it was needed by Piglit's tcs-input-read-array-interface test. However, that was just one case where things could go awry, so the previous commit forcibly made interfaces match. This hack is no longer necessary. It also seems to be broken, though I'm not sure why. It fixes Piglit regressions in spec/arb_shader_image_load_store/semantics from commit ec1f159ac81ed964415d102eed4a0a29be8e7937. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98893 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Unify shader interfaces explicitly.Kenneth Graunke2016-12-061-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A while ago, I made i965 start compiling shaders independently. The VUE map layouts were based entirely on each shader's input/output bitfields. Assuming the interfaces match, this works out well - both sides will compute the same layout, and outputs are correctly routed to inputs. At the time, I had assumed that the linker would guarantee that the interfaces match. While it usually succeeds, it unfortunately seems to fail in some cases. For example, Piglit's tcs-input-read-array-interface test has a VS output array with two elements, but the TCS only reads one. The linker isn't able to eliminate the unused element from the VS, which makes the interfaces not match. Another case is where a shader other than the last writes clip/cull distances. These should be demoted to ordinary varyings, but they currently aren't - so we think they still have some special meaning, and prevent them from being eliminated. Fixing the linker to guarantee this in all cases is complicated. It needs to be able to optimize out dead code. It's tied into varying packing and other messiness. While we can certainly improve it---and should---I'd rather not rely on it being correct in all cases. This patch ORs adjacent stages' input/output bitfields together, ensuring that their interface (and hence VUE map layout) will be compatible. This should safeguard us against linker insufficiencies. Fixes line rendering in Dolphin, and the Piglit test based on it: spec/glsl-1.50/execution/geometry/clip-distance-vs-gs-out. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97232 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* genxml/gen9: Change the default of MI_SEMAPHORE_WAIT::RegisterPoleModeJason Ekstrand2016-12-061-1/+1
| | | | | | | | | | We would really like it to be false as that's what you get on hardware that doesn't have RegisterPoleMode (Sky Lake for example). While we're at it, we change it to a boolean. This fixes dEQP-VK.synchronization.smoke.events on Broxton. Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0" <[email protected]>
* gallivm: optimize 16bit->32bit gather path a bitRoland Scheidegger2016-12-061-3/+39
| | | | | | | | | | LLVM can't really optimize anything which crosses scalar/vector boundaries, so help a bit with some particular gather operations when the width is expanded (only do it for 16->32bit expansion for now), by doing expansion after fetch. That is probably a better solution anyway even if llvm would recognize it, makes for cleaner IR... Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: handle 16bit float fetches in lp_build_fetch_rgba_soaRoland Scheidegger2016-12-061-4/+18
| | | | | | | | | | | | | | | | | | | | Note that we really want to _never_ reach the bottom of the function, which resorts to AoS fetch. Half floats can be handled just like other formats which fit into 32bit vectors (so, only 1x16 and 2x16 formats, albeit with more channels things are not THAT bad), with minimal plumbing. I've seen code size go down nearly by a factor of 3 for a complete texture sampling function (including bilinear filtering) using R16F. (What we should do for everything not special cased is to do AoS gather, shuffle/shift things into SoA vectors, and then do the conversion there. Otherwise it's particularly bad with 1 or 2 channel formats - that r16f format with either 4 or 8-wide vectors was still doing one element at a time, essentially doing exactly the same work as for rgba16f. Also replacing the channels with SWIZZLE0/1 (particularly the latter) adds even more work, as it has to be done per aos vector, and not just straightforward at the end with the SoA vector.) Reviewed-by: Jose Fonseca <[email protected]>
* util: (trivial) ETC1 meets the criteria for fitting into unorm8Roland Scheidegger2016-12-061-0/+5
| | | | | | Just like other similar compressed formats. Reviewed-by: Jose Fonseca <[email protected]>
* i965: Emit proper NOPs.Matt Turner2016-12-061-4/+2
| | | | | | | | | | | The PRMs for HSW and newer say that other than the opcode and DebugCtrl bits of the instruction word, the rest must be zero. By zeroing the instruction word manually, we avoid using any of the state inherited through brw_codegen. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96959 Reviewed-by: Ian Romanick <[email protected]>
* glsl: (trivial) fix type typoRoland Scheidegger2016-12-061-1/+1
| | | | | Accidentally changed the type of a constant in df33f11b39abf313a0db7b9fefaf739b88133161 causing assertion failures.
* i965: Allocate at least some URB space even when max_vertices = 0.Kenneth Graunke2016-12-051-1/+7
| | | | | | | | | | | | | | | | | | | | Allocating zero URB space is a really bad idea. The hardware has to give threads a handle to their URB space, and threads have to use that to terminate the thread. Having it be an empty region just breaks a lot of assumptions. Hence, why we asserted that it isn't possible. Unfortunately, it /is/ possible prior to Gen8, if max_vertices = 0. In theory a geometry shader could do SSBO/image access and maybe still accomplish something. In reality, this is tripped up by conformance tests. Gen8+ already avoids this problem by placing the vertex count DWord in the URB entry header. This fixes things on earlier generations. Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Tested-by: Ian Romanick <[email protected]>
* main: allow NEAREST_MIPMAP_NEAREST for stencil texturingRoland Scheidegger2016-12-061-15/+8
| | | | | | | | | | | | As per GL 4.5 rules, which fixed a spec mistake in GL_ARB_stencil_texturing. The extension spec wasn't updated, but just allow it with older GL versions as well, hoping there aren't any crazy tests which want to see an error there... (Compile tested only.) Reported by Józef Kucia <[email protected]> Acked-by: Józef Kucia <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: fix ldexp lowering if bitfield insert lowering is also requestedRoland Scheidegger2016-12-061-5/+16
| | | | | | | | | | Trivial, this just resurrects the code which was there once upon a time (the code can't lower instructions generated in the lowering pass there, and even if it could it would probably be suboptimal). This fixes piglit mesa_shader_integer_functions fs-ldexp.shader_test and vs-ldexp.shader_test with llvmpipe. Reviewed-by: Matt Turner <[email protected]>
* radv: fix resource leak in radv_amdgpu_ctx_createNayan Deshmukh2016-12-061-0/+1
| | | | | | | | | CovID: 1396387 V2. Fixup bad whitespace. Signed-off-by: Nayan Deshmukh <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* st/omx/enc Raise default encode levelAndy Furniss2016-12-051-1/+1
| | | | | | | | Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=91281 Signed-off-by: Andy Furniss <[email protected]> Reviewed-by: Christian König <[email protected]>
* radeon/vce Handle H.264 level 5.2Andy Furniss2016-12-051-1/+2
| | | | | | | | | Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=91281 v2: explicitly add case 52 Signed-off-by: Andy Furniss <[email protected]> Reviewed-by: Christian König <[email protected]>
* nir: Remove some unused fields from nir_variableJason Ekstrand2016-12-053-43/+0
| | | | | | | All of these are happily set from glsl_to_nir or spirv_to_nir but their values are never used for anything. Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir: Delete most of the constant_initializer supportJason Ekstrand2016-12-055-146/+12
| | | | | | | | | | | Constant initializers have been a constant (ha!) pain for quite some time. While they're useful from a language perspective, people writing passes or backends really don't want deal with them most of the time. This commit removes most of the constant initializer support from NIR. It is expected that you call nir_lower_constant_initializers VERY EARLY to ensure that they're gone before you do anything interesting. Reviewed-by: Iago Toral Quiroga <[email protected]>