path: root/src
Commit message | Author | Age | Files | Lines
* egl/x11: store xcb_screen_t *screen instead of int screen | Emil Velikov | 2016-11-22 | 3 | -66/+18
  Just fetch and store it once, rather than doing the xcb_setup_roots_iterator + get_xcb_screen dance five times.
  v2: Call xcb_disconnect() on error (Eric)
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]> (v1)
* egl/x11: factor out dri2_get_xcb_connection() | Emil Velikov | 2016-11-22 | 1 | -36/+27
  Identical throughout dri2, dri3 and drisw. The next patch will add more common code, so rather than duplicating it, factor out the function.
  Note: this also sets eglError on failure, something that is quite inconsistent throughout the codebase.
  v2: Call xcb_disconnect() on error (Eric)
  Note: use xcb_disconnect() even in the xcb_connection_has_error() case as per the manual: ... memory will not be freed until xcb_disconnect...
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]> (v1)
* mesa/glsl: remove unused uses_builtin_functions field | Timothy Arceri | 2016-11-23 | 3 | -3/+0
  This has been unused since 943b69cddd
  Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965: Use NIR-based clip/cull lowering for OpenGL as well. | Kenneth Graunke | 2016-11-22 | 2 | -1/+2
  The old approach works fine, and this approach isn't necessarily better. But it at least has the advantage that Vulkan and GL use the same approach. I originally wrote it to gain additional testing for the new paths.
  shader-db statistics show 0 instruction count changes.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Enable clip and cull distance support. | Kenneth Graunke | 2016-11-22 | 2 | -6/+5
  Everything is now in place, and we appear to pass the tests on Gen7+.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Handle component qualifiers on non-generic varyings. | Kenneth Graunke | 2016-11-22 | 5 | -73/+53
  ARB_enhanced_layouts only requires component qualifier support for generic varyings, so this is all the vec4 backend knew how to handle.
  This patch extends the backend to handle it for all varyings, so we can use store_output intrinsics with a component set for things like clip/cull distances. We may want to use that for other VUE header fields in the future as well.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* i965/fs: Handle compact outputs. | Kenneth Graunke | 2016-11-22 | 1 | -1/+3
  We need to calculate the number of vec4 slots correctly.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* spirv: Silence unsupported capability warnings for Clip/CullDistance. | Kenneth Graunke | 2016-11-22 | 1 | -2/+2
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Set clip/cull distances fields in packets. | Kenneth Graunke | 2016-11-22 | 1 | -6/+26
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Combine ClipDistance and CullDistance arrays. | Kenneth Graunke | 2016-11-22 | 1 | -0/+3
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add a pass to compact clip/cull distances. | Kenneth Graunke | 2016-11-22 | 3 | -0/+190
  v2: Use nir_is_per_vertex_io() rather than is_arrays_of_arrays().
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a "compact array" flag and IO lowering code. | Kenneth Graunke | 2016-11-22 | 7 | -18/+67
  Certain built-in arrays, such as gl_ClipDistance[], gl_CullDistance[], gl_TessLevelInner[], and gl_TessLevelOuter[] are specified as scalar arrays. Normal scalar arrays are sparse - each array element usually occupies a whole vec4 slot. However, most hardware assumes these built-in arrays are tightly packed.
  The new var->data.compact flag indicates that a scalar array should be tightly packed, so a float[4] array would take up a single vec4 slot, and a float[8] array would take up two slots. They are still arrays, not vec4s, however. nir_lower_io will generate intrinsics using ARB_enhanced_layouts style component qualifiers.
  v2: Add nir_validate code to enforce type restrictions.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
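The slot counts quoted in this message (float[4] in one slot, float[8] in two) can be illustrated with a small sketch. The helper name scalar_array_vec4_slots is hypothetical and is not part of NIR; it only restates the arithmetic described above.

```c
#include <stdbool.h>

/* Hypothetical helper, not the NIR API: number of vec4 slots occupied by a
 * scalar array with `length` elements. */
static unsigned scalar_array_vec4_slots(unsigned length, bool compact)
{
    if (compact)
        return (length + 3) / 4;  /* four scalars share one vec4 slot */
    return length;                /* sparse: one whole vec4 slot per element */
}
```

With the compact flag set, an 8-element array needs two slots instead of eight.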
* radv: add support for shader stats dump | Dave Airlie | 2016-11-22 | 3 | -0/+87
  I've started working on a shader-db alike for Vulkan. It's based on vktrace and records pipelines. This adds support to dump the shader stats exactly like radeonsi does, so I can reuse the shader-db scripts it uses.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: fix sample id loading | Dave Airlie | 2016-11-22 | 1 | -1/+18
  The sample id is packed into bits 8-12, so adjust things properly.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
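Reading a packed field like this comes down to a shift and a mask. The sketch below is a generic bitfield extract; the helper name unpack_bits is hypothetical, and treating "bits 8-12" as an inclusive five-bit range is an assumption, since the message does not say whether bit 12 is included.

```c
#include <stdint.h>

/* Hypothetical helper: extract `width` bits starting at bit `shift`.
 * Assumes width is between 1 and 31. */
static uint32_t unpack_bits(uint32_t value, unsigned shift, unsigned width)
{
    return (value >> shift) & ((UINT32_C(1) << width) - 1);
}

/* Assumption: "bits 8-12" read as an inclusive range, i.e. 5 bits at shift 8. */
static uint32_t sample_id_from_packed(uint32_t packed)
{
    return unpack_bits(packed, 8, 5);
}
```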
* radv/ac: add implementation of load_sample_pos intrinsic. | Dave Airlie | 2016-11-22 | 1 | -0/+12
  This fixes a bunch of crashes in CTS tests looking for this.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv/ac: cleanup ddxy emission | Dave Airlie | 2016-11-22 | 1 | -93/+43
  This cleans up the ddxy emission along the same lines as radeonsi. It also means that on VI chips we use the dspermute interface instead of LDS, and it removes some duplicated code.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv/meta: cleanup resolve vertex state emission | Dave Airlie | 2016-11-22 | 1 | -47/+2
  For the hw resolve there is no need to emit any sort of texture coordinates, so drop them all in the meta path.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radv: Incorporate GPU family into cache UUID. | Bas Nieuwenhuizen | 2016-11-22 | 1 | -3/+5
  Invalidates the cache when someone switches cards.
  Signed-off-by: Bas Nieuwenhuizen <[email protected]>
* radv: Use library mtime for cache UUID. | Bas Nieuwenhuizen | 2016-11-22 | 1 | -4/+32
  We want to also invalidate the cache when LLVM gets changed. As the specific LLVM revision is not fixed at build time, we will need to check at runtime.
  Computing a checksum for LLVM is going to be very expensive, so just use the mtime.
  Tested on my computer that the returned DSO for the LLVM symbol is actually the LLVM DSO.
  Signed-off-by: Bas Nieuwenhuizen <[email protected]>
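The idea of using a modification time as a cheap stand-in for a checksum can be sketched with POSIX stat(). The helper name file_mtime is hypothetical, and locating the library path from a symbol, which the message alludes to, is deliberately left out of this sketch.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical helper: store the modification time of `path` in *out.
 * Returns 0 on success and -1 if the file cannot be queried. */
static int file_mtime(const char *path, uint32_t *out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    /* Truncated to 32 bits; enough to notice that the file changed. */
    *out = (uint32_t)st.st_mtime;
    return 0;
}
```

A caller would mix the returned value into the cache identifier, so that replacing the library on disk yields a different identifier.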
* radv: Store UUID in physical device. | Bas Nieuwenhuizen | 2016-11-22 | 3 | -14/+16
  No sense in repeatedly determining it. Also, it might be dependent on the device as shaders get compiled differently for SI/CIK/VI etc.
  Signed-off-by: Bas Nieuwenhuizen <[email protected]>
* glsl: fix NULL check | Timothy Arceri | 2016-11-22 | 1 | -1/+1
  Fixes copy and paste error in 9d96d3803ab
* swr: calculate viewport width/height based on the scale | Ilia Mirkin | 2016-11-21 | 1 | -6/+12
  The former calculations were for min/max y. The width/height don't take translate into account.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
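The distinction between bounds and extent can be sketched for a single axis. This assumes the common viewport transform window = scale * ndc + translate with ndc in [-1, 1]; the struct and function names are hypothetical and do not come from the swr driver.

```c
/* One axis of a viewport transform: window = scale * ndc + translate,
 * with ndc in [-1, 1]. Names are hypothetical. */
struct axis_transform {
    float scale;
    float translate;
};

static float axis_abs(float v)
{
    return v < 0.0f ? -v : v;
}

/* The window-space bounds depend on both scale and translate. */
static float axis_min(struct axis_transform t)
{
    return t.translate - axis_abs(t.scale);
}

static float axis_max(struct axis_transform t)
{
    return t.translate + axis_abs(t.scale);
}

/* The extent (width or height) depends on the scale alone. */
static float axis_extent(struct axis_transform t)
{
    return 2.0f * axis_abs(t.scale);
}
```

For a viewport starting at x = 10 with width 100, scale is 50 and translate is 60: the bounds are 10 and 110, while the extent is 100.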
* swr: don't claim to allow setting layer/viewport from VS | Ilia Mirkin | 2016-11-21 | 1 | -1/+1
  This may ultimately be possible to support, but for now it's not hooked up and the swr core only supports this output from GS.
  This normally wouldn't matter, but we lie about supporting GL 3.2, and also the blitter and st/mesa will make use of this functionality if claimed.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: allocate all scratch space in one go for vertex buffers | Ilia Mirkin | 2016-11-21 | 2 | -5/+31
  Multiple buffers may reference client arrays. When this happens, we might reach for scratch space multiple times, which could cause later arrays to invalidate the pointers allocated for the earlier ones.
  This fixes copyteximage 2D_ARRAY.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: call swr_update_derived unconditionally when drawing/clearing | Ilia Mirkin | 2016-11-21 | 2 | -4/+2
  Currently a sequence like draw/map/draw/map will cause the second map to not wait for the second draw. This is because the first map will clear the resource business bit, and the second draw won't reset it since no state has changed.
  swr_update_derived does a tiny bit of extra work, including updating the SWR_BACKEND_STATE as well as waiting for pending fences. If that's a problem, we could call swr_update_resource_status directly from draw/clear handlers.
  Fixes clearbuffer-stencil, clearbuffer-depth, clearbuffer-depth-stencil, and clearbuffer-display-lists.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer memory] minify texture width before alignment | Ilia Mirkin | 2016-11-21 | 1 | -2/+2
  The minification should happen before alignment, not after. See similar logic on ComputeLODOffsetY. The current logic requires unnecessarily large textures when there's an initial NPOT size.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
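The effect of the ordering can be shown with a small sketch. The helper names and the alignment value used in the example are hypothetical and are not taken from the swr rasterizer; the sketch only contrasts the two orders of operations.

```c
/* Size of one dimension at mip level `lod`, never smaller than 1.
 * Assumes lod < 32. */
static unsigned minify(unsigned size, unsigned lod)
{
    unsigned s = size >> lod;
    return s ? s : 1;
}

/* Round `value` up to the next multiple of `alignment` (alignment > 0). */
static unsigned align_up(unsigned value, unsigned alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

/* Minify the original width first, then align the result. */
static unsigned mip_width_minify_first(unsigned width, unsigned lod, unsigned alignment)
{
    return align_up(minify(width, lod), alignment);
}

/* Align the base width first, then minify: larger for NPOT base widths. */
static unsigned mip_width_align_first(unsigned width, unsigned lod, unsigned alignment)
{
    return minify(align_up(width, alignment), lod);
}
```

For a base width of 129 and an assumed alignment of 64, level 1 comes out as 64 when minifying first and 96 when aligning first.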
* swr: [rasterizer memory] minify original sizes for block formats | Ilia Mirkin | 2016-11-21 | 1 | -11/+25
  There's no guarantee that mip width/height will be a multiple of the compressed block size. Doing a divide by the block size first yields different results than GL expects, so we do the divide at the end.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
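A short sketch of why the order matters for block-compressed formats. The function names are hypothetical; the block width of 4 in the example is only an illustration.

```c
/* Pixel size of one dimension at mip level `lod`, never smaller than 1.
 * Assumes lod < 32. */
static unsigned mip_size(unsigned size, unsigned lod)
{
    unsigned s = size >> lod;
    return s ? s : 1;
}

/* Minify in pixels, then divide by the block size at the end (rounding up). */
static unsigned mip_blocks_divide_last(unsigned width, unsigned lod, unsigned block_width)
{
    return (mip_size(width, lod) + block_width - 1) / block_width;
}

/* Divide by the block size first, then minify the block count. */
static unsigned mip_blocks_divide_first(unsigned width, unsigned lod, unsigned block_width)
{
    unsigned blocks = (width + block_width - 1) / block_width;
    return mip_size(blocks, lod);
}
```

A 10-pixel-wide texture with 4-pixel blocks has a 5-pixel-wide level 1, which needs 2 blocks; dividing first gives 3 blocks at level 0 and only 1 block at level 1.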
* radeonsi: remove all varyings for depth-only rendering or rasterization off | Marek Olšák | 2016-11-21 | 3 | -1/+21
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: eliminate VS outputs that aren't used by PS at runtime | Marek Olšák | 2016-11-21 | 3 | -9/+61
  A past commit added the ability to compile "optimized" shader variants asynchronously (not stalling the app). This commit builds upon that and adds what is basically a runtime shader linker.
  If a VS output isn't used by the currently-bound PS, a new VS compilation is started without that output. The new shader variant is used when it's ready.
  All apps using separate shader objects I've seen had unused VS outputs.
  Eliminating unused/useless VS outputs also eliminates the corresponding vertex attribute loads.
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
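The core test, "written by the VS but not read by the PS", can be sketched with bitmasks. Representing outputs and inputs as 64-bit masks indexed by a shared slot number is an assumption made for this illustration, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch: each bit stands for one varying slot shared between
 * the vertex shader (VS) and the pixel shader (PS). */
static uint64_t unused_vs_outputs(uint64_t vs_outputs_written, uint64_t ps_inputs_read)
{
    /* Outputs the VS writes that the currently bound PS never reads. */
    return vs_outputs_written & ~ps_inputs_read;
}
```

A non-zero result would be the trigger for compiling a VS variant without those outputs.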
* radeonsi: record information about all written and read varyings | Marek Olšák | 2016-11-21 | 3 | -3/+98
  It's just tgsi_shader_info with DEFAULT_VAL varyings removed.
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: make si_shader_io_get_unique_index stricter | Marek Olšák | 2016-11-21 | 2 | -11/+14
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled | Marek Olšák | 2016-11-21 | 4 | -5/+37
  This is the first user of optimized monolithic shader variants.
  Cull distances can't be disabled by states.
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add infrastr. for compiling optimized shader variants asynchronously | Marek Olšák | 2016-11-21 | 2 | -34/+109
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set vs.epilog.export_prim_id if TES is bound | Marek Olšák | 2016-11-21 | 1 | -4/+4
  There is no VS epilog in this case.
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: simplify checking for monolithic compilation | Marek Olšák | 2016-11-21 | 4 | -8/+9
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print all flags in si_dump_shader_key | Marek Olšák | 2016-11-21 | 1 | -0/+5
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split the shader key into 3 logical parts | Marek Olšák | 2016-11-21 | 5 | -194/+203
  key->part.*: prolog and epilog flags only
  key->as_{ls,es}: special flags
  key->mono.*: flags for monolithic compilation only
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix culling if clip & cull distances are used at the same time | Marek Olšák | 2016-11-21 | 1 | -2/+3
  Fixed piglits:
  - arb_cull_distance/clip-cull-3
  - arb_cull_distance/clip-cull-4
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up si_emit_clip_regs | Marek Olšák | 2016-11-21 | 1 | -4/+5
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: assume that a VS without POSITION is LS | Marek Olšák | 2016-11-21 | 1 | -0/+7
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: record if a shader writes the position output | Marek Olšák | 2016-11-21 | 2 | -0/+3
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi/scan: use a big switch for scanning outputs | Marek Olšák | 2016-11-21 | 1 | -40/+28
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: decrease the number of texture slots to 24 | Marek Olšák | 2016-11-21 | 1 | -1/+1
  Company Of Heroes 2 needs only 24. This saves 512 bytes of CE RAM per shader stage.
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fast exit si_emit_derived_tess_state early | Marek Olšák | 2016-11-21 | 2 | -11/+15
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: set addrlib flag opt4Space | Marek Olšák | 2016-11-21 | 1 | -0/+1
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: check for !is_linear in do_hardware_msaa_resolve | Marek Olšák | 2016-11-21 | 1 | -2/+4
  We don't want opt4Space here.
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: add RADEON_SURF_OPTIMIZE_FOR_SPACE | Marek Olšák | 2016-11-21 | 3 | -1/+6
  FORCE_TILING should disable it. It has no effect now, but that may change soon.
  Tested-by: Edmondo Tommasina <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Add missing error-checking to si_create_compute_state (v2) | Mun Gwan-gyeong | 2016-11-21 | 1 | -1/+5
  When uploading the shader fails in si_shader_binary_upload(), it returns -ENOMEM. We should handle the si_shader_binary_upload() failure path in si_create_compute_state().
  CID 1394027
  v2: Fixes from Edward O'Callaghan's review
  a) Make the return value check explicit with "si_shader_binary_upload() < 0"
  b) Update commit message.
  Signed-off-by: Mun Gwan-gyeong <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Signed-off-by: Marek Olšák <[email protected]>
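The pattern described here, checking a negative errno-style return and unwinding on failure, can be sketched in isolation. Everything in this block is hypothetical: upload_binary and create_program are stand-ins and are not the radeonsi functions named in the commit.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object being created. */
struct program {
    int uploaded;
};

/* Stand-in for an upload step that reports failure as a negative errno. */
static int upload_binary(struct program *p, int simulate_failure)
{
    if (simulate_failure)
        return -ENOMEM;
    p->uploaded = 1;
    return 0;
}

/* Create a program; on upload failure, release it and report NULL. */
static struct program *create_program(int simulate_failure)
{
    struct program *p = calloc(1, sizeof(*p));

    if (!p)
        return NULL;

    /* Explicit "< 0" check, so that any negative error code is caught. */
    if (upload_binary(p, simulate_failure) < 0) {
        free(p);
        return NULL;
    }
    return p;
}
```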
* draw: drop some overflow computations | Roland Scheidegger | 2016-11-21 | 1 | -65/+46
  It turns out that no one actually cares if the address computations overflow, be it the stride mul or the offset adds. Wrap around seems to be explicitly permitted even by some other API (which is a _very_ surprising result, as these overflow computations were added just for that and made some tests pass at that time - I suspect some later fixes fixed the actual root cause...). So the requirements in that other API were actually sane there all along after all... Still need to make sure the computed buffer size needed is valid, of course. This ditches the shiny new widening mul from these codepaths, ah well...
  And now that I really understand this, change the fishy min limiting indices to what it really should have done. Which is simply to prevent fetching more values than valid for the last loop iteration. (This makes the code path in the loop minimally more complex for the non-indexed case as we have to skip the optimization combining two adds. I think it should be safe to skip this actually there, but I don't care much about this especially since skipping that optimization actually makes the code easier to read elsewhere.)
  Reviewed-by: Jose Fonseca <[email protected]>
* draw: simplify fetch some more | Roland Scheidegger | 2016-11-21 | 1 | -63/+55
  Don't keep the ofbit. This is just a minor simplification, just adjust the buffer size so that there will always be an overflow if buffers aren't valid to fetch from.
  Also, get rid of control flow from the instanced path too. Not worried about performance, but it's simpler and keeps the code more similar to ordinary fetch.
  Reviewed-by: Jose Fonseca <[email protected]>