summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* vc4: When the create ioctl fails, free our cache and try again.Eric Anholt2015-11-041-5/+24
| | | | | | | | | This greatly increases the pressure you can put on the driver before create fails. Ultimately we need to let the kernel take control of our cached BOs and just take them from us (and other clients) directly, but this is a very easy patch for the moment. Cc: "11.0" <[email protected]>
* vc4: Print the rounded shader size in debug output.Eric Anholt2015-11-041-1/+1
| | | | | It's surprising to see "0kb" printed for debug on short shaders, while 4kb alignment won't be suprising.
* vc4: Fix dumping the size of BOs allocated/cached.Eric Anholt2015-11-041-2/+2
| | | | 60MB of cached BOs are a lot less scary than 600MB.
* mesa/tests: add glBufferStorageEXT to ES 3.1 dispatch listIlia Mirkin2015-11-041-0/+3
| | | | | | | | | | I thought that aliased functions didn't need to be added, but that might only be if the function aliases something in the same {desktop,ES} space. Resolves the dispatch sanity test failure. Fixes: 13b19aa81 (mesa: expose support for GL_EXT_buffer_storage) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92824 Signed-off-by: Ilia Mirkin <[email protected]>
* vbo: fix another GL_LINE_LOOP bugBrian Paul2015-11-042-2/+10
| | | | | | | | | | | | | | | | | Very long line loops which spanned 3 or more vertex buffers were not handled correctly and could result in stray lines. The piglit lineloop test draws 10000 vertices by default, and is not long enough to trigger this. Even 'lineloop -count 100000' doesn't trigger the bug. For future reference, the issue can be reproduced by changing Mesa's VBO_VERT_BUFFER_SIZE to 4096 and changing the piglit lineloop test to use glVertex2f(), draw 3 loops instead of 1, and specifying -count 1023. Acked-by: Sinclair Yeh <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* svga: implement 'white_fragments' option for VGPU10 fragment shadersBrian Paul2015-11-041-5/+30
| | | | | | | | | | When we emulate XOR logicop mode with blend-subtract, we need to ensure that the fragment shader always emits white. We had this implemented for VGPU9, but not VGPU10. VMware bug 1545492. Reviewed-by: Charmaine Lee <[email protected]>
* u_vbuf: minor code reformatting / line wrappingBrian Paul2015-11-041-4/+8
| | | | Trivial.
* u_vbuf: add some const qualifiersBrian Paul2015-11-041-2/+2
| | | | Trivial.
* svga: use new enum indices_mode typeBrian Paul2015-11-042-2/+4
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* util/indices: replace #define tokens with enum typeBrian Paul2015-11-043-86/+91
| | | | | | To ease debugging in gdb. Reviewed-by: Charmaine Lee <[email protected]>
* i965: check inst->predicate when clearing flag_live at dead code eliminateAlejandro Piñeiro2015-11-042-2/+2
| | | | | | | Detected by Matt Turner while reviewing commit a59359ecd22154cc2b3f88bb8c599f21af8a3934 Reviewed-by: Matt Turner <[email protected]>
* gallivm: fix sampling for s3tc srgb formats when using texture cacheRoland Scheidegger2015-11-041-1/+3
| | | | | | | | | This actually stored the values as 8bit linear values in the cache, then did another srgb->linear conversion... We don't want to do the former (decoding 8bit srgb values to 8bit linear completely defeats the purpose of srgb in the first place), so just decode to 8bit srgb. Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.
* i965/meta: Assert fast clears and rep clears never overlapBen Widawsky2015-11-031-0/+2
| | | | | | | | | | | There is nothing wrong with the code today, but as one modifies the code it turns out to be not too difficult to mess up the code, and this easy assertion should catch such driver implementation failures quickly. Cc: Kristian Høgsberg <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Neil Roberts <[email protected]>
* mesa: expose support for GL_EXT_buffer_storageRyan Houdek2015-11-043-0/+11
| | | | | | | | | This extension requires ES 3.1 since it relies on glMemoryBarrier. For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0 function. This has been tested with the piglit in the ML and the Dolphin emulator. Reviewed-by: Ilia Mirkin <[email protected]>
* glsl: make sure to only add subroutines to resource listTimothy Arceri2015-11-041-1/+2
| | | | | | Over looked in 763cd8c080353. Reviewed-by: Tapani Pälli <[email protected]>
* glsl: remove old TODOTimothy Arceri2015-11-041-5/+0
| | | | | | | SSBO support now exists as of commits f24e5e and f408a13dd30. Reviewed-by: Tapani Pälli <[email protected]> Acked-by: Matt Turner <[email protected]>
* docs: Mark AoA as done for i965Timothy Arceri2015-11-042-2/+3
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965: enable ARB_arrays_of_arraysTimothy Arceri2015-11-041-0/+1
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965: add support for image AoATimothy Arceri2015-11-042-14/+18
| | | | | | | | | | | | | V3: clamp array index to the correct size (the size of the current array rather than the inner array) Francisco Jerez. V2: avoid useless zero-initialization and addition for the first AoA level, avoid redundant temporary, make use of type_size_scalar(), rename aoa_size to element_size, assign the indirect indexing temporary directly to image.reladdr, and replace while loop with a for loop. All suggested by Francisco Jerez. Reviewed-by: Francisco Jerez <[email protected]>
* llvmpipe: add cache for compressed texturesRoland Scheidegger2015-11-0419-18/+730
| | | | | | | | | | | | | | | | | | | | | | compressed textures are very slow because decoding is rather complex (and because there's no jit code code to decode them too for non-technical reasons). Thus, add some texture cache which holds a couple of decoded blocks. Right now this handles only s3tc format albeit it could be extended to work with other formats rather trivially as long as the result of decode fits into 32bit per texel (ideally, rgtc actually would decode to more than 8 bits per channel, but even then making it work for it shouldn't be too difficult). This can improve performance noticeably but don't expect wonders (uncompressed is unsurprisingly still faster). It's also possible it might be slower in some cases (using nearest filtering for example or if there's otherwise not many cache hits, the cache is only direct mapped which isn't great). Also, actual decode of a block relies on util code, thus even though always full blocks are decoded it is done texel by texel - this could obviously benefit greatly from simd-optimized code decoding full blocks at once... Note the cache is per (raster) thread, and currently only used for fragment shaders. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: use simple coeffs calc for 128bit vectorsOded Gabbay2015-11-041-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are currently two methods in llvmpipe code to calculate coeffs to be used as inputs for the fragment shader. The two methods use slightly different ways to do the floating point calculations and thus produce slightly different results. The decision which method to use is determined by the size of the vector that is used by the platform. For vectors with size of more than 128bit, a single-step method is used, in which coeffs_init_simple() + attribs_update_simple() are called. For vectors with size of 128bit or less, a two-step method is used, in which coeffs_init() + attribs_update() are called. This causes some piglit tests (clip-distance-bulk-copy, interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with 128bit vectors (such as ppc64le or x86-64 without AVX). This patch makes platforms with 128bit vectors use the single-step method (aka "simple" method) instead of the two-step method. This would make the resulting coeffs identical between more platforms, make sure the piglit tests passes, and make debugging and maintainability a bit easier as the generated LLVM IR will be the same for more platforms. The performance impact is negligible for x86-64 without AVX, and basically non-existent for ppc64le, as it can be seen from the following benchmarking results: - glxspheres, on ppc64le: - original code: 4.892745317 frames/sec 5.460303857 Mpixels/sec - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec - Additional 0.8% performance boost - glxspheres, on x86-64 without AVX: - original code: 20.16418809 frames/sec 22.50323395 Mpixels/sec - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec - Additional 0.74% performance boost - glmark2, on ppc64le: - original code: score of 58 - with my change: score of 57 - glmark2, on x86-64 without AVX: - original code: score of 175 - with the patch: score of 167 - Impact of of -4.5% on performance - OpenArena, on ppc64le: - original code: 3398 frames 1719.0 seconds 2.0 fps 255.0/505.9/2773.0/0.0 ms - with the patch: 3398 frames 1690.4 seconds 2.0 fps 241.0/497.5/2563.0/0.2 ms - 29 seconds faster with the patch, which is about 2% - OpenArena, on x86-64 without AVX: - original code: 3398 frames 239.6 seconds 14.2 fps 38.0/70.5/719.0/14.6 ms - with the patch: 3398 frames 244.4 seconds 13.9 fps 38.0/71.9/697.0/14.3 ms - 0.3 fps slower with the patch (about 2%) Additional details can be found at: http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* nir: Properly invalidate metadata in nir_opt_remove_phis().Kenneth Graunke2015-11-031-0/+5
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]> Cc: [email protected]
* nir: Properly invalidate metadata in nir_lower_vec_to_movs().Kenneth Graunke2015-11-031-0/+5
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]> Cc: [email protected]
* nir: Properly invalidate metadata in nir_opt_copy_prop().Kenneth Graunke2015-11-031-0/+6
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]> Cc: [email protected]
* nir: Properly invalidate metadata in nir_remove_dead_variables().Kenneth Graunke2015-11-031-2/+8
| | | | | | | | v2: Preserve live_variables too (Jason). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* nir: Properly invalidate metadata in nir_split_var_copies().Kenneth Graunke2015-11-031-0/+5
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]> Cc: [email protected]
* nir: Properly invalidate metadata in nir_lower_global_vars_to_local().Kenneth Graunke2015-11-031-0/+3
| | | | | | | | v2: Preserve nir_metadata_live_variables as well (caught by Jason). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* nir: Unexpose _impl versions of copy_prop and dceJason Ekstrand2015-11-033-4/+2
| | | | | Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndexJordan Justen2015-11-037-21/+22
| | | | | | | | Signed-off-by: Jordan Justen <[email protected]> Cc: Samuel Iglesias Gonsálvez <[email protected]> Cc: Iago Toral <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Juha-Pekka Heikkila <[email protected]>
* i965/vec4: Send from GRF in atomic operations.Matt Turner2015-11-031-12/+18
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* gallium/radeon: allow returning SDMA fences from pipe->flushMarek Olšák2015-11-041-8/+56
| | | | | | | pipe->flush never returned SDMA fences. This fixes it. This is only an issue on amdgpu where fences can signal out of order. Reviewed-by: Michel Dänzer <[email protected]>
* gallium/radeon: always return the last SDMA fence on SDMA flush if neededMarek Olšák2015-11-042-4/+8
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* i965: Add scalar geometry shader support.Kenneth Graunke2015-11-035-24/+666
| | | | | | | | | | | | | | | | | | | | This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support instanced geometry shaders, and Orbital Explorer's shader spills like crazy. But the infrastructure is in place, and it's largely working. v2: Lots of rebasing. v3: (feedback from Kristian Høgsberg) - Handle stride and subreg_offset correctly for ATTRs; use a helper. - Fix missing emit_shader_time_end() call. - Delete dead code after early EOT in static vertex case to avoid tripping asserts in emit_shader_time_end(). - Use proper D/UD type in intexp2(). - Fix "EndPrimitve" and "to that" typos. - Assert that invocations == 1 so we know this is missing. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Add scalar GS input lowering code.Kenneth Graunke2015-11-031-5/+39
| | | | | | | | | We really ought to compute the VUE map at link time and stash it, rather than recomputing it here, but with the mess of program structures I wasn't sure where to put it. We can improve that later. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Fix the fs_visitor GS constructor to take shader_time_index.Kenneth Graunke2015-11-032-3/+5
| | | | | | | | Jason reworked this so it isn't simply ST_GS anymore...it's either -1 (not enabled) or an actual offset. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/gen8+: Extract color clear surface stateBen Widawsky2015-11-031-6/+15
| | | | | | | | | | | | | | On future generation platforms the color clear value is stored elsewhere in the surface state. By extracting this logic, we can cleanly implement the difference in an upcoming patch. Should have no functional impact. v2: Move hunk from the next patch into this patch (Matt) Whitespace fix (Ben) Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Neil Roberts <[email protected]>
* i965/gen8+: Remove redundant zeroing of surface stateBen Widawsky2015-11-031-12/+0
| | | | | | | | | | | | | | The allocate_surface_state already zeroes out the surface state, and doing it later in the function is destructive for what we want to accomplish when we split out support for gen9 fast clears (next patch). NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent to remove the other instances as well. I can make an argument both ways (open coding it, vs. not). I can rework the next patch if requires. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Neil Roberts <[email protected]>
* nvc0: add missing compute parameters required by cloverSamuel Pitoiset2015-11-031-1/+10
| | | | | | | This fixes crashes with some piglit OpenCL tests. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: handle NULL pointer in nvc0_get_compute_param()Samuel Pitoiset2015-11-031-24/+21
| | | | | | | | | To get the size (in bytes) of a compute parameter, clover first calls get_compute_param() with a NULL data pointer. The RET() macro is based on nv50. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* i965/skl: PCI ID cleanup and brand stringsBen Widawsky2015-11-031-15/+19
| | | | | | | | | | | | A few new PCI ids are added here, and one is removed (0x190B) because it no longer seems to exist anywhere. v2-4: Only use ascii characters (Ilia) 0x1921 is no longer marked as f Reviewed-by: Jordan Justen <[email protected]> Signed-off-by: Ben Widawsky <[email protected]>
* i965/skl: Add GT4 PCI IDsBen Widawsky2015-11-032-1/+9
| | | | | | | | | | | | | | | | | | | | Like other gen8+ hardware, the hardware automatically scales up thread counts. We must be careful about the URB sizes since GT4 adds another slice. One of the existing PCI IDs is actually mislabeled as GT3. Arguably this is a real bug since the URB size will be wrong. Because this patch is simply meant to add the missing IDs, that will be fixed in a later patch. v2: No longer relevant. v3: Update the wm thread count to support GT4. The WM thread count is used to determine the maximum scratch space required. Currently the code always allocates the maximum amount even though lower GT SKUs require less. The formula is threads_per_psd * subslices_per_slice * slices Cc: [email protected] Reviewed-by: Jordan Justen <[email protected]> Signed-off-by: Ben Widawsky <[email protected]>
* mesa: Add spec citations for DispatchCompute*Jordan Justen2015-11-021-5/+29
| | | | | | | | | | | | | | | Note: The OpenGL 4.3 - 4.5 specification language for DispatchCompute appears to have an error regarding the max allowed values. When adding the specification citation, we note why the code does not match the specification language. v2: * Updates based on review from Iago Signed-off-by: Jordan Justen <[email protected]> Cc: Iago Toral Quiroga <[email protected]> Cc: Marta Lofstedt <[email protected]> Reviewed-by: Marta Lofstedt <[email protected]>
* mesa: Update DispatchComputeIndirect errors for indirect parameterJordan Justen2015-11-021-6/+5
| | | | | | | | | | | | | | | | | | | | | | | There is some discrepancy between the return values for some error cases for the DispatchComputeIndirect call in the ARB_compute_shader specification. Regarding the indirect parameter, in one place the extension spec lists that the error returned for invalid values should be INVALID_OPERATION, while later it specifies INVALID_VALUE. The OpenGL 4.3 and OpenGLES 3.1 specifications appear to be consistent in requiring the INVALID_VALUE error return in this case. Here we update the code to match the main specifications, and update the citations use the main specification rather than the extension specification. v2: * Updates based on review from Iago Signed-off-by: Jordan Justen <[email protected]> Cc: Iago Toral Quiroga <[email protected]> Cc: Marta Lofstedt <[email protected]> Reviewed-by: Marta Lofstedt <[email protected]>
* i965/fs: Clean up FBH code.Matt Turner2015-11-021-4/+3
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Clean up FBH code.Matt Turner2015-11-021-13/+5
| | | | | | It did a bunch of unnecessary stuff, emitting an extra MOV included. Reviewed-by: Ian Romanick <[email protected]>
* i965: Replace default case with list of enum values.Matt Turner2015-11-025-26/+29
| | | | | | | If we add a new file type, we'd like to get warnings if it's not handled. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Don't disable channels in any/all comparisons.Matt Turner2015-11-021-42/+10
| | | | | | | | | | | | | | | | | | | | | | We've made a mistake in calling the Channel Enable bits "writemask", because they do more than control which channels of the destination are written -- they actually control which channels are enabled (surprise! surprise!) So, if we emit cmp.z.f0(8) null.xy<1>D g10<4,4,1>.xyzzD g2<0,4,1>.xyzzD mov(8) g12<1>.xUD 0x00000000UD (+f0.all4h) mov(8) g12<1>.xUD 0xffffffffUD where the CMP instruction has only .xy channel enables, it won't write the .zw channels of the flag register, which are of course read by the +f0.all4 predicate. We need to always emit CMP instructions whose flag result might be read by such a predicate with all channels enabled. Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: fix uniforms calculation in glGetProgramivTapani Pälli2015-11-021-2/+12
| | | | | | | | | | | | | Since introduction of SSBO, UniformStorage contains not just uniforms but also buffer variables, this needs to be taken in to account when calculating active uniforms with GL_ACTIVE_UNIFORMS and GL_ACTIVE_UNIFORM_MAX_LENGTH. No Piglit regressions. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* mesa: fix program resource queries for atomic counter buffersTapani Pälli2015-11-021-2/+26
| | | | | | | | | | | | | | gl_active_atomic_buffer contains index to UniformStorage, we need to calculate resource index for that gl_uniform_storage. Fixes following CTS tests: ES31-CTS.program_interface_query.atomic-counters ES31-CTS.program_interface_query.atomic-counters-one-buffer No Piglit regressions. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Marta Lofstedt <[email protected]>
* glsl: join calculate_array_size() and calculate_array_stride()Juha-Pekka Heikkila2015-11-021-110/+80
| | | | | | | | | | | | These helpers are ran for same case the same loop. Here joined their operation so the loop is ran just once. Also fixed out-of-memory condition here. v2: Make the loop simpler to read as per Tapani's suggestion Signed-off-by: Juha-Pekka Heikkila <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Tested-by: Tapani Pälli <[email protected]>