path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* r600: Scale integer valued texture border colors to float (v2) | Gert Wollny | 2018-07-25 | 1 | -1/+44
    It seems the hardware always expects floating point border color values
    [0,1] for unsigned, and [-1,1] for signed texture components, regardless of
    pixel type, but the border colors are passed according to texture
    component type. Hence, before submitting the border color, convert and
    scale it to these ranges accordingly.

    This doesn't seem to work for textures with 32 bit integer components
    though; here, it seems that the border color is always set to zero,
    regardless of the BORDER_COLOR_TYPE state set in Q_TEX_SAMPLER_WORD0_0.

    v2: Simplify logic as suggested by Roland Scheidegger

    Fixes:
      dEQP-GLES31.functional.texture.border_clamp.formats.compressed*
      dEQP-GLES31.functional.texture.border_clamp.formats.r* (non 32 bit integer)
      dEQP-GLES31.functional.texture.border_clamp.per_axis_wrap_mode.texture_2d*
    and a number of piglits out of
      piglit run gpu -t texture -t gather -t formats

    Signed-off-by: Gert Wollny <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
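The scaling described in this commit message can be sketched as follows. This is a minimal illustration, not the r600 driver code: the function name, the choice of divisor (the largest representable value for the component width), and the clamping are assumptions made for the example.

```python
def scale_border_color(value: int, bits: int, signed: bool) -> float:
    """Map an integer border color component to a normalized float.

    Hypothetical helper: unsigned components land in [0, 1] and signed
    components in [-1, 1], as the commit message says the hardware expects.
    """
    if signed:
        # Assumption: divide by the largest positive value, then clamp so
        # the most negative integer does not fall below -1.0.
        max_val = (1 << (bits - 1)) - 1
        return max(-1.0, min(1.0, value / max_val))
    # Assumption: divide by the largest unsigned value for this width.
    max_val = (1 << bits) - 1
    return max(0.0, min(1.0, value / max_val))
```

For example, an 8-bit unsigned component of 255 maps to 1.0, and an 8-bit signed component of -128 clamps to -1.0. The commit message notes that 32-bit integer components do not behave this way on the hardware, so the sketch does not model them.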
* nir: rename f2f16_undef to f2f16 | Karol Herbst | 2018-07-24 | 1 | -3/+3
    we need rounding modes on other conversions involving floats and it is
    easier to rename f2f16_undef than renaming all the other ones.

    v2: rebased on master

    Reviewed-by: Jason Ekstrand <[email protected]>
    Acked-by: Rob Clark <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
* radeonsi: handle SI_FORCE_FAMILY early | Marek Olšák | 2018-07-24 | 1 | -2/+1
    before LLVM target machines are created
* forward precise-flag if supported | Erik Faye-Lund | 2018-07-24 | 2 | -1/+5
    New versions of virglrenderer support the precise-flag, so let's
    forward it from TGSI if that's the case.

    This fixes a few dEQP-GLES31 tests:
    - dEQP-GLES31.functional.tessellation.common_edge.quads_equal_spacing_precise
    - dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_even_spacing_precise
    - dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_odd_spacing_precise
    - dEQP-GLES31.functional.tessellation.common_edge.triangles_equal_spacing_precise
    - dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_even_spacing_precise
    - dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_odd_spacing_precise

    Signed-off-by: Erik Faye-Lund <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: fix pk2h breakage | Marek Olšák | 2018-07-23 | 1 | -2/+5
* radeonsi: reduce LDS stalls by 40% for tessellation | Marek Olšák | 2018-07-23 | 4 | -6/+14
    40% is the decrease in the LGKM counter (which includes SMEM too)
    for the GFX9 LSHS stage.

    This will make the LDS size slightly larger, but I wasn't able to increase
    the patch stride without corruption, so I'm increasing the vertex stride.
* radeonsi: Add debug option to enable LLVM GlobalISel (v2) | Tom Stellard | 2018-07-23 | 2 | -0/+4
    R600_DEBUG=gisel will tell LLVM to use GlobalISel rather than
    SelectionDAG for instruction selection.

    v2: mareko: move the helper to src/amd/common

    Signed-off-by: Marek Olšák <[email protected]>
    Reviewed-by: Tom Stellard <[email protected]>
* r600: enable tess_input_info for TES | Dave Airlie | 2018-07-23 | 1 | -14/+6
    There might be a nicer way to do this, but this is at least correct.

    This fixes:
      KHR-GL44.tessellation_shader.single.max_patch_vertices
      KHR-GL44.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn

    Reviewed-By: Gert Wollny <[email protected]>
    Cc: [email protected]
* Revert "virgl: remove unused stride-arguments" | Dave Airlie | 2018-07-24 | 4 | -5/+27
    This reverts commit dc938b8398c0dafb60507e41685f7518b681c24d.

    This adds warnings in vtest, and possibly breaks it.
* virgl: add initial shader_storage_buffer_object support. (v2) | Dave Airlie | 2018-07-24 | 9 | -0/+98
    This adds the guest side support for ARB_shader_storage_buffer_object.

    Co-authors: Gurchetan Singh <[email protected]>

    v2: move to using separate maximums (fixup macros)

    Reviewed-By: Gert Wollny <[email protected]>
* virgl: remove unused stride-arguments | Erik Faye-Lund | 2018-07-23 | 4 | -27/+5
    The IOCTLs don't pass this along, so computing them in the first
    place is kinda pointless.

    Signed-off-by: Erik Faye-Lund <[email protected]>
    Reviewed-by: Gurchetan Singh <[email protected]>
* radeonsi/nir: make use of nir_lower_load_const_to_scalar() | Timothy Arceri | 2018-07-23 | 1 | -0/+2
    This allows NIR to CSE more operations. LLVM does this also so the
    impact is limited, however doing this in NIR allows other opts to
    make progress. For example some loops in Civilization Beyond Earth
    shaders are unrolled.

    Reviewed-by: Marek Olšák <[email protected]>
* Android: fix a missing nir_intrinsics.h error | Chih-Wei Huang | 2018-07-21 | 1 | -0/+2
    The commit 76dfed8ae2d5 changed nir_intrinsics.h to be a generated
    header, but the corresponding dependency was not updated for Android.
    It causes the error:

    [ 0% 19/4336] target C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_debug.c
    ...
    In file included from external/mesa/src/gallium/drivers/radeonsi/si_debug.c:25:
    In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:28:
    In file included from external/mesa/src/gallium/drivers/radeonsi/si_shader.h:140:
    In file included from external/mesa/src/amd/common/ac_llvm_build.h:30:
    external/mesa/src/compiler/nir/nir.h:966:10: fatal error: 'nir_intrinsics.h' file not found
             ^~~~~~~~~~~~~~~~~~
    1 error generated.

    Fixes: 76dfed8ae2d5 ("nir: mako all the intrinsics")
    Signed-off-by: Chih-Wei Huang <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
    Reviewed-by: Mauro Rossi <[email protected]>
* v3d: Fix incorrect handling of two fences created back-to-back. | Eric Anholt | 2018-07-20 | 1 | -12/+31
    Recreating our context's syncobj with ALREADY_SIGNALED meant that if you
    created two fences in a row, then waiting on the second would succeed
    immediately. Instead, export a sync file in the gallium fence (since we
    don't have a syncobj clone ioctl), and just create a new syncobj to wait
    on whenever we need to.

    Noticed while debugging
    dEQP-GLES3.functional.fence_sync.client_wait_sync_finish
* v3d: Fix the timeout value passed to drmSyncobjWait(). | Eric Anholt | 2018-07-20 | 1 | -1/+6
    The API wants an absolute time, so we need to go add gallium's argument to
    CLOCK_MONOTONIC.
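The conversion from a relative timeout to an absolute deadline can be sketched like this. It is an illustration under assumptions rather than the v3d code: the helper name is hypothetical, and saturating at INT64_MAX (so that a very large "wait forever" value does not overflow a signed 64-bit deadline) is a design choice made for the example.

```python
import time
from typing import Optional

INT64_MAX = (1 << 63) - 1


def absolute_timeout_ns(relative_ns: int, now_ns: Optional[int] = None) -> int:
    """Turn a relative timeout into an absolute monotonic deadline.

    Hypothetical helper. On Linux, time.monotonic_ns() reads
    CLOCK_MONOTONIC, the clock the commit message refers to.
    """
    if now_ns is None:
        now_ns = time.monotonic_ns()
    # Assumption: saturate instead of overflowing for huge timeouts.
    if relative_ns >= INT64_MAX - now_ns:
        return INT64_MAX
    return now_ns + relative_ns
```

Passing `now_ns` explicitly keeps the helper deterministic for testing; in normal use it would be left at its default so the current monotonic time is read.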
* v3d: Fix drmSyncobjWait() return value checking even more. | Eric Anholt | 2018-07-20 | 1 | -1/+1
    It tends to return >0 in the success case (I think the value is something
    like "how much of the timeout remained").

    Fixes dEQP-GLES3.functional.fence_sync.client_wait_sync_finish
* v3d: Use the list_first_entry/list_last_entry macros. | Eric Anholt | 2018-07-20 | 1 | -8/+8
* v3d: Move BO cache counting to dump time instead of cache management. | Eric Anholt | 2018-07-20 | 2 | -9/+9
    This is one less way to get the dump stats wrong.
* v3d: Reduce the stale BO reclamation spam with dump_stats set. | Eric Anholt | 2018-07-20 | 1 | -6/+5
    This was obviously meant to be when we were actually freeing a BO, not
    just when there was at least one BO in the list.
* v3d: Respect a sampler view's first_layer field. | Eric Anholt | 2018-07-20 | 1 | -1/+3
    Fixes texturing from EGL images created from cubemap faces, as in
    dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture
* radeonsi: emit_spi_map packets optimization | Sonny Jiang | 2018-07-20 | 4 | -8/+39
    v2: marek: remove an empty line before break;
               rename reg_val_seq -> spi_ps_input_cntl
               "type * x" -> "type *x"

    Signed-off-by: Sonny Jiang <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
* virgl: Expose GL_ARB_copy_image if host supports it | Gert Wollny | 2018-07-20 | 2 | -1/+3
    Signed-off-by: Gert Wollny <[email protected]>
    Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: Allow RGB32* textures only as buffer objects | Gert Wollny | 2018-07-20 | 1 | -0/+7
    When requesting a texture of the internal format GL_RGB32F, Gallium will
    try to allocate a renderable texture and return RGBA32F or RGBX32F, but
    when one requests GL_RGB32I or GL_RGB32UI, the corresponding 3-component
    texture will be returned.

    This leads to problems later, when one wants to use glCopyImageSubData to
    copy data between these textures that should be compatible: given the way
    virgl and Gallium handle this, the latter fails with an assertion, because
    the per-texel bit size is different.

    By allowing the GL_RGB32* formats only for texture buffers, these problems
    are avoided without losing the ARB_tbo_rgb32 extension (thanks Ilia Mirkin).

    v2: Correct spelling (Gurchetan Singh)

    Signed-off-by: Gert Wollny <[email protected]>
    Reviewed-by: Gurchetan Singh <[email protected]>
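The restriction described above amounts to a format-support check that accepts the three-component 32-bit formats only for buffer targets. The sketch below is a simplified model: the string names stand in for the real Gallium format and target enums, the function name is hypothetical, and every other format is simply reported as supported, which the real driver does not do.

```python
# Hypothetical stand-ins for the three-component 32-bit formats.
RGB32_FORMATS = {"R32G32B32_FLOAT", "R32G32B32_SINT", "R32G32B32_UINT"}


def is_format_supported(fmt: str, target: str) -> bool:
    """Reject RGB32* formats unless they back a texture buffer."""
    if fmt in RGB32_FORMATS and target != "BUFFER":
        return False
    # Assumption: all remaining format checks are omitted from this sketch.
    return True
```

With this gate, a request for a 2D RGB32 texture is refused and the state tracker can fall back to a four-component format, while RGB32 texture buffers stay available.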
* r600: Correct evaluation of cube array index and face | Gert Wollny | 2018-07-20 | 1 | -1/+33
    The array index needs to be corrected and it must be ensured that it is
    rounded and its value is non-negative before it is combined with the face
    id.

    v5: Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin)
    v6: Fix type (Roland Scheidegger)

    Fixes 182 from android/cts/master/gles31-master.txt:
      dEQP-GLES31.functional.texture.filtering.cube_array.formats.*
      dEQP-GLES31.functional.texture.filtering.cube_array.sizes.*
      dEQP-GLES31.functional.texture.filtering.cube_array.combinations.nearest_mipmap_*
      dEQP-GLES31.functional.texture.filtering.cube_array.combinations.linear_mipmap_*
      dEQP-GLES31.functional.texture.filtering.cube_array.no_edges_visible.*

    Signed-off-by: Gert Wollny <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
* r600: correct texture offset for array index lookup | Gert Wollny | 2018-07-20 | 1 | -5/+37
    Correct the array index for TEXTURE_*1D_ARRAY, and TEXTURE_*2D_ARRAY

    The standard says the array index is evaluated according to
    floor(z + 0.5) but RNDNE is sufficient also for the test cases where z
    is close to 1.5 and it is likely to hit 1.5, the corner case where RNDNE
    gives a result different from above formula.

    v5: - Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin)
        - update commit message

    Fixes 325 tests from android/cts/master/gles3-master.txt:
      dEQP-GLES3.functional.shaders.texture_functions.texture.*sampler2darray*
      dEQP-GLES3.functional.shaders.texture_functions.textureoffset.*sampler2darray*
      dEQP-GLES3.functional.shaders.texture_functions.texturelod.sampler2darray*
      dEQP-GLES3.functional.shaders.texture_functions.texturelodoffset.*sampler2darray*
      dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*sampler2darray*
      dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.*sampler2darray*
      dEQP-GLES3.functional.texture.filtering.2d_array.formats.*
      dEQP-GLES3.functional.texture.filtering.2d_array.sizes.*
      dEQP-GLES3.functional.texture.filtering.2d_array.combinations.*
      dEQP-GLES3.functional.texture.shadow.2d_array.*
      dEQP-GLES3.functional.texture.vertex.2d_array.*

    Signed-off-by: Gert Wollny <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
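The two rounding rules mentioned in this commit message can be compared numerically. The sketch below uses Python's built-in round(), which rounds halfway cases to the nearest even integer, as a stand-in for the hardware RNDNE instruction; that equivalence is an assumption of the example, and the function names are hypothetical.

```python
import math


def index_floor_half(z: float) -> int:
    """Array index as floor(z + 0.5), the formula quoted from the standard."""
    return math.floor(z + 0.5)


def index_rndne(z: float) -> int:
    """Array index by round-to-nearest-even (assumed model of RNDNE)."""
    return round(z)


# Compare both rules on values near and exactly on halfway points.
for z in (0.4, 0.5, 1.49, 1.5, 2.5):
    print(z, index_floor_half(z), index_rndne(z))
```

In this model the two rules agree everywhere except on exact halfway values whose lower neighbour is even, such as 0.5 and 2.5, where floor(z + 0.5) rounds up and round-to-nearest-even rounds down.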
* r600: Delay emission of texture gradients and lookup offsets | Gert Wollny | 2018-07-20 | 1 | -44/+48
    Gradients used in texture lookups and the offsets must reside in the same
    fetch clause (the first is imposed by the hardware and the second is
    expected by sb). In order to ensure that no ALU clause is inserted between
    emission and use of these, delay the emission of these instructions until
    the texture instruction using them is also emitted.

    This is needed in preparation for the correction of the texture array
    indices.

    Signed-off-by: Gert Wollny <[email protected]>
    Reviewed-by: Roland Scheidegger <[email protected]>
* nv50/ir: move LateAlgebraicOpt back to right after ConstantFolding | Rhys Perry | 2018-07-19 | 1 | -1/+1
    total instructions in shared programs : 5480808 -> 5472107 (-0.16%)
    total gprs used in shared programs    : 647530 -> 647532 (0.00%)
    total shared used in shared programs  : 389120 -> 389120 (0.00%)
    total local used in shared programs   : 21064 -> 21064 (0.00%)
    total bytes used in shared programs   : 58551648 -> 58459352 (-0.16%)

                local  shared   gpr   inst  bytes
    helped          0       0    73   2609   2609
    hurt            0       0    71     34     34
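The percentages in these statistics follow from the before and after totals. A small sketch of the arithmetic, assuming the usual relative-change formula and two-decimal rounding (the helper name is hypothetical, and the tool that produced the numbers may format them differently):

```python
def percent_change(before: int, after: int) -> str:
    """Relative change from before to after, as a two-decimal percentage."""
    return f"{(after - before) / before * 100:.2f}%"


# Instruction count from the statistics above: 5480808 -> 5472107.
print(percent_change(5480808, 5472107))
```

The instruction count drops by 8701 out of 5480808, which is about 0.159% and rounds to the -0.16% reported above.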
* nv50/ir: handle SHLADD in IndirectPropagation | Rhys Perry | 2018-07-19 | 1 | -0/+12
    An alternative solution to the problem fixed in 0bd83d0 ("nv50/ir: move
    LateAlgebraicOpt to the very end").

    total instructions in shared programs : 5481195 -> 5480808 (-0.01%)
    total gprs used in shared programs    : 647535 -> 647530 (-0.00%)
    total shared used in shared programs  : 389120 -> 389120 (0.00%)
    total local used in shared programs   : 21064 -> 21064 (0.00%)
    total bytes used in shared programs   : 58555784 -> 58551648 (-0.01%)

                local  shared   gpr   inst  bytes
    helped          0       0     2     34     34
    hurt            0       0     0      0      0
* gm107/ir: use CS2R for SV_CLOCK | Rhys Perry | 2018-07-19 | 3 | -2/+25
    This instruction seems to be faster than S2R and requires no barrier,
    though the range of special registers it can read from is limited.

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
* r600: silence the signed overflow warning like radeonsi | Marek Olšák | 2018-07-18 | 1 | -1/+1
    r600_gpu_load.c: In function ‘r600_gpu_load_thread’:
    ../../../../src/util/os_time.h:82:7: warning: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Wstrict-overflow]
       if (start <= end)
* radeonsi: emit_guardband packets optimization | Sonny Jiang | 2018-07-18 | 4 | -8/+50
    Signed-off-by: Sonny Jiang <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: Save CLEAR_STATE initial values for optimization | Sonny Jiang | 2018-07-18 | 1 | -2/+26
    Signed-off-by: Sonny Jiang <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: Refuse to accept code with unhandled relocations | Jan Vesely | 2018-07-18 | 1 | -0/+6
    They might lead to an unrecoverable GPU hang.

    Signed-off-by: Jan Vesely <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Cc: [email protected]
* v3d: Fix tiling modifier support to use the new UIF define. | Eric Anholt | 2018-07-18 | 1 | -3/+16
    You can't use T tiled buffers on V3D 3.x and newer; it's been replaced
    with a newer layout shared with other hardware blocks.
* radeonsi: Use signed char for color_interp_vgpr_index | Timothy Pearson | 2018-07-18 | 1 | -1/+1
    color_interp_vgpr_index was declared as a generic char value.
    Because signed values are used in this variable, the result
    was not safe across architectures and crashed on ppc64[el] and arm.

    Declare color_interp_vgpr_index as a signed type.

    Signed-off-by: Timothy Pearson <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
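In C, whether plain char is signed is implementation-defined, and the common Linux ABIs for ARM and PowerPC make it unsigned. The effect on a negative value can be modelled with ctypes. The sketch assumes a negative sentinel such as -1 is what gets stored (the commit message only says signed values are used), and the helper name is hypothetical.

```python
import ctypes


def stored_index(value: int, char_is_signed: bool) -> int:
    """Value read back after storing `value` in a one-byte C char.

    c_byte models a signed char and c_ubyte an unsigned char; ctypes
    wraps out-of-range values instead of raising an error.
    """
    ctype = ctypes.c_byte if char_is_signed else ctypes.c_ubyte
    return ctype(value).value


print(stored_index(-1, True))   # signed char keeps the negative value
print(stored_index(-1, False))  # unsigned char wraps it to a positive one
```

Where char is unsigned, a later comparison against the negative value never matches, which is consistent with the architecture-specific failure described above. Declaring the field as signed char removes the dependence on the platform default.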
* freedreno/a5xx: performance counters | Rob Clark | 2018-07-18 | 7 | -1/+982
    AMD_performance_monitor support

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: batch query support (perfcounters) | Rob Clark | 2018-07-18 | 7 | -3/+153
    Core infrastructure for performance counters, using gallium's batch
    query interface (to support AMD_performance_monitor).

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: batch query prep-work | Rob Clark | 2018-07-18 | 2 | -8/+28
    For batch queries we have N different query_type's for one query, so
    mapping a single query_type to a sample_provider doesn't really work
    out. Instead add a new constructor to construct a query directly from a
    sample_provider.

    Also, the sample buffer size needs to be determined at runtime, as it
    depends on the number of query_types.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: rework accumulated query result vfunc | Rob Clark | 2018-07-18 | 3 | -9/+9
    Take the query object, rather than the ctx. The ctx ptr isn't hugely
    useful but for batch queries we will need the query object to properly
    get the results.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: output ir3 and nir asm for frameretrace | Rob Clark | 2018-07-18 | 4 | -0/+69
    See: https://github.com/janesma/apitrace/commit/298dc8195bf082fe1f47aa474e28411f85dd5393

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: redirectable ir3 disasm output | Rob Clark | 2018-07-18 | 3 | -50/+48
    For now it still goes to stdout, this will make it easier to support
    output on stderr like what frameretrace expects.

    (If we eventually have a proper GL extension for this, implementation
    probably looks like dumping shader disasm to a tmp file and then dumping
    that out over whatever mechanism is used.)

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: resync ir3 disassembler | Rob Clark | 2018-07-18 | 4 | -191/+209
    Pull in latest updates from cffdump in envytools tree, so we can output
    to something other than just stdout.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: register usage queries | Rob Clark | 2018-07-18 | 8 | -22/+91
    Avg number of (half) regs per draw, so we can correlate fps dips to
    shader register usage.

    Signed-off-by: Rob Clark <[email protected]>
* nir: add lowering for gl_HelperInvocation | Rob Clark | 2018-07-18 | 2 | -0/+2
    v2: reword comment about lower_helper_invocations to be more clear that
        it might not work on all hardware
    v3: add special variant of load_sample_id which does not imply
        per-sample shading

    Signed-off-by: Rob Clark <[email protected]>
* r600: fix warnings when unref'ing pool->bo | Marek Olšák | 2018-07-17 | 1 | -3/+3
* r600g: some -Wsign-compare fixes | Konstantin Kharlamov | 2018-07-17 | 6 | -14/+13
    Signed-off-by: Konstantin Kharlamov <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
* r600g: constify some variables | Konstantin Kharlamov | 2018-07-17 | 5 | -10/+10
    Just a nice hint for both people and compilers.

    Signed-off-by: Konstantin Kharlamov <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
* r600g: do not use "fast-clear" for small textures (v3) | Konstantin Kharlamov | 2018-07-17 | 1 | -0/+10
    Ported from radeonsi.

    Improves windowed glxgears run as
      vblank_mode=0 glxgears -info -geometry 0+0+512+512
    from ≈2270 FPS to ≈2360 FPS.

    Tested with AMD TURKS.

    v2: it turned out glxgears ignores the option above; the correct way
    would be "512x512+0+0". Now it can be seen that 512x512 actually loses
    30 FPS. 300×300, however, wins around a hundred FPS, and to leave some
    room in case results differ for other cards, I don't want to nitpick in
    search of an optimum but simply leave 300×300 in the code.

    v3: remove redundant braces, and try harder for the mail to stick to the
    rest of the series.

    Signed-off-by: Konstantin Kharlamov <[email protected]>
    Reviewed-by: Gert Wollny <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
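The heuristic described in this commit message can be sketched as a size threshold. Only the 300×300 figure comes from the message; comparing on pixel area, treating the boundary as "small", and the names used here are assumptions for illustration.

```python
# Threshold taken from the commit message; comparing by area is an assumption.
SMALL_TEXTURE_AREA = 300 * 300


def use_fast_clear(width: int, height: int) -> bool:
    """Return True when a surface is large enough for fast clear to pay off."""
    return width * height > SMALL_TEXTURE_AREA
```

Under these assumptions a 512×512 window still takes the fast-clear path, while a 300×300 or smaller surface falls back to a regular clear.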
* freedreno: re-work fd_batch_reference() locking | Rob Clark | 2018-07-17 | 2 | -23/+26
    Annoyingly we still have to briefly drop the lock to unref resources,
    but push the lock down into __fd_batch_destroy() so we can invalidate
    the batch and reset resources before dropping the lock.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: make fd_batch a one-shot thing | Rob Clark | 2018-07-17 | 2 | -11/+36
    Re-allocate rather than re-use. Originally we had an unnecessarily
    complex design to avoid re-allocating cmdstream buffers. But now that
    support for "growable" cmdstream buffers has been in place for a couple
    years, I guess we can care a bit less about the extra overhead on older
    kernels.

    But making the batches one-shot removes a class of potential race
    conditions vs the flush_queue.

    Signed-off-by: Rob Clark <[email protected]>