Commit message | Author | Age | Files | Lines
* nvc0: do not bind input params at compute state init on Fermi | Samuel Pitoiset | 2015-10-18 | 1 | -8/+0
    It looks like binding a constant buffer on compute overwrites the 3D
    state. To avoid that, we already re-bind all the 3D constant buffers
    after launching a compute grid but this is not enough.

    Binding the constant buffer of input parameters for the compute state
    at initialization corrupts the 3D constant buffers, and it's just
    useless to bind it because this is not needed until we really launch
    a grid.

    This fixes some piglit regressions related to interpolation tests
    introduced in "nvc0: enable compute support by default on Fermi".

    Fixes: 00d6186 (nvc0: enable compute support by default on Fermi)
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* i965/vs: Drop hack that created NIR for fixed function vertex programs. | Kenneth Graunke | 2015-10-17 | 1 | -12/+0
    Marek made core Mesa call ProgramStringNotify(), which solves this
    properly. The hack is no longer needed.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Switch on shader stage in nir_lower_outputs(). | Kenneth Graunke | 2015-10-17 | 1 | -5/+21
    VS, GS, and FS continue doing the same thing they did before.
    We can simplify the FS code a bit because it is always scalar.

    Compute shaders now assert that there are no outputs instead of
    doing a loop over 0 outputs.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* radeonsi: don't use the AMDGPU intrinsic for CMP | Marek Olšák | 2015-10-17 | 1 | -9/+22
    No difference according to shader-db.

    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Tom Stellard <[email protected]>
* radeonsi: use LRP from gallivm | Marek Olšák | 2015-10-17 | 1 | -2/+0
    Totals:
    SGPRS: 344552 -> 344368 (-0.05 %)
    VGPRS: 197132 -> 197552 (0.21 %)
    Code Size: 7375376 -> 7366304 (-0.12 %) bytes
    LDS: 91 -> 91 (0.00 %) blocks
    Scratch: 1679360 -> 1615872 (-3.78 %) bytes per wave

    Totals from affected shaders:
    SGPRS: 47736 -> 47552 (-0.39 %)
    VGPRS: 27952 -> 28372 (1.50 %)
    Code Size: 1392724 -> 1383652 (-0.65 %) bytes
    LDS: 39 -> 39 (0.00 %) blocks
    Scratch: 513024 -> 449536 (-12.38 %) bytes per wave

    Reviewed-by: Michel Dänzer <[email protected]>
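The percentages in these shader-db totals appear to be plain relative changes, (after - before) / before. A minimal sketch of that arithmetic follows; `percent_change` is a hypothetical helper written for illustration and is not part of shader-db or Mesa.

```c
/* Hypothetical helper (not from shader-db): relative change in percent,
 * assuming the report uses (after - before) / before * 100. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the SGPRS totals above, `percent_change(344552, 344368)` is roughly -0.053, which rounds to the reported -0.05 %.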
* radeonsi: don't emit AMDGPU intrinsics for integer abs, min, max | Marek Olšák | 2015-10-17 | 1 | -10/+50
    No difference according to shader-db.
    (with the new S_ABS_I32 pattern)

    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Tom Stellard <[email protected]>
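Dropping target intrinsics for these opcodes presumably means expressing them as generic compare-and-select operations that the backend can pattern-match. The sketch below shows that shape in plain C; the function names are hypothetical and this is not the radeonsi code itself.

```c
#include <stdint.h>

/* Hypothetical illustrations of integer min/max/abs written as a
 * compare followed by a select, the generic form a backend can match. */
static int32_t int_min(int32_t a, int32_t b)
{
    return a < b ? a : b;
}

static int32_t int_max(int32_t a, int32_t b)
{
    return a > b ? a : b;
}

static int32_t int_abs(int32_t x)
{
    /* Negate in unsigned arithmetic so INT32_MIN does not trigger
     * signed-overflow undefined behaviour in C. */
    uint32_t ux = (uint32_t)x;
    return (int32_t)(x < 0 ? 0u - ux : ux);
}
```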
* radeonsi: don't emit AMDGPU intrinsics for EX2, ROUND, TRUNC | Marek Olšák | 2015-10-17 | 1 | -3/+3
    No difference according to shader-db.

    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Tom Stellard <[email protected]>
* radeonsi: initialize output, temp, and address registers to "undef" | Marek Olšák | 2015-10-17 | 1 | -4/+15
    This removes "v_mov v0, 0" which typically occurs before exports.

    Totals:
    SGPRS: 345216 -> 344552 (-0.19 %)
    VGPRS: 197684 -> 197132 (-0.28 %)
    Code Size: 7390408 -> 7375376 (-0.20 %) bytes
    LDS: 91 -> 91 (0.00 %) blocks
    Scratch: 1842176 -> 1679360 (-8.84 %) bytes per wave

    Totals from affected shaders:
    SGPRS: 101336 -> 100672 (-0.66 %)
    VGPRS: 53920 -> 53368 (-1.02 %)
    Code Size: 2170176 -> 2155144 (-0.69 %) bytes
    LDS: 2 -> 2 (0.00 %) blocks
    Scratch: 1015808 -> 852992 (-16.03 %) bytes per wave

    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Tom Stellard <[email protected]>
* gallivm: implement the correct version of LRP | Marek Olšák | 2015-10-17 | 1 | -6/+13
    The previous version has precision issues. This can be a problem
    with tessellation. Sadly, I can't find the article where I read it
    anymore. I'm not sure if the unsafe-fp-math flag would be enough to
    revert this.

    v2: added the comment
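The precision issue mentioned here can be illustrated with the two usual ways of writing a linear interpolation. The sketch below assumes the forms in question are the single-multiply `src2 + src0 * (src1 - src2)` and the two-multiply `src0 * src1 + (1 - src0) * src2`; the commit message does not spell them out, and the function names are hypothetical.

```c
/* Single-multiply form: src2 + src0 * (src1 - src2).
 * The subtraction can lose low bits when src1 and src2 differ
 * greatly in magnitude, so src0 == 1 may not return src1 exactly. */
static float lrp_one_mul(float s0, float s1, float s2)
{
    float diff = s1 - s2;
    float scaled = s0 * diff;
    return s2 + scaled;
}

/* Two-multiply form: src0 * src1 + (1 - src0) * src2.
 * With src0 == 0 or src0 == 1 one term vanishes, so the endpoints
 * are reproduced exactly for finite inputs. */
static float lrp_two_mul(float s0, float s1, float s2)
{
    float a = s0 * s1;
    float b = (1.0f - s0) * s2;
    return a + b;
}
```

For example, with `s0 = 1.0f`, `s1 = 1.0f` and `s2 = 1e8f`, the difference `1 - 1e8` rounds to `-1e8` in single precision, so the single-multiply form yields 0 while the two-multiply form yields exactly 1. Mismatches of this kind at the endpoints are the sort of error that could open cracks between tessellated patches.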
* gallivm: set correct opcode info from unary/binary/ternary emits | Marek Olšák | 2015-10-17 | 1 | -3/+6
    and clear the emit_data structure.

    The new radeonsi min/max opcode implementation requires this.

    (it looks good according to Roland S.)
* radeonsi: implement vertex color clamping | Marek Olšák | 2015-10-17 | 5 | -4/+52
    This is only supported in the compatibility profile
    (without GS and tess).

    Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: implement fragment color clamping | Marek Olšák | 2015-10-17 | 6 | -2/+18
    using the shader key for now.

    Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: clean up other scratch buffer functions | Marek Olšák | 2015-10-17 | 1 | -15/+8
    Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: clean up copy-pasted scratch buffer updates | Marek Olšák | 2015-10-17 | 1 | -26/+13
    Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: unify shader create functions | Marek Olšák | 2015-10-17 | 1 | -40/+9
    The shader specifies the processor type, so use that instead.

    Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: unify shader delete functions | Marek Olšák | 2015-10-17 | 1 | -67/+17
    Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: fix a GS copy shader leak | Marek Olšák | 2015-10-17 | 1 | -1/+3
    Cc: [email protected]
    Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove an unused ctx parameter in si_shader_destroy | Marek Olšák | 2015-10-17 | 4 | -6/+6
    Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: print export_prim_id from the shader key | Marek Olšák | 2015-10-17 | 1 | -0/+2
    Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: disable NaNs for LS and HS | Marek Olšák | 2015-10-17 | 1 | -2/+4
    They're disabled for all other shaders except compute, but I forgot
    to do this for tess stages.

    Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: clean up si_llvm_init_export_args | Marek Olšák | 2015-10-17 | 1 | -42/+35
    Reviewed-by: Michel Dänzer <[email protected]>
* tgsi: move pipe_shader_from_tgsi_processor function to util | Marek Olšák | 2015-10-17 | 2 | -24/+24
    Reviewed-by: Michel Dänzer <[email protected]>
* mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode() | Brian Paul | 2015-10-17 | 1 | -1/+0
    Changing the matrix mode alone has no effect on rendering and does
    not need to trigger a flush or state validation.

    Reviewed-by: Eric Anholt <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
* st/mesa: fix clip state dependencies | Marek Olšák | 2015-10-17 | 1 | -1/+4
    This allows removing FLUSH_VERTICES in MatrixMode.

    Cc: [email protected]
    Reviewed-by: Brian Paul <[email protected]>
* gallium/hud: fix possible NULL pointer dereference | Marek Olšák | 2015-10-17 | 1 | -0/+3
    Trivial.
* scons: fix MSVC, MinGW build | Brian Paul | 2015-10-17 | 4 | -2/+21
    Duplicate the glsl_types_hack.cpp work-around from the libgl-xlib
    target.
* build: fix make-check after a6a6a71 | Rob Clark | 2015-10-17 | 1 | -0/+5
    commit a6a6a71092ba912803ae2b47eb56e3afdf36feb5
    Author:     Rob Clark <[email protected]>
    AuthorDate: Sat Oct 10 14:13:50 2015 -0400

        glsl: (mostly) remove libglsl_util

    Was a bit too ambitious on removal of libglsl_util.. it is still
    needed by some of the tests.

    Signed-off-by: Rob Clark <[email protected]>
* build: fix out-of-tree build after b9b40ef | Rob Clark | 2015-10-17 | 1 | -0/+1
    commit b9b40ef9b7644ea24768bc8b7464b1719efe99bf
    Author:     Rob Clark <[email protected]>
    AuthorDate: Sat Oct 10 13:55:07 2015 -0400

        nir: remove dependency on glsl

    broke things for i965 out of tree build.

    Signed-off-by: Rob Clark <[email protected]>
* nvc0: add support for performance monitoring metrics on Fermi | Samuel Pitoiset | 2015-10-17 | 4 | -3/+500
    As explained in the CUDA toolkit documentation, "a metric is a
    characteristic of an application that is calculated from one or more
    event values."

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* glsl: (mostly) remove libglsl_util | Rob Clark | 2015-10-16 | 5 | -11/+1
    Now that NIR does not depend on glsl, we can (mostly[*]) get rid of
    the libglsl_util hack.

    [*] glsl_compiler is the one remaining user of libglsl_util

    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Signed-off-by: Rob Clark <[email protected]>
* nir: remove dependency on glsl | Rob Clark | 2015-10-16 | 29 | -19/+34
    Move glsl_types into NIR, now that the dependency on
    glsl_symbol_table has been split out.

    Possibly makes sense to rename things at this point, but if we do
    that I'd like to keep it split out into a separate patch to make git
    history easier to follow (IMHO).

    v2: fix android build
    v3: I f***ing hate scons.. but at least it builds

    Reviewed-by: Jason Ekstrand <[email protected]>
    Signed-off-by: Rob Clark <[email protected]>
* glsl: move half<->float convertion to util | Rob Clark | 2015-10-16 | 13 | -155/+228
    Needed in NIR too, so move out of mesa/main/imports.c

    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Signed-off-by: Rob Clark <[email protected]>
* glsl: move builtin vector types to glsl_types.cpp | Rob Clark | 2015-10-16 | 2 | -3/+15
    First step at untangling NIR's dependency on glsl_types without
    bringing in the dependency on glsl_symbol_table.

    The builtin types are now in glsl_types (which will end up in NIR),
    but adding them to the symbol-table stays in builtin_types.cpp
    (which will not be part of NIR).

    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Signed-off-by: Rob Clark <[email protected]>
* glsl: couple shader_enums cleanups | Rob Clark | 2015-10-16 | 3 | -5/+15
    Add missing enum to gl_system_value_name() and move
    VARYING_SLOT_MAX / FRAG_RESULT_MAX / etc into shader_enums.h as
    suggested by Emil.

    v2: add STATIC_ASSERT()'s

    Reported-by: Emil Velikov <[email protected]>
    Acked-by: Emil Velikov <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Signed-off-by: Rob Clark <[email protected]>
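A missing entry in an enum-to-name table is the kind of mistake a compile-time assertion can catch. The sketch below shows the general pattern using C11 `_Static_assert`; the `demo_` enum, table and function are hypothetical stand-ins and do not reproduce Mesa's `gl_system_value_name()` or its `STATIC_ASSERT()` macro.

```c
/* Hypothetical enum standing in for a list of system values. */
enum demo_system_value {
    DEMO_SYSTEM_VALUE_VERTEX_ID,
    DEMO_SYSTEM_VALUE_INSTANCE_ID,
    DEMO_SYSTEM_VALUE_FRONT_FACE,
    DEMO_SYSTEM_VALUE_MAX /* number of entries, keep last */
};

static const char *const demo_system_value_names[] = {
    "VERTEX_ID",
    "INSTANCE_ID",
    "FRONT_FACE",
};

/* Compilation fails if an enum value is added without a name. */
_Static_assert(sizeof(demo_system_value_names) /
               sizeof(demo_system_value_names[0]) == DEMO_SYSTEM_VALUE_MAX,
               "name table out of sync with enum demo_system_value");

static const char *demo_system_value_name(enum demo_system_value v)
{
    if ((unsigned)v < DEMO_SYSTEM_VALUE_MAX)
        return demo_system_value_names[v];
    return "UNKNOWN";
}
```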
* glsl: initialise record array count to 1 | Timothy Arceri | 2015-10-17 | 1 | -0/+1
    This was only being done in one of the two process methods.

    Fixes an issue with samplers using the array size of a previous
    record.

    Tested-by: Marek Olšák <[email protected]>
    Cc: Jason Ekstrand <[email protected]>
* nir: add atomic lowering support for AoA | Timothy Arceri | 2015-10-17 | 1 | -10/+12
    Cc: Francisco Jerez <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* nir: wrapper for glsl_type arrays_of_arrays_size() | Timothy Arceri | 2015-10-17 | 2 | -0/+8
    Reviewed-by: Tapani Pälli <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
* configure: show which gallium drivers/sts are built | Ilia Mirkin | 2015-10-16 | 1 | -1/+11
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Michel Dänzer <[email protected]>
* tgsi: initialize ctx.file in tgsi_dump_instruction() | Brian Paul | 2015-10-16 | 1 | -0/+1
    Fixes segfault because of uninitialized file pointer.

    Trivial.
* nvc0: add a note about MP counters on GF100/GF110 | Samuel Pitoiset | 2015-10-16 | 1 | -0/+5
    MP counters on GF100/GF110 (compute capability 2.0) are buggy
    because there is a context-switch problem that we need to fix.
    Results might be wrong sometimes, be careful!

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: add MP counters variants for GF100/GF110 | Samuel Pitoiset | 2015-10-16 | 2 | -77/+483
    GF100 and GF110 chipsets are compute capability 2.0, while the other
    Fermi chipsets are compute capability 2.1. That's why some MP
    counters differ between these chipsets and we need to handle
    variants.

    Signed-off-by: Samuel Pitoiet <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: move SW/HW queries info to their respective files | Samuel Pitoiset | 2015-10-16 | 7 | -178/+228
    This will help for handling HW SM queries variants on Fermi.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: enable compute support by default on Fermi | Samuel Pitoiset | 2015-10-16 | 2 | -8/+2
    Compute support was not enabled by default because weird effects on
    3D state happened, but I can't reproduce them anymore.

    This also enables MP performance counters by default on Fermi.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: allow only one active query for the MP counters group | Samuel Pitoiset | 2015-10-16 | 1 | -11/+9
    Because we can't expose the number of hardware counters needed for
    each different query, we don't want to allow more than one active
    query simultaneously to avoid failure when the maximum number of
    counters is reached. Note that these groups of GPU counters are
    currently only used by AMD_performance_monitor.

    Like for Kepler, this limits the maximum number of active queries
    to 1 on Fermi.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: read MP counters of all GPCs on Fermi | Samuel Pitoiset | 2015-10-16 | 1 | -1/+1
    When a card has more than one GPC, the grid used by the compute
    kernel which reads MP performance counters seems to be too small.
    The consequence is that the kernel is not launched on all TPCs.

    Increasing the grid size using the number of GPCs now launches
    enough blocks and we can read MP performance counters of all TPCs.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: store the number of GPCs to nvc0_screen | Samuel Pitoiset | 2015-10-16 | 2 | -0/+2
    NOUVEAU_GETPARAM_GRAPH_UNITS param returns the number of GPCs, the
    total number of TPCs and the number of ROP units. Note that when the
    DRM version is too old the default number of GPCs is fixed to 4.

    This will be used to launch the compute kernel which is used to read
    MP performance counters over all GPCs.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix unaligned mem access when reading MP counters on Fermi | Samuel Pitoiset | 2015-10-16 | 1 | -6/+12
    Memory accesses have to be aligned to 128 bits. Note that this
    doesn't happen when the card only has TPC.

    This patch fixes the following dmesg fail:

    gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 000f [UNALIGNED_MEM_ACCESS]

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
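A 128-bit alignment requirement means every access offset must be a multiple of 16 bytes. One common way to satisfy such a constraint is to round offsets up to the next multiple of 16, sketched below; `align_up_16` is a hypothetical helper, and the commit message does not say whether the actual fix works this way.

```c
#include <stdint.h>

/* Hypothetical helper: round a byte offset up to the next multiple of
 * 16 bytes (128 bits). Works because 16 is a power of two. */
static uint32_t align_up_16(uint32_t offset)
{
    return (offset + 15u) & ~15u;
}
```

For example, `align_up_16(4)` gives 16 and `align_up_16(16)` stays at 16, so per-unit result slots laid out at such offsets never straddle a 128-bit boundary.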
* nvc0: fix monitoring multiple MP counters queries on Fermi | Samuel Pitoiset | 2015-10-16 | 1 | -76/+87
    For strange reasons, the signal id depends on the slot selected on
    Fermi but not on Kepler. Fortunately, the signal ids are just offset
    by the slot id!

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: fix queries which use multiple MP counters on Fermi | Samuel Pitoiset | 2015-10-16 | 1 | -47/+81
    Queries which use more than one MP counter were misconfigured, and
    computing the final result was also wrong because sources need to be
    configured on different hardware counters instead.

    According to the blob, computing the result is now as follows:

    FOR i..n
       val += ctr[i] * pow(2, i)

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
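The pseudocode in this message weights counter i by 2^i before summing. A minimal C sketch of that computation follows, using a left shift in place of `pow(2, i)`; the function name and the array-based signature are assumptions made for illustration and do not mirror the nvc0 driver code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of: FOR i..n  val += ctr[i] * pow(2, i).
 * Assumes n is small (Fermi exposes 8 MP counters), so the shift
 * cannot overflow a 64-bit accumulator for 32-bit counter values. */
static uint64_t mp_query_result(const uint32_t *ctr, size_t n)
{
    uint64_t val = 0;
    for (size_t i = 0; i < n; i++)
        val += (uint64_t)ctr[i] << i; /* ctr[i] * 2^i */
    return val;
}
```

With counters {1, 2, 3} the result is 1 + 2*2 + 3*4 = 17.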
* nvc0: allow to use 8 MP counters on Fermi | Samuel Pitoiset | 2015-10-16 | 2 | -19/+13
    On Fermi, we have one domain of 8 MP counters while we have two
    domains of 4 MP counters on Kepler.

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>