mesa.git

	Commit message	Author	Age	Files	Lines
*	spirv: Add an execution environment to the options	Caio Marcelo de Oliveira Filho	2019-03-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Also updates gl_spirv to pick the right one. At the moment nothing uses it, but upcoming functionality that is part of ARB_gl_spirv will use it, and later we can also be more assertive when handling certain features for each of the execution environments. Reviewed-by: Alejandro Piñeiro <[email protected]> Acked-by: Karol Herbst <[email protected]>
*	mesa/st: use ESSL cap top enable gpu_shader5	Rob Clark	2019-03-22	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	For GLES2+ contexts, enable EXT_gpu_shader5 if the driver exposes a sufficiently high ESSL feature level, even if the GLSL feature level isn't high enough. This allows drivers to support EXT_gpu_shader5 in GLES contexts before they support all the additional features of ARB_gpu_shader5 in GL contexts. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
*	mesa: Fix GL_NUM_DEVICE_UUIDS_EXT	Józef Kucia	2019-03-22	1	-0/+3
\| \| \| \| \|	Cc: [email protected] Reviewed-by: Tapani Pälli <[email protected]>
*	gallium: Add PIPE_BARRIER_UPDATE_BUFFER and UPDATE_TEXTURE bits.	Kenneth Graunke	2019-03-19	1	-15/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The glMemoryBarrier() function makes shader memory stores ordered with respect to things specified by the given bits. Until now, st/mesa has ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT, saying that drivers should implicitly perform the needed flushing. This seems like a pretty big assumption to make. Instead, this commit opts to translate them to new PIPE_BARRIER bits, and adjusts existing drivers to continue ignoring them (preserving the current behavior). The i965 driver performs actions on these memory barriers. Shader memory stores go through a "data cache" which is separate from the render cache and other read caches (like the texture cache). All memory barriers need to flush the data cache (to ensure shader memory stores are visible), and possibly invalidate read caches (to ensure stale data is no longer visible). The driver implicitly flushes for most caches, but not for data cache, since ARB_shader_image_load_store introduced MemoryBarrier() precisely to order these explicitly. I would like to follow i965's approach in iris, flushing the data cache on any MemoryBarrier() call, so I need st/mesa to actually call the pipe->memory_barrier() callback. Fixes KHR-GL45.shader_image_load_store.advanced-sync-textureUpdate and Piglit's spec/arb_shader_image_load_store/host-mem-barrier on the iris driver. Roland said this looks reasonable to him. Reviewed-by: Eric Anholt <[email protected]>
*	i965/icl: Add WA_2204188704 to disable pixel shader panic dispatch	Anuj Phogat	2019-03-19	2	-0/+10
\| \| \| \| \| \|	Signed-off-by: Anuj Phogat <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	st/mesa: stop using pipe_sampler_view_release()	Brian Paul	2019-03-17	2	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In all instances here we can replace pipe_sampler_view_release(pipe, view) with pipe_sampler_view_reference(view, NULL) because the views in question are private to the state tracker context. So there's no danger of freeing a sampler view with the wrong context. Testing done: google chrome, misc GL demos, games Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Neha Bhende <[email protected]> Reviewed-by: Mathias Fröhlich <[email protected]> Reviewed-By: Jose Fonseca <[email protected]>
*	st/mesa: implement "zombie" shaders list	Brian Paul	2019-03-17	3	-20/+166
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As with the preceding patch for sampler views, this patch does basically the same thing but for shaders. However, reference counting isn't needed here: instead of calling cso_delete_XXX_shader() we call st_save_zombie_shader(). The Redway3D Watch is one app/demo that needs this change. Otherwise, the vmwgfx driver generates an error about trying to destroy a shader ID that doesn't exist in the context. Note that if PIPE_CAP_SHAREABLE_SHADERS = TRUE, then we can use/delete any shader with any context and this mechanism is not used. Tested with: google-chrome, google earth, Redway3D Watch/Turbine demos and a few Linux games. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Neha Bhende <[email protected]> Reviewed-by: Mathias Fröhlich <[email protected]> Reviewed-By: Jose Fonseca <[email protected]>
*	st/mesa: implement "zombie" sampler views (v2)	Brian Paul	2019-03-17	5	-4/+131
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When st_texture_release_all_sampler_views() is called the texture may have sampler views belonging to several contexts. If we unreference a sampler view and its refcount hits zero, we need to be sure to destroy the sampler view with the same context which created it. This was not the case with the previous code which used pipe_sampler_view_release(). That function could end up freeing a sampler view with a context different than the one which created it. In the case of the VMware svga driver, we detected this but leaked the sampler view. This led to a crash with google-chrome when the kernel module had too many sampler views. VMware bug 2274734. Alternately, if we try to delete a sampler view with the correct context, we may be "reaching into" a context which is active on another thread. That's not safe. To fix these issues this patch adds a per-context list of "zombie" sampler views. These are views which are to be freed at some point when the context is active. Other contexts may safely add sampler views to the zombie list at any time (it's mutex protected). This avoids the context/view ownership mix-ups we had before. Tested with: google-chrome, google earth, Redway3D Watch/Turbine demos and a few Linux games. If anyone can recommend some other multi-threaded, multi-context GL apps to test, please let me know. v2: avoid potential race issue by always adding sampler views to the zombie list if the view's context doesn't match the current context, ignoring the refcount. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Neha Bhende <[email protected]> Reviewed-by: Mathias Fröhlich <[email protected]> Reviewed-By: Jose Fonseca <[email protected]>
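The zombie list described in the message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the st/mesa code: the `view` and `context` structs and every function name here are hypothetical, and real sampler views are also reference counted, which this sketch leaves out. It shows only the ownership rule from the v2 note: a view released from a context other than its creator is always queued on the creator's mutex-protected list, and the creator frees the queue later while it is current.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* All types and names below are hypothetical, for illustration only. */
struct context;

struct view {
   struct context *owner;   /* context that created the view */
   struct view *next;       /* link in the owner's zombie list */
};

struct context {
   pthread_mutex_t zombie_mutex;  /* guards the zombies list */
   struct view *zombies;          /* views waiting to be destroyed */
   unsigned destroyed;            /* number of views destroyed so far */
};

void
context_init(struct context *ctx)
{
   pthread_mutex_init(&ctx->zombie_mutex, NULL);
   ctx->zombies = NULL;
   ctx->destroyed = 0;
}

struct view *
view_create(struct context *owner)
{
   struct view *v = malloc(sizeof(*v));
   assert(v != NULL);
   v->owner = owner;
   v->next = NULL;
   return v;
}

/* Destroy a view; must only run while the owning context is current. */
static void
view_destroy(struct view *v)
{
   v->owner->destroyed++;
   free(v);
}

/* Release a view from whichever context is current on this thread. */
void
view_release(struct context *current, struct view *v)
{
   if (v->owner == current) {
      view_destroy(v);
      return;
   }
   /* Wrong context: defer to the owner instead of reaching into it. */
   pthread_mutex_lock(&v->owner->zombie_mutex);
   v->next = v->owner->zombies;
   v->owner->zombies = v;
   pthread_mutex_unlock(&v->owner->zombie_mutex);
}

/* Called by a context while it is current, to free its queued views. */
unsigned
context_free_zombies(struct context *ctx)
{
   struct view *list;
   unsigned count = 0;

   /* Detach the whole list under the lock, then destroy outside it. */
   pthread_mutex_lock(&ctx->zombie_mutex);
   list = ctx->zombies;
   ctx->zombies = NULL;
   pthread_mutex_unlock(&ctx->zombie_mutex);

   while (list) {
      struct view *next = list->next;
      view_destroy(list);
      list = next;
      count++;
   }
   return count;
}
```

Detaching the list before destroying anything keeps the mutex hold time short, so other contexts that are queueing views are not blocked while the owner runs destruction.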
*	mesa: Add assert to _mesa_primitive_restart_index.	Mathias Fröhlich	2019-03-15	1	-0/+3
\| \| \| \| \| \| \|	Make sure the index_size parameter is meant to be in bytes. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
*	vbo: Fix GL_PRIMITIVE_RESTART_FIXED_INDEX in display list compiles.	Mathias Fröhlich	2019-03-15	1	-5/+9
\| \| \| \| \| \| \| \|	The maximum value primitive restart index is different for each index data type. Use the appropriate fixed restart index value. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
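For illustration, the per-type fixed restart index is the all-ones value for the index width: 0xFF, 0xFFFF or 0xFFFFFFFF. The helper below is a hypothetical sketch, not the vbo code; it assumes the index size is given in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the fixed primitive restart index for an
 * index size given in bytes (1, 2 or 4). */
uint32_t
fixed_restart_index(unsigned index_size_bytes)
{
   assert(index_size_bytes == 1 || index_size_bytes == 2 ||
          index_size_bytes == 4);
   /* All bits set for the given width: 0xFF, 0xFFFF or 0xFFFFFFFF. */
   return (uint32_t)(UINT64_C(0xFFFFFFFF) >> (32 - 8 * index_size_bytes));
}
```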
*	vbo: Fix basevertex handling in display list compiles.	Mathias Fröhlich	2019-03-15	1	-5/+12
\| \| \| \| \| \| \| \| \|	The standard requires that the primitive restart comparison happens before the basevertex value is added. Do this now, and add a reference to the standard explaining why the comparison happens at this point. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
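The ordering matters because a raw index that is not the restart index can become equal to it once basevertex is added. A minimal sketch of the required order, with a hypothetical function name and signature:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. Returns false if raw_index is the restart index;
 * otherwise stores raw_index + basevertex in *vertex_out and returns
 * true. The comparison uses the raw index, before basevertex is added. */
bool
resolve_index(uint32_t raw_index, uint32_t restart_index,
              int32_t basevertex, uint32_t *vertex_out)
{
   assert(vertex_out != NULL);
   if (raw_index == restart_index)
      return false;   /* restart the primitive, emit no vertex */
   *vertex_out = raw_index + (uint32_t)basevertex;
   return true;
}
```

With a restart index of 0xFFFF and a basevertex of 1, raw index 0xFFFE resolves to vertex 0xFFFF and is still drawn, while raw index 0xFFFF restarts the primitive.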
*	mesa: Use mapping tools in debug prints.	Mathias Fröhlich	2019-03-15	1	-45/+12
\| \| \| \| \|	Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
*	mesa: Remove _ae_{,un}map_vbos and dependencies.	Mathias Fröhlich	2019-03-15	2	-100/+0
\| \| \| \| \| \| \| \| \|	Since mapping and unmapping the buffer objects in a VAO is handled directly from the VAO, this part of the _NEW_ARRAY state is no longer used. So remove this part of array element state. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
*	mesa: Replace _ae_{,un}map_vbos with _mesa_vao_{,un}map_arrays	Mathias Fröhlich	2019-03-15	2	-13/+11
\| \| \| \| \| \| \| \| \| \|	Due to the use of bitmaps, the _mesa_vao_{,un}map_arrays functions should provide comparable runtime efficiency to the currently used _ae_{,un}map_vbos functions. So use these functions and enable further cleanup. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
*	mesa: Use _mesa_array_element in dlist save.	Mathias Fröhlich	2019-03-15	1	-4/+19
\| \| \| \| \| \| \| \| \| \|	Make use of the newly factored out _mesa_array_element function in display list compilation. For now that duplicates out the primitive restart logic. But that turns out to need a fix in display list handling anyhow. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
*	mesa: Factor out _mesa_array_element.	Mathias Fröhlich	2019-03-15	2	-19/+32
\| \| \| \| \| \| \| \| \|	The factored out function handles emitting the vertex attributes at the given index. The now publicly accessible function gets used in the following patches. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
*	mesa: Implement helper functions to map and unmap a VAO.	Mathias Fröhlich	2019-03-15	2	-0/+102
\| \| \| \| \| \| \| \| \| \|	Provide a set of functions that maps or unmaps all VBOs held in a VAO. The functions will be used in the following patches. v2: Update comments. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
*	st/mesa: Let NIR lower UBO and SSBO access when we have it	Jason Ekstrand	2019-03-15	2	-1/+11
\| \| \| \|	Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	i965: Stop setting LowerBuferInterfaceBlocks	Jason Ekstrand	2019-03-15	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead, we do UBO and SSBO deref lowering in NIR after we've given it a chance to optimize SSBO access: Shader-db results on Kaby Lake: total instructions in shared programs: 15235775 -> 15235484 (<.01%) instructions in affected programs: 14992 -> 14701 (-1.94%) helped: 19 HURT: 20 total cycles in shared programs: 339220331 -> 339027307 (-0.06%) cycles in affected programs: 79831981 -> 79638957 (-0.24%) helped: 540 HURT: 602 total loops in shared programs: 4402 -> 4348 (-1.23%) loops in affected programs: 186 -> 132 (-29.03%) helped: 27 HURT: 0 total spills in shared programs: 23261 -> 23234 (-0.12%) spills in affected programs: 38 -> 11 (-71.05%) helped: 1 HURT: 0 total fills in shared programs: 31442 -> 31371 (-0.23%) fills in affected programs: 98 -> 27 (-72.45%) helped: 1 HURT: 0 LOST: 12 GAINED: 12 Most of the help and hurt in instruction counts was just churn caused by re-ordering of optimizations and the fact that the NIR deref lowering code is emitting slightly different instructions. Nothing was hurt by more than three instructions and most things weren't helped by more than four. The primary exception to this is one Car Chase shader: shaders/non-free/gfxbench4/carchase/341.shader_test CS SIMD32: 1144 -> 821 (-28.23%) There is also one compute shader in Manhattan 3.1 and a fragment shader in the UE4 Shooter Game demo that now get a loop partially unrolled. Those showed up in the results as hurt instructions but were manually removed to get the results above. 
The lost/gained was a dozen Car Chase shaders that went from SIMD8 to SIMD16 thanks to improved register pressure: shaders/non-free/gfxbench4/carchase/366.shader_test CS shaders/non-free/gfxbench4/carchase/368.shader_test CS shaders/non-free/gfxbench4/carchase/370.shader_test CS shaders/non-free/gfxbench4/carchase/372.shader_test CS shaders/non-free/gfxbench4/carchase/376.shader_test CS shaders/non-free/gfxbench4/carchase/378.shader_test CS shaders/non-free/gfxbench4/carchase/380.shader_test CS shaders/non-free/gfxbench4/carchase/382.shader_test CS shaders/non-free/gfxbench4/carchase/384.shader_test CS shaders/non-free/gfxbench4/carchase/388.shader_test CS shaders/non-free/gfxbench4/carchase/4.shader_test CS shaders/non-free/gfxbench4/carchase/6.shader_test CS Given how much it appeared to be improved, I ran Car Chase on my laptop. Unfortunately, I wasn't able to see any measurable improvement. It might be helped by 1-2% but it's in the noise. It does render correctly as far as I can tell so the improvement is legitimate. All of the loops that got deleted were in dolphin uber shaders. I've had no opportunity to test them for correctness or performance. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	mesa/st: Fix leaks of TGSI tokens in VP variants.	Eric Anholt	2019-03-14	1	-14/+20
\| \| \| \| \| \| \| \| \| \|	Starting a glxgears and closing it, I was seeing a lot of leaked TGSI for the fixed function VPs. v2: drop unused delete_ir() arg. Fixes: 3b4929ec6e64 ("st/mesa: Copy VP TGSI tokens if they exist, even for NIR shaders.") Reviewed-by: Kenneth Graunke <[email protected]>
*	mesa/st: Make sure that prog_to_nir NIR gets freed.	Eric Anholt	2019-03-14	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	GLSL NIR gets freed on relink by _mesa_delete_program(), but for ARB programs we need to free the old NIR when PSN is used to set up new NIR in the same gl_program. Additionally, set the base .nir field so that it will get freed by _mesa_delete_program(). Fixes: 3d7611e9a6c6 ("st/nir: use NIR for asm programs") Reviewed-by: Kenneth Graunke <[email protected]>
*	mesa: add logging function for formatted string	Mark Janes	2019-03-14	2	-0/+35
\| \| \| \|	Reviewed-by: Erik Faye-Lund <[email protected]>
*	mesa: rename logging functions to reflect that they format strings	Mark Janes	2019-03-14	12	-92/+92
\| \| \| \| \| \| \|	In preparation for the definition of a function to log a formatted string. Reviewed-by: Erik Faye-Lund <[email protected]>
*	mesa: properly report the length of truncated log messages	Mark Janes	2019-03-14	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	_mesa_log_msg must provide the length of the string passed into the KHR_debug api. When the string formatted by _mesa_gl_vdebugf exceeds MAX_DEBUG_MESSAGE_LENGTH, the length is incorrectly set to the number of characters that would have been written if enough space had been available. Fixes: 30256805784450b8bb9d4dabfb56226271ca9d24 ("mesa: Add support for GL_ARB_debug_output with dynamic ID allocation.") Reviewed-by: Erik Faye-Lund <[email protected]>
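The underlying pitfall is that vsnprintf() returns the length the output would have had with unlimited space, not the number of characters actually stored. A minimal sketch of clamping that return value follows; `format_clamped` is a hypothetical wrapper, not the Mesa function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: formats into buf and returns the number of
 * characters actually stored, excluding the terminating NUL. */
int
format_clamped(char *buf, size_t size, const char *fmt, ...)
{
   va_list args;
   int len;

   assert(buf != NULL && size > 0);

   va_start(args, fmt);
   len = vsnprintf(buf, size, fmt, args);
   va_end(args);

   if (len < 0) {
      /* Encoding error: report an empty string. */
      buf[0] = '\0';
      return 0;
   }
   /* On truncation vsnprintf reports the untruncated length, so clamp
    * it to what fits in the buffer. */
   if ((size_t)len >= size)
      len = (int)(size - 1);
   return len;
}
```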
*	i965: Disable ARB_fragment_shader_interlock for platforms prior to GEN9	Plamena Manolova	2019-03-14	1	-1/+24
\| \| \| \| \| \| \| \| \| \| \|	ARB_fragment_shader_interlock depends on memory fences to ensure fragment ordering and this ordering guarantee is only supported from GEN9 onwards. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109980 Fixes: 939312702e35 "i965: Add ARB_fragment_shader_interlock support." Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: remove scaling factors from P010, P012	Tapani Pälli	2019-03-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch removes scaling factors introduced in 2a2e69f975b but leaves option to use scaling in place as it could be useful with other upcoming YUV formats. We did this scaling because ffmpeg was shifting channel bits down, however it seems this is not the right place as compositor wants to flip same buffers directly to display as well and therefore bitshifting needs to be done by the client when receiving frame from ffmpeg. Now P0x formats are treated the same, e.g. P010 is same as P016 but with lower 6 bits set to zeros. Fixes: 2a2e69f975b "i965: add P0x formats and propagate required scaling factors" Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
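The statement that P010 matches P016 with the lower 6 bits zeroed can be illustrated per sample: a 10-bit value sits in the top bits of a 16-bit word. The helper names below are hypothetical and the sketch covers a single component only, not the plane layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for one 16-bit P010 component. */

/* Place a 10-bit sample in the upper bits; the lower 6 bits stay zero. */
uint16_t
p010_pack(uint16_t sample10)
{
   assert(sample10 <= 0x3FF);
   return (uint16_t)(sample10 << 6);
}

/* Recover the 10-bit sample from a P010 word. */
uint16_t
p010_unpack(uint16_t word)
{
   return (uint16_t)(word >> 6);
}
```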
*	st/glsl_to_nir: fix incorrect arrary access	Timothy Arceri	2019-03-12	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a segfault when we try to access the array using a -1 when the array wasn't allocated in the first place. Before 7536af670b75 we would just access a pre-allocated array that was also load/stored to/from the shader cache. But now the cache will no longer allocate these arrays if they are empty. The change resulted in tests such as the following segfaulting when run with a warm shader cache. tests/spec/arb_arrays_of_arrays/execution/sampler/fs-struct-const-index.shader_test
*	i965: Reimplement all the PIPE_CONTROL rules.	Kenneth Graunke	2019-03-11	1	-136/+403
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements virtually all documented PIPE_CONTROL restrictions in a centralized helper. You now simply ask for the operations you want, and the pipe control "brain" will figure out exactly what pipe controls to emit to make that happen without tanking your system. The hope is that this will fix some intermittent flushing issues as well as GPU hangs. However, it also has a high risk of causing GPU hangs and other regressions, as this is a particularly sensitive area and poking the bear isn't always advisable. Mark Janes noted that this patch helps with some GPU hangs on Icelake. This does re-enable the VF Invalidate => Write Immediate workaround on Gen8, which had been disabled (bug 103787) due to GPU hangs. The old code did this workaround after another which would have added CS stall bits, so it missed a workaround. The new code orders them properly and appears to work. v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for production hardware. Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Use genxml for emitting PIPE_CONTROL.	Kenneth Graunke	2019-03-11	7	-230/+362
\| \| \| \| \| \| \| \| \| \| \|	While this does add a bunch of boilerplate, it also protects us against the hardware moving bits, or changing their meaning. For something as finicky as PIPE_CONTROL, the extra safety seems worth it. We turn PIPE_CONTROL_* into a bitfield of arbitrary flags, and then pack them appropriately. Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Rename ISP_DIS to INDIRECT_STATE_POINTERS_DISABLE.	Kenneth Graunke	2019-03-11	2	-2/+2
\| \| \| \| \| \|	Clearer name. Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Move some genX infrastructure to genX_boilerplate.h.	Kenneth Graunke	2019-03-11	4	-128/+174
\| \| \| \| \| \| \|	This will let us make multiple genX_*.c files, without copy and pasting all this boilerplate. Reviewed-by: Topi Pohjolainen <[email protected]>
*	st/mesa: minor refactoring of texture/sampler delete code	Brian Paul	2019-03-11	3	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Rename st_texture_free_sampler_views() to st_delete_texture_sampler_views() to align with st_DeleteTextureObject(), its only caller. Move the call to st_texture_release_all_sampler_views() from st_DeleteTextureObject() to st_delete_texture_sampler_views() so all the sampler view clean-up code is in one place. Reviewed-by: Neha Bhende <[email protected]>
*	st/mesa: rename st_texture_release_sampler_view()	Brian Paul	2019-03-11	3	-5/+5
\| \| \| \| \| \| \|	To st_texture_release_context_sampler_view() to be more clear that it's context-specific. Reviewed-by: Neha Bhende <[email protected]>
*	st/mesa: add/improve sampler view comments	Brian Paul	2019-03-11	1	-2/+8
\| \| \| \|	Reviewed-by: Neha Bhende <[email protected]>
*	st/mesa: move around some code in st_context.c	Brian Paul	2019-03-11	2	-122/+116
\| \| \| \| \| \| \| \| \| \| \| \|	st_init_driver_functions() is only called in st_context.c so there's no need for the prototype in st_context.h. To avoid a forward declaration of st_init_driver_functions() in st_context.c, we need to move around several other functions. No functional change. Reviewed-by: Neha Bhende <[email protected]>
*	st/mesa: move utility functions, macros into new st_util.h file	Brian Paul	2019-03-11	33	-91/+184
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	To de-clutter st_context.h. Clean up remaining function prototypes in st_context.h. The st_vp_uses_current_values() helper is only used in st_context.c so move it there. The st_get_active_states() function is only used in st_context.c so remove its prototype in st_context.h. Reviewed-by: Neha Bhende <[email protected]>
*	prog_to_nir: fix write from vps to FOG	Karol Herbst	2019-03-08	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \|	For fragment programs we already treat fog as a single component value, but for vertex programs we didn't. Fixes fog related piglit tests with my out of tree Nouveau nir patches. Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	st/mesa: init hash keys with memset(), not designated initializers	Brian Paul	2019-03-08	2	-5/+17
\| \| \| \| \| \| \| \| \| \|	Since the compiler may not zero-out padding in the object. Add a couple comments about this to prevent misunderstandings in the future. Fixes: 67d96816ff5 ("st/mesa: move, clean-up shader variant key decls/inits") Reviewed-by: Roland Scheidegger <[email protected]>
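The concern is that a key hashed or compared as raw bytes includes its padding, and a designated initializer is not guaranteed to zero padding bytes. A minimal sketch of the memset() approach follows, using a hypothetical key struct rather than the real variant keys. It relies on compilers leaving padding untouched on later member stores, which is common practice although the C standard leaves those bytes unspecified.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key; on typical ABIs there are padding bytes between
 * 'flag' and 'id'. */
struct variant_key {
   uint8_t  flag;
   uint32_t id;
};

/* Zero the whole object first so padding bytes are deterministic, then
 * fill in the fields. */
void
variant_key_init(struct variant_key *key, uint8_t flag, uint32_t id)
{
   assert(key != NULL);
   memset(key, 0, sizeof(*key));
   key->flag = flag;
   key->id = id;
}

/* Byte-wise comparison, as a hash table keyed on raw bytes would do. */
int
variant_key_equal(const struct variant_key *a, const struct variant_key *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}
```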
*	st/mesa: whitespace, formatting fixes in st_cb_flush.c	Brian Paul	2019-03-08	1	-14/+19
\| \| \| \|	Trivial.
*	st/mesa: move, clean-up shader variant key decls/inits	Brian Paul	2019-03-08	2	-10/+7
\| \| \| \| \| \| \|	Move the variant key declarations inside the scope they're used. Use designated initializers instead of memset() calls. Reviewed-by: Neha Bhende <[email protected]>
*	isl: Add a swizzle parameter to isl_buffer_fill_state()	Kenneth Graunke	2019-03-07	1	-0/+1
\| \| \| \| \| \| \|	This is necessary for legacy texture buffer object formats, where we'll need to use a swizzle to fake e.g. luminance. Reviewed-by: Jason Ekstrand <[email protected]>
*	intel/decoders: handle decoding MI_BBS from ring	Lionel Landwerlin	2019-03-07	1	-1/+1
\| \| \| \| \| \| \| \| \|	An MI_BATCH_BUFFER_START in the ring buffer acts as a second level batchbuffer (aka jump back to ring buffer when running into a MI_BATCH_BUFFER_END). Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
*	intel/decoders: add address space indicator to get BOs	Lionel Landwerlin	2019-03-07	1	-1/+1
\| \| \| \| \| \| \|	Some commands like MI_BATCH_BUFFER_START have this indicator. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
*	st/glsl: start spilling out common st glsl conversion code	Timothy Arceri	2019-03-06	7	-122/+222
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The NIR and TGSI paths are currently intertwined which makes it not only hard to follow but also makes it hard to take advantage of the differences in IR. Here we take the first step to splitting that path apart. With this we take the opportunity to no longer call the GLSL IR optimisation passes after the final lowering calls for NIR. We can instead just use the NIR passes which can produce better code and should also result in faster compile times. The speed-up can be measured in some dolphin uber shaders due to no longer calling lower_if_to_cond_assign() for example dolphin/ubershaders/120.shader_test goes from ~1.63 -> ~1.53 seconds on my machine. There are some code changes as a result of not calling lower_if_to_cond_assign(), this is because it flattens ifs that contain UBOs whereas NIR's peephole select doesn't. This is where most of the regressions in Max Waves happen with shader-db. shader-db results (VEGA): Totals from affected shaders: SGPRS: 2349056 -> 2349640 (0.02 %) VGPRS: 1322160 -> 1323300 (0.09 %) Spilled SGPRs: 21190 -> 21527 (1.59 %) Spilled VGPRs: 99 -> 99 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 72 -> 72 (0.00 %) dwords per thread Code Size: 57260904 -> 57270932 (0.02 %) bytes Compile Time: 1107186 -> 1022942 (-7.61 %) milliseconds LDS: 786 -> 786 (0.00 %) blocks Max Waves: 391932 -> 391619 (-0.08 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Eric Anholt <[email protected]>
*	i965: stop calling nir_lower_returns()	Timothy Arceri	2019-03-06	1	-3/+1
\| \| \| \| \| \|	We now call this for all drivers in glsl_to_nir() instead. Reviewed-by: Eric Anholt <[email protected]>
*	glsl: use NIR function inlining for drivers that use glsl_to_nir()	Timothy Arceri	2019-03-06	2	-2/+2
\| \| \| \| \| \| \| \|	glsl_to_nir() is still missing support for converting certain functions to NIR, so for those we use the GLSL IR optimisations to remove the functions. Reviewed-by: Eric Anholt <[email protected]>
*	st/nir: Move 64-bit lowering later	Jason Ekstrand	2019-03-06	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we have a loop unrolling cost function and loop unrolling isn't going to kill us the moment we have a 64-bit op in a loop, we can go ahead and move 64-bit lowering later. This gives us the opportunity to do more optimizations and actually let the full optimizer run even on 64-bit ops rather than hoping one round of opt_algebraic will fix everything. This substantially reduces both fp64 shader compile times and the resulting code size. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/lower_doubles: Inline functions directly in lower_doubles	Jason Ekstrand	2019-03-06	2	-23/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Instead of trusting the caller to already have created a softfp64 function shader and added all its functions to our shader, we simply take the softfp64 shader as an argument and do the function inlining ourselves. This means that there are no more nasty functions lying around that the caller needs to worry about cleaning up. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	glsl/nir: Add a shared helper for building float64 shaders	Jason Ekstrand	2019-03-06	3	-99/+5
\| \| \| \| \| \|	Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Compile the fp64 program based on nir options	Jason Ekstrand	2019-03-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	Instead of looking at the devinfo directly, look at the lowering options we provided to NIR. This is more accurate as it's now checking for "do we need full software lowering" rather than a hardware bit. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>