mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	nir: pass compiler rather than devinfo to functions that call nir_optimize	Timothy Arceri	2016-12-23	7	-21/+18
\| \| \| \| \| \| \|	Later we will pass compiler to nir_optimise to be used by the loop unroll pass. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: use nir_lower_indirect_derefs() for GLSL	Timothy Arceri	2016-12-23	2	-17/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves the nir_lower_indirect_derefs() call into brw_preprocess_nir() so thats is called by both OpenGL and Vulkan and removes that call to the old GLSL IR pass lower_variable_index_to_cond_assign() We want to do this pass in nir to be able to move loop unrolling to nir. There is a increase of 1-3 instructions in a small number of shaders, and 2 Kerbal Space program shaders that increase by 32 instructions. The changes seem to be caused be the difference in the GLSL IR vs NIR variable index lowering passes. The GLSL IR pass creates a simple if ladder for arrays of size 4 or less, while the NIR pass implements a binary search for all arrays regardless of size. Shader-db results BDW: total instructions in shared programs: 13021176 -> 13021819 (0.00%) instructions in affected programs: 57693 -> 58336 (1.11%) helped: 20 HURT: 190 total cycles in shared programs: 299805580 -> 299750826 (-0.02%) cycles in affected programs: 2290024 -> 2235270 (-2.39%) helped: 337 HURT: 442 total fills in shared programs: 19984 -> 19984 (0.00%) fills in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 4 GAINED: 0 V2: remove the do_copy_propagation() call from the i965 GLSL IR linking code. This call was added in f7741c52111 but since we are moving the variable index lowering to NIR we no longer need it and can just rely on the nir copy propagation pass. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: allow sampler indirects on all gens	Timothy Arceri	2016-12-23	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	Without this we will regress the max-samplers piglit test on Gen6 and lower when loop unrolling is done in NIR. There is a check in the GLSL IR linker that errors when it finds indirects and EmitNoIndirectSampler is set. As far as I can tell there is no reason for not enabling this for all gens regardless of whether they fully support ARB_gpu_shader5 or not. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: allow unsourced enabled VAO	Juan A. Suarez Romero	2016-12-21	1	-7/+16
\| \| \| \| \| \| \| \| \| \| \| \|	The GL 4.5 spec says: "If any enabled array’s buffer binding is zero when DrawArrays or one of the other drawing commands defined in section 10.4 is called, the result is undefined." This commits avoids crashing the code, which is not a very good "undefined result". This fixes spec/!opengl 3.1/vao-broken-attrib piglit test.
*	st/nine: Implement gallium nine CSMT	Patrick Rudolph	2016-12-20	1	-0/+5
\| \| \| \| \| \| \| \|	Use an offloading thread for all nine_context functions. Macros are used to ease the reading of the code. Signed-off-by: Patrick Rudolph <[email protected]> Signed-off-by: Axel Davy <[email protected]>
*	driconf: Fix missing gettext	Axel Davy	2016-12-20	1	-1/+1
\| \| \| \| \| \| \|	DRI_CONF_NINE_OVERRIDEVENDOR was missing gettext for the description. Signed-off-by: Axel Davy <[email protected]>
*	st/nine: Add new driconf options to control DISCARD behaviour	Axel Davy	2016-12-20	1	-0/+10
\| \| \| \| \| \|	See the patch for the new controls added. Signed-off-by: Axel Davy <[email protected]>
*	i965: Don't bail on vertex element processing if we need draw params.	Kenneth Graunke	2016-12-20	1	-17/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	BaseVertex, BaseInstance, DrawID, and some edge flag conditions need vertex buffer and elements structs. We can't bail early in this case. Gen4-7 already do this properly. Gen8+ did not. Thanks to Ilia Mirkin for helping track this down. Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99144 Reported-by: Pierre-Eric Pelloux-Prayer <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: keep gl_program shader info in sync after gather info	Timothy Arceri	2016-12-20	2	-2/+11
\| \| \| \| \| \| \| \| \| \|	It's possible that nir_shader was cloned and it no longer contains a pointer to the shader_info in gl_program. So we need to copy shader_info back to gl_program if that is the case. Fixes a regression with NIR_TEST_CLONE=true Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98840
*	i965/vec4: Fix TCS output reads with non-zero component qualifiers.	Kenneth Graunke	2016-12-14	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We want to perform the URB read to a vec4 temporary, with no writemask, then issue a MOV to swizzle the data and store it to the actual destination, using the final writemask. We were doing this wrong. For example, let's say we wanted to read a vec2 stored in components 2-3 of a vec4. We would generate a URB read message of: SEND <actual destination>.XY <header with mask set to XY> MOV <actual destination>.XY <actual destination>.ZW This doesn't work, because the URB message reads the .XY components of the vec4, rather than the ZW. It writes to the right place, but with the wrong data. Then the MOV comes along and overwrites it with data that didn't even come from the URB at all. Instead we want to do: SEND <temporary> <header with mask set to ZW> MOV <actual destination>.XY <temporary>.ZW Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/disasm: Decode dataport constant cache control fields.	Francisco Jerez	2016-12-14	1	-0/+1
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Remove the FS_OPCODE_SET_SIMD4X2_OFFSET virtual opcode.	Francisco Jerez	2016-12-14	4	-33/+0
\| \| \| \| \| \|	Not used anymore. It was just a scalar MOV. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Drop useless access mode override from pull constant generator code.	Francisco Jerez	2016-12-14	1	-2/+0
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Fetch one cacheline of pull constants at a time.	Francisco Jerez	2016-12-14	2	-19/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Asking the DC for less than one cacheline (4 owords) of data for uniform pull constants is suboptimal because the DC cannot request less than that from L3, resulting in wasted bandwidth and unnecessary message dispatch overhead, and exacerbating the IVB L3 serialization bug. The following table summarizes the overall framerate improvement (with statistical significance of 5% and sample size ~10) from the whole series up to this patch for several benchmarks and hardware generations: \| SKL \| BDW \| HSW SynMark2 OglShMapPcf \| 24.63% ±0.45% \| 4.01% ±0.70% \| 10.31% ±0.38% GfxBench4 gl_manhattan31 \| 5.93% ±0.35% \| 3.92% ±0.31% \| 6.62% ±0.22% GfxBench4 gl_4 \| 2.52% ±0.44% \| 1.23% ±0.10% \| N/A Unigine Valley \| 0.83% ±0.17% \| 0.23% ±0.05% \| 0.74% ±0.45% Note that there are two versions of the Manhattan demo shipped with GfxBench4, one of them is the original gl_manhattan demo which doesn't use UBOs, so this patch will have no effect on it, and another one is the gl_manhattan31 demo based on GL 4.3/GLES 3.1, which this patch benefits as shown above. I haven't observed any statistically significant regressions in the benchmarks I have at hand. Note that the comparatively huge improvement on SKL in the OglShMapPcf test case is due to the combined effect of this patch and the register pressure benefit on SKL+ of "i965/fs: Switch to the constant cache for uniform pull constants.", part of the same series. Going up to 8 oword blocks would improve performance of pull constants even more, but at the cost of some additional bandwidth and register pressure, so it would have to be done on-demand based on the number of constants actually used by the shader. v2: Fix for Gen4 and 5. v3: Non-trivial rebase. Rework to allow the visitor specifiy arbitrary pull constant block sizes. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Expose arbitrary pull constant load sizes to the IR.	Francisco Jerez	2016-12-14	4	-27/+26
\| \| \| \| \| \| \| \| \| \|	Change the FS generator to ask the dataport for enough owords worth of constants to fill the execution size of the instruction -- Which means that the visitor now needs to set the execution size correctly for uniform pull constant load instructions, which we were kind of neglecting until now. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Factor out oword block read and write message control calculation.	Francisco Jerez	2016-12-14	2	-12/+8
\| \| \| \| \| \| \| \| \|	We'll need roughly the same logic in other places and it would be annoying to duplicate it. Instead factor it out into a function-like macro that takes the number of dwords per block (which will prove more convenient than taking the same value in owords or some other unit). Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Switch to the constant cache for uniform pull constants.	Francisco Jerez	2016-12-14	4	-91/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts to using the oword block read messages for uniform pull constant loads, as used to be the case until 4c1fdae0a01b3f92ec03b61aac1d3df5. There are two important differences though: Now the L3 cacheability bits are set up correctly for UBOs (since 11f5d8a5d4fbb861ec161f68593e429cbd65d1cd), and we target the constant cache instead of the data cache. The latter used to get no L3 way allocation on boot on all platforms that existed at the time, so oword read messages wouldn't get cached on L3 regardless of the MOCS bits, what probably explains the apparent slowness of oword fetches. Constant cache loads seem to perform better than SIMD4x2 sampler loads in a number of cases, they alleviate some of the cache thrashing caused by the competition with textures for the L1/L2 sampler caches, and they allow fetching up to 128B worth of constants with a single oword fetch message. Note that IVB devices suffer from a hardware bug that leads to serialization of L3 read requests overlapping the same cacheline as result of a (on IVB buggy) mechanism of the L3 to preserve coherency. Since read requests for matching cachelines from any L3 client are not pipelined, throughput may decrease in cases where there are no non-overlapping requests left in the queue that can be processed between them. This situation should be relatively uncommon as long as we make sure that we don't use the 1/2 oword messages in cases where the shader intends to read from any other location of the same cacheline at some other point. This is generally a good idea anyway on all generations because using the 1 and 2 oword messages is expected to waste bandwidth since the minimum L3 request size for the DC is exactly 4 owords (i.e. one cacheline). A future commit will have this effect. I haven't been able to find any real-world example where this would still result in a regression on IVB, but if someone happens to find one it shouldn't be too difficult to add an IVB-specific check to have it fall back to the sampler cache for pull constant loads. Note that on SKL+ this change has the additional benefit of reducing the register footprint of pull constant loads. The following table summarizes the effect of the whole series on several shader-db stats: Total instructions Total cycles BWR: 4571248 -> 4568342 (-0.06%) 123375740 -> 123373296 (-0.00%) ELK: 3989020 -> 3985402 (-0.09%) 98757068 -> 98754058 (-0.00%) ILK: 6383591 -> 6376787 (-0.11%) 143649910 -> 143648914 (-0.00%) SNB: 7528395 -> 7501446 (-0.36%) 103503796 -> 102460370 (-1.01%) IVB: 6949221 -> 6943317 (-0.08%) 60592262 -> 60584422 (-0.01%) HSW: 6409753 -> 6403702 (-0.09%) 60609070 -> 60604414 (-0.01%) BDW: 8043467 -> 7976364 (-0.83%) 68427730 -> 68483042 (0.08%) CHV: 8045019 -> 7977916 (-0.83%) 68297426 -> 68352756 (0.08%) SKL: 8204037 -> 7939086 (-3.23%) 66583900 -> 65624378 (-1.44%) Lost->Gained Total spills Total fills BWR: 5 -> 5 1488 -> 1488 (0.00%) 1957 -> 1957 (0.00%) ELK: 5 -> 5 1489 -> 1489 (0.00%) 1958 -> 1958 (0.00%) ILK: 1 -> 4 1449 -> 1449 (0.00%) 1921 -> 1921 (0.00%) SNB: 0 -> 0 549 -> 549 (0.00%) 52 -> 52 (0.00%) IVB: 13 -> 3 1271 -> 1271 (0.00%) 1162 -> 1162 (0.00%) HSW: 11 -> 0 1271 -> 1271 (0.00%) 1162 -> 1162 (0.00%) BDW: 12 -> 0 1340 -> 1340 (0.00%) 1452 -> 1452 (0.00%) CHV: 12 -> 0 1340 -> 1340 (0.00%) 1452 -> 1452 (0.00%) SKL: 0 -> 120 1269 -> 375 (-70.45%) 1563 -> 690 (-55.85%) v3: Non-trivial rebase. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Let the caller of brw_set_dp_write/read_message control the target cache.	Francisco Jerez	2016-12-14	3	-42/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	brw_set_dp_read_message already had a target_cache argument, but its interpretation was rather convoluted (on Gen6 the render cache was used if the caller asked for it, otherwise it was ignored using the sampler cache instead), and the constant cache wasn't representable at all. brw_set_dp_write_message used the data cache on Gen7+ except for RENDER_TARGET_WRITE messages, in which case it would use the render cache. On Gen6 the render cache was always used. Instead of the above, provide the shared unit SFID that the caller expects will be used. Makes no functional changes. v3: Non-trivial rebase. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/gen6+: Invalidate constant cache on brw_emit_mi_flush().	Francisco Jerez	2016-12-14	1	-0/+1
\| \| \| \| \| \| \|	In order to make sure that the constant cache is coherent with previous rendering when we start using it for pull constant loads. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/miptree: Use intel_miptree_copy for maps	Jason Ekstrand	2016-12-13	1	-12/+8
\| \| \| \| \| \| \| \| \| \|	What we're really doing is copying a texture not blitting it in the sense of glBlitFramebuffers. Also, the intel_miptree_copy function is capable of properly handling compressed textures which intel_miptree_blit is not. Reviewed-by: Topi Pohjolainen <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97473 Cc: "13.0" <[email protected]>
*	i965/blit: Fix the src dimension sanity check in miptree_copy	Jason Ekstrand	2016-12-13	1	-2/+10
\| \| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]> Cc: "13.0" <[email protected]>
*	main: use new driver flag for conservative rasterization state	Lionel Landwerlin	2016-12-13	4	-6/+11
\| \| \| \| \| \| \| \| \| \| \|	Suggested by Marek. v2: Use new driver flag (Marek) v3: Fix i965 comments (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	i965: remove brw_lower_texture_gradients	Iago Toral Quiroga	2016-12-13	5	-358/+1
\| \| \| \| \| \| \|	This has been ported to NIR now so we don'tneed to keep the GLSL IR lowering any more. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/nir: enable lowering of texture gradient for shadow samplers	Iago Toral Quiroga	2016-12-13	1	-0/+3
\| \| \| \| \| \| \|	This gets the lowering on the Vulkan driver too, which is required for hardware that does not have the sample_l_d message (up to IvyBridge). Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/nir: enable lowering of texture gradient for cube maps	Iago Toral Quiroga	2016-12-13	1	-0/+1
\| \| \| \| \| \| \| \| \|	This gets the lowering on the Vulkan driver too. Fixes Vulkan CTS cube map texture gradient tests in: dEQP-VK.glsl.texture_functions.texturegrad.* Reviewed-by: Kenneth Graunke <[email protected]>
*	treewide: s/comparitor/comparator/	Ilia Mirkin	2016-12-12	7	-24/+24
\| \| \| \| \| \| \| \| \| \|	git grep -l comparitor \| xargs sed -i 's/comparitor/comparator/g' Just happened to notice this in a patch that was sent and included one of the tokens in question. Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
*	i965/fs: Reject copy propagation into SEL if not min/max.	Matt Turner	2016-12-12	2	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We shouldn't ever see a SEL with conditional mod other than GE (for max) or L (for min), but we might see one with predication and no conditional mod. total instructions in shared programs: 8241806 -> 8241902 (0.00%) instructions in affected programs: 13284 -> 13380 (0.72%) HURT: 62 total cycles in shared programs: 84165104 -> 84166244 (0.00%) cycles in affected programs: 75364 -> 76504 (1.51%) helped: 10 HURT: 34 Fixes generated code in at least Sanctum 2, Borderlands 2, Goat Simulator, XCOM: Enemy Unknown, and Shogun 2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92234 Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Add unit tests for copy propagation pass.	Matt Turner	2016-12-12	2	-0/+211
\| \| \| \| \| \|	Pretty basic, but it's a start. Acked-by: Jason Ekstrand <[email protected]>
*	i965/fs: Rename opt_copy_propagate -> opt_copy_propagation.	Matt Turner	2016-12-12	3	-15/+16
\| \| \| \| \| \|	Matches the vec4 backend, cmod propagation, and saturate propagation. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/blorp: fix release build unused variable warning	Grazvydas Ignotas	2016-12-12	1	-3/+1
\| \| \| \| \|	Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
*	i965: Print out cycle estimates at the start of block annotations.	Kenneth Graunke	2016-12-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	We now print START B15 <-B14 (42774 cycles) indicating that we estimate B15 will take 42,774 cycles. Printing this should make it easier where time is spent in the program. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/mt: Disable HiZ when sharing depth buffer externally (v2)	Chad Versace	2016-12-10	1	-7/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	intel_miptree_make_shareable() discarded and disabled CCS. Fix it so that it discards and disables HiZ too. Fixes dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer on Skylake. v2: Actually do what the commit message says. Discard the HiZ buffer. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98329 Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: Nanley Chery <[email protected] Cc: Haixia Shi <[email protected]> Cc: [email protected]
*	i965/mt: Disable aux surfaces after making miptree shareable	Chad Versace	2016-12-10	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The entire goal of intel_miptree_make_shareable() is to permanently disable the miptree's aux surfaces. So set intel_mipmap_tree:disable_aux_buffers after the function's done with discarding down the aux surfaces. References: https://bugs.freedesktop.org/show_bug.cgi?id=98329 Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: Nanley Chery <[email protected] Cc: Haixia Shi <[email protected]> Cc: [email protected]
*	i965: delay adding built-in uniforms to Parameters list	Timothy Arceri	2016-12-09	1	-23/+19
\| \| \| \| \| \| \| \| \| \|	This is a step towards using NIR optimisations over GLSL IR optimisations. Delaying adding built-in uniforms until after we convert to NIR gives it a chance to optimise them away. V2: move the new code back to brw_link_shader() Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Increase max texture to 16k for gen7+	Jordan Justen	2016-12-07	1	-3/+10
\| \| \| \| \| \|	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98297 Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: enable INTEL_conservative_rasterization on Gen9+	Lionel Landwerlin	2016-12-07	6	-5/+18
\| \| \| \| \|	Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
*	i965: Add i965 plumbing for ARB_post_depth_coverage for i965 (gen9+).	Plamena Manolova	2016-12-07	4	-3/+13
\| \| \| \| \| \| \| \| \| \|	This extension allows the fragment shader to control whether values in gl_SampleMaskIn[] reflect the coverage after application of the early depth and stencil tests. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Drop redundant key->outputs_written initialization.	Kenneth Graunke	2016-12-06	1	-2/+0
\| \| \| \| \| \| \|	This was already set to the same value earlier. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Initialize "separate" flag in VUE maps.	Kenneth Graunke	2016-12-06	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	This was uninitialized, which resulted in weird looking printouts where it appeared that the TCS output and TES input patch URB entries differed in SSO/non-SSO layout. There is no "separable" layout for both, as they're tied together. It should have no other actual effect. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Don't force SSO layout for VS->TCS.	Kenneth Graunke	2016-12-06	2	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was a hack which worked around the VS and TCS disagreeing on their shared interface due to the lack of varying packing. In particular, it was needed by Piglit's tcs-input-read-array-interface test. However, that was just one case where things could go awry, so the previous commit forcibly made interfaces match. This hack is no longer necessary. It also seems to be broken, though I'm not sure why. It fixes Piglit regressions in spec/arb_shader_image_load_store/semantics from commit ec1f159ac81ed964415d102eed4a0a29be8e7937. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98893 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	i965: Unify shader interfaces explicitly.	Kenneth Graunke	2016-12-06	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A while ago, I made i965 start compiling shaders independently. The VUE map layouts were based entirely on each shader's input/output bitfields. Assuming the interfaces match, this works out well - both sides will compute the same layout, and outputs are correctly routed to inputs. At the time, I had assumed that the linker would guarantee that the interfaces match. While it usually succeeds, it unfortunately seems to fail in some cases. For example, Piglit's tcs-input-read-array-interface test has a VS output array with two elements, but the TCS only reads one. The linker isn't able to eliminate the unused element from the VS, which makes the interfaces not match. Another case is where a shader other than the last writes clip/cull distances. These should be demoted to ordinary varyings, but they currently aren't - so we think they still have some special meaning, and prevent them from being eliminated. Fixing the linker to guarantee this in all cases is complicated. It needs to be able to optimize out dead code. It's tied into varying packing and other messiness. While we can certainly improve it---and should---I'd rather not rely on it being correct in all cases. This patch ORs adjacent stages' input/output bitfields together, ensuring that their interface (and hence VUE map layout) will be compatible. This should safeguard us against linker insufficiencies. Fixes line rendering in Dolphin, and the Piglit test based on it: spec/glsl-1.50/execution/geometry/clip-distance-vs-gs-out. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97232 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	i965: Emit proper NOPs.	Matt Turner	2016-12-06	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \|	The PRMs for HSW and newer say that other than the opcode and DebugCtrl bits of the instruction word, the rest must be zero. By zeroing the instruction word manually, we avoid using any of the state inherited through brw_codegen. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96959 Reviewed-by: Ian Romanick <[email protected]>
*	i965: Allocate at least some URB space even when max_vertices = 0.	Kenneth Graunke	2016-12-05	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allocating zero URB space is a really bad idea. The hardware has to give threads a handle to their URB space, and threads have to use that to terminate the thread. Having it be an empty region just breaks a lot of assumptions. Hence, why we asserted that it isn't possible. Unfortunately, it /is/ possible prior to Gen8, if max_vertices = 0. In theory a geometry shader could do SSBO/image access and maybe still accomplish something. In reality, this is tripped up by conformance tests. Gen8+ already avoids this problem by placing the vertex count DWord in the URB entry header. This fixes things on earlier generations. Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Tested-by: Ian Romanick <[email protected]>
*	Revert "i965: use nir_lower_indirect_derefs() for GLSL"	Jason Ekstrand	2016-12-05	2	-10/+13
\| \| \| \| \|	This reverts commit 9404439a754e5640ccd98df40fa694835c0d8759. I didn't intend to push it and it breaks clip and cull distance.
*	i965: Delete the meta-base CopyImageSubData implementation	Jason Ekstrand	2016-12-05	3	-327/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When I originally implemented the ARB_copy_image extension, the fast-path was written in meta using texture views. This path only worked if both images were uncompressed color images. All of the other cases fell back to the blitter or, in the worst case, mapping and memcpy on the CPU. Now that we have the blorp path, it handles all copies ever and the old meta, blitter, and CPU paths are only used on gen5 and below. The primary reason why we needed the meta path (apart from having a slow blitter on later hardware) was to handle multisampling which gen5 and earlier don't support anyway. Since the blitter is reasonably fast on gen5, we can just delete the meta path and get rid of all that terrible code. If we decide that we're ok with just disabling ARB_copy_image on gen5 and earlier (I personally am), then we could get rid of another 300 lines or so of semi-hairy code. Reviewed-by: Anuj Phogat <[email protected]>
*	i965/copy_image: Re-implement the blitter path with emit_miptree_blit	Jason Ekstrand	2016-12-05	3	-97/+80
\| \| \| \| \| \| \| \| \| \|	By using emit_miptree_blit which does chunking, this fixes the blitter path for the case where the image is too tall to blit normally. We also pull it into intel_blit as intel_miptree_copy. This matches the naming of the blorp blit and copy functions brw_blorp_blit and brw_blorp_copy. Reviewed-by: Anuj Phogat <[email protected]> Cc: "13.0" <[email protected]>
*	i965/blit: Break the guts of intel_miptree_blit into a helper	Jason Ekstrand	2016-12-05	1	-67/+84
\| \| \| \| \|	Reviewed-by: Anuj Phogat <[email protected]> Cc: "13.0" <[email protected]>
*	i965: use nir_lower_indirect_derefs() for GLSL	Timothy Arceri	2016-12-05	2	-13/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves the nir_lower_indirect_derefs() call into brw_preprocess_nir() so thats is called by both OpenGL and Vulkan and removes that call to the old GLSL IR pass lower_variable_index_to_cond_assign() We want to do this pass in nir to be able to move loop unrolling to nir. There is a increase of 1-3 instructions in a small number of shaders, and 2 Kerbal Space program shaders that increase by 32 instructions. Shader-db results BDW: total instructions in shared programs: 8705873 -> 8706194 (0.00%) instructions in affected programs: 32515 -> 32836 (0.99%) helped: 3 HURT: 79 total cycles in shared programs: 74618120 -> 74583476 (-0.05%) cycles in affected programs: 528104 -> 493460 (-6.56%) helped: 47 HURT: 37 LOST: 2 GAINED: 0
*	i965: Release aux buffer when disabling ccs	Topi Pohjolainen	2016-12-05	1	-0/+3
\| \| \| \| \| \| \| \|	Otherwise subsequent render cycles keep on using compression and/or fast clear. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/sched: Schedule trivial blocks.	Matt Turner	2016-11-29	1	-3/+0
\| \| \| \| \| \| \| \| \| \|	In commit 45cd76e342d1e8e schedule_instructions(bblock_t *) began setting bblock_t::cycle_count, but that function was not called on trivial blocks. Remove the code to skip trivial blocks so that cycle_count is set. Reviewed-by: Francisco Jerez <[email protected]>