aboutsummaryrefslogtreecommitdiffstats
path: root/src/intel/compiler/brw_vec4.cpp
Commit message (Collapse)AuthorAgeFilesLines
* intel/compiler: Do not reswizzle dst if instruction writes to flag registerDanylo Piliaiev2019-04-161-0/+6
| | | | | | | | | | If we write to the flag register changing the swizzle would change what channels are written to the flag register. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110201 Fixes: 4cd1a0be Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: <[email protected]>
* intel/common: move gen_debug to intel/devMark Janes2019-04-101-1/+1
| | | | | | | | | libintel_common depends on libintel_compiler, but it contains debug functionality that is needed by libintel_compiler. Break the circular dependency by moving gen_debug files to libintel_dev. Suggested-by: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: silence unitialized variable warning in opt_vector_float()Brian Paul2019-03-081-1/+1
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: Re-prefix non-logical surface opcodes with VEC4Jason Ekstrand2019-02-281-6/+6
| | | | | | | | The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so we no longer need the non-logical opcodes there. Prefix them VEC4 so it's clear that they're only used by the vec4 back-end. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Drop unused surface opcodesJason Ekstrand2019-02-281-6/+0
| | | | | | | | | The unused typed surface read/write support in the vec4 back-end has been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all image and buffer ops. There's no reason to keep these opcodes around anymore. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Prevent warnings in the following patchMatt Turner2019-01-091-5/+7
| | | | | | | | | | The next patch replaces an unsigned bitfield with a plain unsigned, which triggers gcc to begin warning on signed/unsigned comparisons. Keeping this patch separate from the actual move allows bisectablity and generates no additional warnings temporarily. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Eliminate unary op on operand of compare-with-zeroIan Romanick2018-12-171-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and scalar parts. In the VS part, adding the new pattern was not helpful. The pattern that is removed is really old, and it has been handled by NIR for ages. All Gen7+ platforms had similar results. (Broadwell shown) total instructions in shared programs: 14715715 -> 14715709 (<.01%) instructions in affected programs: 474 -> 468 (-1.27%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -1.40% -1.15% Instructions are helped. total cycles in shared programs: 559569911 -> 559569809 (<.01%) cycles in affected programs: 5963 -> 5861 (-1.71%) helped: 6 HURT: 0 helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17 helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85% 95% mean confidence interval for cycles value: -18.15 -15.85 95% mean confidence interval for cycles %-change: -1.95% -1.51% Cycles are helped. Iron Lake and Sandy Bridge had similar results. (Iron Lake shown) total instructions in shared programs: 7780915 -> 7780913 (<.01%) instructions in affected programs: 246 -> 244 (-0.81%) helped: 2 HURT: 0 total cycles in shared programs: 177876108 -> 177876106 (<.01%) cycles in affected programs: 3636 -> 3634 (-0.06%) helped: 1 HURT: 0 GM45 total instructions in shared programs: 4799152 -> 4799151 (<.01%) instructions in affected programs: 126 -> 125 (-0.79%) helped: 1 HURT: 0 total cycles in shared programs: 122052654 -> 122052652 (<.01%) cycles in affected programs: 3640 -> 3638 (-0.05%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: Do NIR shader cloning in the caller.Kenneth Graunke2018-11-201-2/+1
| | | | | | | | | | | | This moves nir_shader_clone() to the driver-specific compile function, rather than the shared src/intel/compiler code. This allows i965 to do key-specific passes before calling brw_compile_*. Vulkan should not need this cloning as it doesn't compile multiple variants. We do need to continue cloning in the compute shader code because we lower various things in NIR based on the SIMD width. Reviewed-by: Alejandro Piñeiro <[email protected]>
* intel: Don't propagate conditional modifiers if a UD source is negatedJason Ekstrand2018-10-101-0/+20
| | | | | | | | | This fixes a bug uncovered by my NIR integer division by constant optimization series. Fixes: 19f9cb72c8b "i965/fs: Add pass to propagate conditional..." Fixes: 627f94b72e0 "i965/vec4: adding vec4_cmod_propagation..." Reviewed-by: Ian Romanick <[email protected]>
* Replace uses of _mesa_bitcount with util_bitcountDylan Baker2018-09-071-1/+2
| | | | | | | | | | | | | and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem in nir for platforms that don't have popcount or popcountll, such as 32bit msvc. v2: - Fix additional uses of _mesa_bitcount added after this was originally written Acked-by: Eric Engestrom <[email protected]> (v1) Acked-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* intel/compiler: silence -Wclass-memaccess warningsCaio Marcelo de Oliveira Filho2018-07-181-2/+2
| | | | Reviewed-by: Anuj Phogat <[email protected]>
* intel/compiler: Relax mixed type restriction for saturating immediatesIan Romanick2018-07-061-2/+11
| | | | | | | | | | | | | | | | | At the time of commit 7bc6e455e23 (i965: Add support for saturating immediates.) we thought mixed type saturates would be impossible. We were only thinking about type converting moves from D to F, for example. However, type converting moves w/saturate from F to DF are definitely possible. This change minimally relaxes the restriction to allow cases that I have been able trigger via piglit tests. Fixes new piglit tests: - arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test - arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test Signed-off-by: Ian Romanick <[email protected]> Cc: [email protected] Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Silence unused parameter warnings brw_nir.cIan Romanick2018-07-021-1/+1
| | | | | | | | | | | | | | src/intel/compiler/brw_nir.c: In function ‘brw_nir_lower_vue_outputs’: src/intel/compiler/brw_nir.c:464:32: warning: unused parameter ‘is_scalar’ [-Wunused-parameter] bool is_scalar) ^~~~~~~~~ src/intel/compiler/brw_nir.c: In function ‘lower_bit_size_callback’: src/intel/compiler/brw_nir.c:610:57: warning: unused parameter ‘data’ [-Wunused-parameter] lower_bit_size_callback(const nir_alu_instr *alu, void *data) ^~~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Remove program key argument from generator.Francisco Jerez2018-06-281-1/+1
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Optimize OR with 0 into a MOVIan Romanick2018-06-151-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All of the affected shaders are geometry shaders... the same ones from the similar fs changes. The "No changes on any other platforms" comment below is not quite right. Without the previous change to register coalescing, this optimization caused quite a few regressions in tests that either used gl_ClipVertex or used different interpolation modes. I observed that with both patches applied, glsl-1.10/execution/interpolation/interpolation-none-gl_BackSecondaryColor-smooth-vertex.shader_test was one instruction shorter. I suspect other shaders would be similarly affected. Since this is all based on NOS, shader-db does not reflect it. Haswell total instructions in shared programs: 12954955 -> 12954918 (<.01%) instructions in affected programs: 3603 -> 3566 (-1.03%) helped: 37 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.21% max: 2.50% x̄: 1.99% x̃: 2.50% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.30% -1.69% Instructions are helped. total cycles in shared programs: 410012108 -> 410012098 (<.01%) cycles in affected programs: 3540 -> 3530 (-0.28%) helped: 5 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28% 95% mean confidence interval for cycles value: -2.00 -2.00 95% mean confidence interval for cycles %-change: -0.28% -0.28% Cycles are helped. Ivy Bridge total instructions in shared programs: 11679387 -> 11679351 (<.01%) instructions in affected programs: 3292 -> 3256 (-1.09%) helped: 36 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.21% max: 2.50% x̄: 2.04% x̃: 2.50% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.34% -1.74% Instructions are helped. No changes on any other platforms. Signed-off-by: Ian Romanick <[email protected]>
* i965/vec4: Don't register coalesce into source of VS_OPCODE_UNPACK_FLAGS_SIMD4X2Ian Romanick2018-06-151-0/+9
| | | | | | | This prevents regressions in a bunch of clipping and interpolation tests caused by the next patch (i965/vec4: Optimize OR with 0 into a MOV). Signed-off-by: Ian Romanick <[email protected]>
* intel: activate the gl_BaseVertex loweringAntia Puentes2018-05-021-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Surplus code related to the basevertex is removed. The Vertex Elements contain now: * VE 1: <firstvertex, BaseInstance, VertexID, InstanceID> * VE 2: <DrawID, is_indexed_draw, 0, 0> Also fixes unreachable message. Fixes OpenGL CTS tests: * KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysInstancedParameters * KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters * KHR-GL46.shader_draw_parameters_tests.MultiDrawArraysIndirectCountParameters * KHR-GL46.shader_draw_parameters_tests.ShaderDrawArraysParameters * KHR-GL46.shader_draw_parameters_tests.ShaderMultiDrawArraysIndirectParameters Fixes Piglit tests: * arb_shader_draw_parameters-drawid-indirect baseinstance * arb_shader_draw_parameters-basevertex Reviewed-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102678
* intel: emit is_indexed_draw in the same VE than gl_DrawIDAntia Puentes2018-05-021-5/+9
| | | | | | | | | | | The Vertex Elements are now: * VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID> * VE 2: <DrawID, is-indexed-draw, 0, 0> VE1 is it kept as it was before, VE2 additionally contains the new system value. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Add uses_is_indexed_draw flagAntia Puentes2018-05-021-0/+4
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* intel: Handle firstvertex in an identical way to BaseVertexAntia Puentes2018-04-191-0/+1
| | | | | | | | | | | | Until we set gl_BaseVertex to zero for non-indexed draw calls both have an identical value. The Vertex Elements are kept like that: * VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID> * VE 2: <Draw ID, 0, 0, 0> v2 (idr): Mark nir_intrinsic_load_first_vertex as "unreachable" in emit_system_values_block and fs_visitor::nir_emit_vs_intrinsic.
* intel/compiler: Add a uses_firstvertex flagNeil Roberts2018-04-191-0/+4
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/vec4: Set channel_sizes for MOV_INDIRECT sourcesJason Ekstrand2018-03-301-1/+4
| | | | | | | | | | | Otherwise, any indirect push constant access results in an assertion failure when we start digging through the channel_sizes array. This fixes dEQP-VK.pipeline.push_constant.graphics_pipeline.dynamic_index_vert on Haswell. It should be a harmless no-op for GL since indirect push constants aren't used there. Reviewed-by: Kenneth Graunke <[email protected]> Fixes: e69e5c7006d "i965/vec4: load dvec3/4 uniforms first in the..."
* i965/vec4: Fix null destination register in 3-source instructionsIan Romanick2018-03-261-0/+26
| | | | | | | | | | | | | | | | | | | | | | | A recent commit (see below) triggered some cases where conditional modifier propagation and dead code elimination would cause a MAD instruction like the following to be generated: mad.l.f0 null, ... Matt pointed out that fs_visitor::fixup_3src_null_dest() fixes cases like this in the scalar backend. This commit basically ports that code to the vec4 backend. NOTE: I have sent a couple tests to the piglit list that reproduce this bug *without* the commit mentioned below. This commit fixes those tests. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Tested-by: Tapani Pälli <[email protected]> Cc: [email protected] Fixes: ee63933a7 ("nir: Distribute binary operations with constants into bcsel") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105704
* i965: Add negative_equals methodsIan Romanick2018-03-261-0/+7
| | | | | | | | | | | | | | | | | | | | | | This method is similar to the existing ::equals methods. Instead of testing that two src_regs are equal to each other, it tests that one is the negation of the other. v2: Simplify various checks based on suggestions from Matt. Use src_reg::type instead of fixed_hw_reg.type in a check. Also suggested by Matt. v3: Rebase on 3 years. Fix some problems with negative_equals with VF constants. Add fs_reg::negative_equals. v4: Replace the existing default case with BRW_REGISTER_TYPE_UB, BRW_REGISTER_TYPE_B, and BRW_REGISTER_TYPE_NF. Suggested by Matt. Expand the FINISHME comment to better explain why it isn't already finished. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]> [v3] Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add infrastructure for generating CSEL instructions.Kenneth Graunke2018-03-081-0/+1
| | | | | | | | | | | | | | | | v2 (idr): Don't allow CSEL with a non-float src2. v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt. v4 (idr): Only set BRW_ALIGN_16 on Gen < 10 (suggested by Matt). Don't reset the access mode afterwards (suggested by Samuel and Matt). Add support for CSEL not modifying the flags to more places (requested by Matt). Signed-off-by: Kenneth Graunke <[email protected]> Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> [v3] Reviewed-by: Matt Turner <[email protected]>
* intel: Drop program size pointer from vec4/fs assembly getters.Kenneth Graunke2018-03-021-3/+2
| | | | | | | | | These days, we're just passing a pointer to a prog_data field, which we already have access to. We can just use it directly. (In the past, it was a pointer to a separate value.) Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/ir: Allow representing additional flag subregisters in the IR.Francisco Jerez2018-03-021-3/+4
| | | | | | | | | This allows representing conditional mods and predicates on f1.0-f1.1 at the IR level by adding an extra bit to the flag_subreg backend_instruction field. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* compiler: tidy up double_inputs_read usesTimothy Arceri2018-01-301-1/+1
| | | | | | | | | | | | | First we move double_inputs_read into a vs struct in the union, double_inputs_read is only used for vs inputs so this will save space and also allows us to add a new double_inputs field. We add the new field because c2acf97fcc9b changed the behaviour of double_inputs_read, and while it's no longer used to track actual reads in i965 we do still want to track this for gallium drivers. Reviewed-by: Marek Olšák <[email protected]>
* i965: Drop support for the legacy SNORM -> Float equation.Kenneth Graunke2018-01-021-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Older OpenGL defines two equations for converting from signed-normalized to floating point data. These are: f = (2c + 1)/(2^b - 1) (equation 2.2) f = max{c/2^(b-1) - 1), -1.0} (equation 2.3) Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be used in all scenarios, and remove equation 2.2. DirectX uses equation 2.3 as well. Intel hardware only supports equation 2.3, so Gen7.5+ systems that use the vertex fetcher hardware to do the conversions always get formula 2.3. This can make a big difference for 10-10-10-2 formats - the 2-bit value can represent 0 with equation 2.3, and cannot with equation 2.2. Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES. Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use the new rules, at least in core profile. That would leave Gen4-6 doing something different than all other hardware, which seems...lame. With context version promotion, applications that requested a pre-4.2 context may get promoted to 4.2, and thus get the new rules. Zero cases have been reported of this being a problem. However, we've received a report that following the old rules breaks expectations. SuperTuxKart apparently renders the cars red when following equation 2.2, and works correctly when following equation 2.3: https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405 So, this patch deletes the legacy equation 2.2 support entirely, making all hardware and APIs consistently use the new equation 2.3 rules. If we ever find an application that truly requires the old formula, then we'd likely want that application to work on modern hardware, too. We'd likely restore this support as a driconf option. Until then, drop it. This commit will regress Piglit's draw-vertices-2101010 test on pre-Haswell without the corresponding Piglit patch to accept either formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1): draw-vertices-2101010: Accept either SNORM conversion formula. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes.Kenneth Graunke2017-12-301-1/+1
| | | | | | These are the same, we don't need a separate opcode enum per backend. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: fix splitting of interleaved attributesIago Toral Quiroga2017-11-241-1/+6
| | | | | | | | | | | | | | | | | | | When we split an instruction that reads an uniform value (vstride 0) we need to respect the vstride on the second half of the instruction (that is, the second half should read the same region as the first). We were doing this already, but we didn't account for stages that have interleaved input attributes which also have a vstride of 0 and need the same treatment. Fixes the following on Haswell: KHR-GL45.enhanced_layouts.varying_locations KHR-GL45.enhanced_layouts.varying_array_locations KHR-GL45.enhanced_layouts.varying_structure_locations Reviewed-by: Matt Turner <[email protected]> Acked-by: Andres Gomez <[email protected]>
* intel/compiler: Remove final_program_size from brw_compile_*Jordan Justen2017-10-311-4/+2
| | | | | | | | | The caller can now use brw_stage_prog_data::program_size which is set by the brw_compile_* functions. Cc: Jason Ekstrand <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: add new field for storing program sizeCarl Worth2017-10-311-0/+1
| | | | | | | | | | | | This will be used by the on disk shader cache. v2: * Set in brw_compile_* rather than brw_codegen_*. (Jason) Signed-off-by: Timothy Arceri <[email protected]> [[email protected]: Only add to brw_stage_prog_data] Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: remove unused variableEric Engestrom2017-10-301-3/+0
| | | | | | | | Fixes: 2c873060d3578c7004c0 "i965: Delete unused brw_vs_prog_data::nr_attributes field." Cc: Kenneth Graunke <[email protected]> Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Delete unused brw_vs_prog_data::nr_attributes field.Kenneth Graunke2017-10-271-1/+0
| | | | Reviewed-by: Matt Turner <[email protected]>
* intel: Allocate prog_data::[pull_]param deeper inside the compilerJason Ekstrand2017-10-121-2/+1
| | | | | | | | | | | | | Now that we're always growing the param array as-needed, we can allocate the param array in common code and stop repeating the allocation everywere. In order to keep things sane, we ralloc the [pull_]param array off of the compile context and then steal it back to a NULL context later. This doesn't get us all the way to where prog_data::[pull_]param is purely an out parameter of the back-end compiler but it gets us a lot closer. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Rewrite the world of push/pull paramsJason Ekstrand2017-10-121-10/+7
| | | | | | | | | | | | | | | | | This moves us away to the array of pointers model and onto a model where each param is represented by a generic uint32_t handle. We reserve 2^16 of these handles for builtins that get generated by somewhere inside the compiler and have well-defined meanings. Generic params have handles whose meanings are defined by the driver. The primary downside to this new approach is that it moves a little bit of the work that we would normally do at compile time to draw time. On my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to have any measurable affect on OglBatch7. So, while this may come back to bite us, it doesn't look too bad. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Use 'class' src_reg, rather than 'struct' src_regMatt Turner2017-08-211-1/+1
| | | | Reviewed-by: Jordan Justen <[email protected]>
* i965: Move brw_reg_type_letters() as wellMatt Turner2017-08-211-2/+2
| | | | | | | And add "to_" to the name for consistency with the other functions in this file. Reviewed-by: Scott D Phillips <[email protected]>
* i965: Reorder brw_reg_type enum valuesMatt Turner2017-08-211-1/+2
| | | | | | | | | | | These vaguely corresponded to the hardware encodings, but that is purely historical at this point. Reorder them so we stop making things "almost work" when mixing enums. The ordering has been closen so that no enum value is the same as a compatible hardware encoding. Reviewed-by: Scott D Phillips <[email protected]>
* i965: Push UBO data, but don't use it just yet.Kenneth Graunke2017-07-131-0/+3
| | | | | | | | | | | This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets, and updates the compiler to know that there's extra payload data, so things continue working. However, it still issues pull loads for all data. I wanted to separate the two aspects for greater bisectability. v2: Update for new intel_bufferobj_buffer parameter. Reviewed-by: Matt Turner <[email protected]>
* intel: compiler/i965: fix is_broxton checksLionel Landwerlin2017-06-201-1/+1
| | | | | | | | | | In 5f2fe9302c is_geminilake was introduced for the differenciate broxton from geminilake. Unfortunately I failed as verifying that is_broxton is throughout the code base to mean Gen9lp. Fixes: 5f2fe9302c ("intel: common: add flag to identify platforms by name") Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cnl: Make URB {VS, GS, HS, DS} sizes non multiple of 3Anuj Phogat2017-06-091-2/+9
| | | | | | | | | | v1: By Ben Widawsky <[email protected]> v2: v1 had an assert only for VS. Add the restriction for GS, HS and DS as well and make sure the allocated sizes are not multiple of 3. v3: Move the entry_size checks in to compiler code (Ken) Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: load dvec3/4 uniforms first in the push constant bufferSamuel Iglesias Gonsálvez2017-05-181-27/+80
| | | | | | | | | | | | | | | | | | | | | | | | Reorder the uniforms to load first the dvec4-aligned variables in the push constant buffer and then push the vec4-aligned ones. It takes into account that the relocated uniforms should be aligned to their channel size. This fixes a bug were the dvec3/4 might be loaded one part on a GRF and the rest in next GRF, so the region parameters to read that could break the HW rules. v2: - Fix broken logic. - Add a comment to explain what should be needed to optimise the usage of the push constant buffer slots, as this patch does not pack the uniforms. v3: - Implemented the push constant buffer usage optimization. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Acked-by: Francisco Jerez <[email protected]>
* i965/vec4: Use NIR to do GS input remappingJason Ekstrand2017-05-091-60/+0
| | | | | | | | We're already doing this in the FS back-end. This just does the same thing in the vec4 back-end. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Use NIR remapping for VS attributesJason Ekstrand2017-05-091-36/+24
| | | | | | | | | | | | | The NIR pass already handles remapping system values to attributes for us so we delete the system value code as part of the conversion. We also change nir_lower_vs_inputs to take an explicit inputs_read bitmask and pass in the inputs_read from prog_data instead from pulling it out of NIR. This is because the version in prog_data may get EDGEFLAG added to it on some old platforms. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/vs: Move inputs_read handling to generic codeJason Ekstrand2017-05-091-0/+3
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Set VERT_BIT_EDGEFLAG based on the VUE mapJason Ekstrand2017-05-091-0/+11
| | | | | | | We also add a nice little comment to make it more clear exactly what happens with the edge flag copy. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Set uses_vertexid and friends from brw_compile_vsJason Ekstrand2017-05-091-0/+17
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Embed the shader_info in the nir_shader againJason Ekstrand2017-05-091-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e1af20f18a86f52a9640faf2d4ff8a71b0a4fa9b changed the shader_info from being embedded into being just a pointer. The idea was that sharing the shader_info between NIR and GLSL would be easier if it were a pointer pointing to the same shader_info struct. This, however, has caused a few problems: 1) There are many things which generate NIR without GLSL. This means we have to support both NIR shaders which come from GLSL and ones that don't and need to have an info elsewhere. 2) The solution to (1) raises all sorts of ownership issues which have to be resolved with ralloc_parent checks. 3) Ever since 00620782c92100d77c660f9783504c6d80fa1d58, we've been using nir_gather_info to fill out the final shader_info. Thanks to cloning and the above ownership issues, the nir_shader::info may not point back to the gl_shader anymore and so we have to do a copy of the shader_info from NIR back to GLSL anyway. All of these issues go away if we just embed the shader_info in the nir_shader. There's a little downside of having to copy it back after calling nir_gather_info but, as explained above, we have to do that anyway. Acked-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>