summaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* i965/vec4: load dvec3/4 uniforms first in the push constant bufferSamuel Iglesias Gonsálvez2017-05-181-27/+80
| | | | | | | | | | | | | | | | | | | | | | | | Reorder the uniforms to load first the dvec4-aligned variables in the push constant buffer and then push the vec4-aligned ones. It takes into account that the relocated uniforms should be aligned to their channel size. This fixes a bug were the dvec3/4 might be loaded one part on a GRF and the rest in next GRF, so the region parameters to read that could break the HW rules. v2: - Fix broken logic. - Add a comment to explain what should be needed to optimise the usage of the push constant buffer slots, as this patch does not pack the uniforms. v3: - Implemented the push constant buffer usage optimization. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Acked-by: Francisco Jerez <[email protected]>
* i965/vec4: fix swizzle and writemask when loading an uniform with constant ↵Samuel Iglesias Gonsálvez2017-05-181-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | offset It was setting XYWZ swizzle and writemask to all uniforms, no matter if they were a vector or scalar, so this can lead to problems when loading them to the push constant buffer. Moreover, 'shift' calculation was designed to calculate the offset in DWORDS, but it doesn't take into account DFs, so the calculated swizzle for the later ones was wrong. The indirect case is not changed because MOV INDIRECT will write to all components. Added an assert to verify that these uniforms are aligned. v2: - Fix 'shift' calculation (Curro) - Set both swizzle and writemask. - Add assert(shift == 0) for the indirect case. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4/gs: restore the uniform values which was overwritten by failed ↵Samuel Iglesias Gonsálvez2017-05-181-0/+26
| | | | | | | | | | | | | | | | | vec4_gs_visitor execution We are going to add a packing feature to reduce the usage of the push constant buffer. One of the consequences is that 'nr_params' would be modified by vec4_visitor's run call, so we need to restore it if one of them failed before executing the fallback ones. Same thing happens to the uniforms values that would be reordered afterwards. Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when the dvec4 alignment and packing patch is applied. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Acked-by: Francisco Jerez <[email protected]>
* Android: correct libz dependencyChih-Wei Huang2017-05-171-1/+1
| | | | | | | | | | | | | | | | | | | Commit 6facb0c0 ("android: fix libz dynamic library dependencies") unconditionally adds libz as a dependency to all shared libraries. That is unnecessary. Commit 85a9b1b5 introduced libz as a dependency to libmesa_util. So only the shared libraries that use libmesa_util need libz. Fix Android Lollipop build by adding the include path of zlib to libmesa_util explicitly instead of getting the path implicitly from zlib since it doesn't export the include path in Lollipop. Fixes: 6facb0c0 "android: fix libz dynamic library dependencies" Signed-off-by: Chih-Wei Huang <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Rob Herring <[email protected]>
* intel/isl/gen6: Fix combined depth stencil alignmentJason Ekstrand2017-05-161-7/+7
| | | | | | | | All combined depth stencil buffers (even those with just stencil) require a 4x4 alignment on Sandy Bridge. The only depth/stencil buffer type that requires 4x2 is separate stencil. Reviewed-by: Chad Versace <[email protected]>
* intel/isl: Refactor gen8_choose_image_alignment_elJason Ekstrand2017-05-161-140/+49
| | | | | Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* intel/isl: Refactor gen6_choose_image_alignment_elJason Ekstrand2017-05-161-18/+14
| | | | | Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* intel/isl: Refactor gen7_choose_image_alignment_elJason Ekstrand2017-05-161-83/+74
| | | | | | | | | | | | | | | The Ivy Bridge PRM provides a nice table that handles most of the alignment cases in one place. For standard color buffers we have a little freedom of choice but for most depth, stencil and compressed it's hard-coded. Chad's original functions split halign and valign apart and implemented them almost entirely based on restrictions and not the table. This makes things way more confusing than they need to be. This commit gets rid of the split and makes us implement the exact table up-front. If our surface isn't one of the ones in the table then we have to make real choices. Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* intel/isl/gen7: Use stencil vertical alignment of 8 instead of 4Pohjolainen, Topi2017-05-161-23/+5
| | | | | | | | | | | | | | | | | | | | | The reasoning Chad gave in the comment for choosing a valign of 4 is entirely bunk. The fact that you have to multiply pitch by 2 is completely unrelated to the halign/valign parameters used for texture layout. (Not completely unrelated. W-tiling is just Y-tiling with a bit of extra swizzling which turns 8x8 W-tiled chunks into 16x4 y-tiled chunks so it makes everything easier if miplevels are always aligned to 8x8.) The fact that RENDER_SURFACE_STATE::SurfaceVerticalAlignmet doesn't have a VALIGN_8 option doesn't matter since this is gen7 and you can't do stencil texturing anyway. v2 (Jason Ekstrand): - Delete most of Chad's comment and add a more descriptive commit message. Signed-off-by: Topi Pohjolainen <[email protected]> Cc: "17.0 17.1" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Fix test_eu_validate.cppMatt Turner2017-05-161-1/+1
| | | | | | | Broken by commit a7217e909ce6 ("i965: Pass pointer and end of assembly to brw_validate_instructions"). Reported-by: Aaron Watry <[email protected]>
* anv: Implement VK_KHR_get_surface_capabilities2Jason Ekstrand2017-05-163-0/+32
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* intel/aubinator_error_decode: Disassemble shader programsMatt Turner2017-05-152-4/+186
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/aubinator_error_decode: Stop decoding after MI_BATCH_BUFFER_ENDMatt Turner2017-05-151-0/+3
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/tools: Refactor gen_disasm_disassemble() to use annotationsMatt Turner2017-05-151-28/+43
| | | | | | | | Which will allow us to print validation errors found in shader assembly in GPU hang error states. Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/decoder: Fix indentationMatt Turner2017-05-151-4/+4
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* genxml: Remove brackets from kernel start pointer namesMatt Turner2017-05-152-7/+7
| | | | | | | | Newer Gens' names don't have the brackets. Having common names will make some later patches simpler. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* i965: Add a weak no-op nir_print_instr() symbolMatt Turner2017-05-151-0/+2
| | | | | | | | | | | | | intel_asm_annotation.c is part of libintel_compiler.la, which contains code for disassembling and validating shaders that we want to call in aubinator_error_decode. dump_assembly() calls nir_print_instr() to print annotations, and although dump_assembly() is not called by aubinator_error_decode (nor is any function in intel_asm_annotation.c) it causes undefined references to nir_print_instr(). To work around, provide a no-op weak symbol to resolve against.
* i965: Allow brw_eu_validate to handle compact instructionsMatt Turner2017-05-151-2/+15
| | | | | | | | This will allow the validator to run on shader programs we find in the GPU hang error state. Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Pass pointer and end of assembly to brw_validate_instructionsMatt Turner2017-05-155-11/+22
| | | | | | | This will allow us to more easily run brw_validate_instructions() on shader programs we find in GPU hang error states. Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel: gen-decoder: fix xml parser leakLionel Landwerlin2017-05-151-6/+7
| | | | | | | | In the unlikely case the parsing of genxml files fails, we were leaking an xml parser object. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* genxml: Add alias for MOCS.Rafael Antognolli2017-05-115-0/+5
| | | | | | | | Use an alias for this field on 3DSTATE_INDEX_BUFFER on gen6+, so we can set the same value as the defines. Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965: Port Gen4-5 VS_STATE to genxml.Kenneth Graunke2017-05-113-3/+3
| | | | | | It's actually not that much code. Reviewed-by: Rafael Antognolli <[email protected]>
* genxml: Fix KSPs on Ironlake to be offsets, not pointers.Kenneth Graunke2017-05-111-8/+8
| | | | | | | We use Instruction State Base Address on Ironlake, so we want KSP to be an offset not an actual pointer. Gen4/G45 use pointers. Reviewed-by: Jason Ekstrand <[email protected]>
* anv: document that anv_gem_mmap returns MAP_FAILED on errorEmil Velikov2017-05-111-1/+1
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965: Drop INTEL_DEBUG=stats.Kenneth Graunke2017-05-102-2/+1
| | | | | | | | | | | | | | | | For whatever reason, we had an INTEL_DEBUG=stats option that enabled various statistics counters on Gen4-5 systems. It's been around forever, though I can't think of a single time that it's been useful. On Gen6+, we enable statistics all the time because they're necessary to support various query object targets. Turning them off would break those queries. Gen4-5 don't support those queries, so the statistics counters generally aren't useful; we disabled them by default. This patch disables them altogether. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv: don't leak DRM devicesGrazvydas Ignotas2017-05-101-0/+1
| | | | | | | | | After successful drmGetDevices2() call, drmFreeDevices() needs to be called. Fixes: b1fb6e8d "anv: do not open random render node(s)" Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> # radv version
* anv: fix possible stack corruptionGrazvydas Ignotas2017-05-101-1/+1
| | | | | | | | | | drmGetDevices2 takes count and not size. Probably hasn't caused problems yet in practice and was missed as setups with more than 8 DRM devices are not very common. Fixes: b1fb6e8d "anv: do not open random render node(s)" Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/vec4: Delete the system value infastructureJason Ekstrand2017-05-0911-137/+5
| | | | | | | | | The only thing still using it is INVOCATION_ID for geometry shaders. That's easily enough inlined into the nir_intrinsic_load_invocation_id handling code. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Use NIR to do GS input remappingJason Ekstrand2017-05-099-101/+59
| | | | | | | | We're already doing this in the FS back-end. This just does the same thing in the vec4 back-end. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Move remapping of gl_PointSize to the NIR levelJason Ekstrand2017-05-092-26/+21
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: Inline remap_inputs_with_vue_mapJason Ekstrand2017-05-091-27/+22
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Use NIR remapping for VS attributesJason Ekstrand2017-05-096-121/+34
| | | | | | | | | | | | | The NIR pass already handles remapping system values to attributes for us so we delete the system value code as part of the conversion. We also change nir_lower_vs_inputs to take an explicit inputs_read bitmask and pass in the inputs_read from prog_data instead from pulling it out of NIR. This is because the version in prog_data may get EDGEFLAG added to it on some old platforms. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/vs: Move inputs_read handling to generic codeJason Ekstrand2017-05-092-3/+3
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Set VERT_BIT_EDGEFLAG based on the VUE mapJason Ekstrand2017-05-091-0/+11
| | | | | | | We also add a nice little comment to make it more clear exactly what happens with the edge flag copy. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Lower gl_VertexID and friends to inputs at the NIR levelJason Ekstrand2017-05-094-70/+74
| | | | | | | | | NIR calls these system values but they come in from the VF unit as vertex data. It's terribly convenient to just be able to treat them as such in the back-end. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Set uses_vertexid and friends from brw_compile_vsJason Ekstrand2017-05-093-11/+17
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move multiply by 4 for VS ATTR setup into the scalar backend.Jason Ekstrand2017-05-092-2/+2
| | | | | | | | | | The vec4 backend will want to count in units of vec4s, not scalar components. The simplest solution is to move the multiplication by 4 into the scalar backend. This also improves consistency with how we count varyings. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/nir: Inline remap_vs_attrsJason Ekstrand2017-05-091-30/+26
| | | | | | | | | | Now that we have nice block iterators, there's no good reason for this to be off on it's own. While we're here, we convert to using the NIR const index getters/setters instead of whacking const_index values directly. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Embed the shader_info in the nir_shader againJason Ekstrand2017-05-0917-155/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e1af20f18a86f52a9640faf2d4ff8a71b0a4fa9b changed the shader_info from being embedded into being just a pointer. The idea was that sharing the shader_info between NIR and GLSL would be easier if it were a pointer pointing to the same shader_info struct. This, however, has caused a few problems: 1) There are many things which generate NIR without GLSL. This means we have to support both NIR shaders which come from GLSL and ones that don't and need to have an info elsewhere. 2) The solution to (1) raises all sorts of ownership issues which have to be resolved with ralloc_parent checks. 3) Ever since 00620782c92100d77c660f9783504c6d80fa1d58, we've been using nir_gather_info to fill out the final shader_info. Thanks to cloning and the above ownership issues, the nir_shader::info may not point back to the gl_shader anymore and so we have to do a copy of the shader_info from NIR back to GLSL anyway. All of these issues go away if we just embed the shader_info in the nir_shader. There's a little downside of having to copy it back after calling nir_gather_info but, as explained above, we have to do that anyway. Acked-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel: compiler: prevent integer overflowLionel Landwerlin2017-05-091-2/+2
| | | | | | | CID: 1399477, 1399478 (Integer handling issues) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel: compiler: remove duplicated codeLionel Landwerlin2017-05-091-12/+0
| | | | | | | CID: 1399470: (Control flow issues) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel: gen decoder: don't check for size_t negative valuesLionel Landwerlin2017-05-091-1/+1
| | | | | | | | | We should get either 0 or 1 here. CID: 1373562 (Control flow issues) Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Matt Turner <[email protected]>
* anv: check return value of anv_execbuf_add_boLionel Landwerlin2017-05-081-2/+7
| | | | | | | CID: 1405919 (Error handling issues) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv: avoid null pointer dereferenceLionel Landwerlin2017-05-081-1/+2
| | | | | | | | | The application might not give an output structure. CID: 1405765 (Null pointer dereferences) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv/allocator: Only write to _vg_ptr if we have valgrindJason Ekstrand2017-05-051-1/+1
| | | | | | | This fixes the build when not building against valgrind headers. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100945 Reviewed-by: Chad Versace <[email protected]>
* anv/query: handle more cases of 'out of host memory'Iago Toral Quiroga2017-05-051-0/+10
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* anv/allocator: Improve block pool growing assertsJason Ekstrand2017-05-041-6/+5
| | | | Reviewed-by: Juan A. Suarez Romero <[email protected]>
* anv: Drop the instruction pool block sizeJason Ekstrand2017-05-041-2/+1
| | | | | | | Now that we can allocate states larger than the block size, we no longer need a block size of 1MB which can be rather wasteful. Reviewed-by: Juan A. Suarez Romero <[email protected]>
* anv/allocator: Add support for large stream allocationsJason Ekstrand2017-05-041-4/+7
| | | | Reviewed-by: Juan A. Suarez Romero <[email protected]>
* anv/allocator: Allow state pools to allocate large statesJason Ekstrand2017-05-041-0/+69
| | | | | | | | | | | | | | | | | | | | | | Previously, the maximum size of a state that could be allocated from a state pool was a block. However, this has caused us various issues particularly with shaders which are potentially very large. We've also hit issues with render passes with a large number of attachments when we go to allocate the block of surface state. This effectively removes the restriction on the maximum size of a single state. (There's still a limit of 1MB imposed by a fixed-length bucket array.) For states larger than the block size, we just grab a large block off of the block pool rather than sub-allocating. When we go to allocate some chunk of state and the current bucket does not have state, we try to pull a chunk from some larger bucket and split it up. This should improve memory usage if a client occasionally allocates a large block of state. This commit is inspired by some similar work done by Juan A. Suarez Romero <[email protected]>. Reviewed-by: Juan A. Suarez Romero <[email protected]>