summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* i965/sbe: fix number of inputs for active componentsIago Toral Quiroga2018-03-011-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 16631ca30ea6 we fixed gen9 active components to account for padded inputs in the URB, which we can have with SSO programs. To do that, instead of going through the bitfield of inputs (which doesn't include padding information), we compute the number of inputs from the size of the URB entry. Unfortunately, there are some special inputs that are not stored in the URB and that we also need to account for. These special inputs are identified and handled during calculate_attr_overrides(). Instead of keeping track of the exact number of inputs, we just program active components for all possible inputs like we do in anvil. This fixes a regression in a WebGL program that uses Point Sprite functionality (specifically, VARYING_SLOT_PNTC). v2: - Add 'Fixes' tag (Mark Janes) - make no_vue_inputs int instead of uint32_t, and add const qualifier to num_inputs variable (Ian) v3: - Do not try to count inputs correctly, just program all input slots like we do in anvil (Ken) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105224 Fixes: 16631ca30ea6 (i965/sbe: fix active components for SSO programs with over 16 inputs) Reviewed-by: Kenneth Graunke <[email protected]>
* radv: only emit cache flushes when the pool size is large enoughSamuel Pitoiset2018-03-013-11/+15
| | | | | | | | This is an optimization which reduces the number of flushes for small pool buffers. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: keep track of the query pool sizeSamuel Pitoiset2018-03-012-5/+5
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: make sure to emit cache flushes before starting a querySamuel Pitoiset2018-03-013-7/+33
| | | | | | | | | | | If the query pool has been previously resetted using the compute shader path. Fixes: a41e2e9cf5 ("radv: allow to use a compute shader for resetting the query pool") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105292 Cc: "18.0" <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir/serialize: handle var->name being NULLAlejandro Piñeiro2018-03-011-1/+2
| | | | | | | | var->name could be NULL under ARB_gl_spirv for example. And in any case, the code is already handing var name being NULL when reading a variable, so it is consistent to do it writing a variable too. Reviewed-by: Timothy Arceri <[email protected]>
* anv: Enable VK_KHR_16bit_storage for PushConstantJose Maria Casanova Crespo2018-02-281-1/+1
| | | | | | Enables storagePushConstant16 features of VK_KHR_16bit_storage for Gen8+. Reviewed-by: Jason Ekstrand <[email protected]>
* spirv/i965/anv: Relax push constant offset assertions being 32-bit alignedJose Maria Casanova Crespo2018-02-283-9/+10
| | | | | | | | | | | | | | | | The introduction of 16-bit types with VK_KHR_16bit_storages implies that push constant offsets could be multiple of 2-bytes. Some assertions are updated so offsets should be just multiple of size of the base type but in some cases we can not assume it as doubles aren't aligned to 8 bytes in some cases. For 16-bit types, the push constant offset takes into account the internal offset in the 32-bit uniform bucket adding 2-bytes when we access not 32-bit aligned elements. In all 32-bit aligned cases it just becomes 0. v2: Assert offsets to be aligned to the dest type size. (Jason Ekstrand) Reviewed-by: Jason Ekstrand <[email protected]>
* spirv: Calculate properly 16-bit vector sizesJose Maria Casanova Crespo2018-02-281-5/+2
| | | | | | | | | | | Range in 16-bit push constants load was being calculated wrongly using 4-bytes per element instead of 2-bytes as it should be. v2: Use glsl_get_bit_size instead of if statement (Jason Ekstrand) Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Enable VK_KHR_16bit_storage for SSBO and UBOJose Maria Casanova Crespo2018-02-282-3/+4
| | | | | | | Enables storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccesss features of VK_KHR_16bit_storage for Gen8+. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layoutJose Maria Casanova Crespo2018-02-281-7/+15
| | | | | | | | | | | | | | | | Restrict the use of untyped_surface_write with 16-bit pairs in ssbo to the cases where we can guarantee that offset is multiple of 4. Taking into account that VK_KHR_relaxed_block_layout is available in ANV we can only guarantee that when we have a constant offset that is multiple of 4. For non constant offsets we will always use byte_scattered_write. v2: (Jason Ekstrand) - Assert offset_reg to be multiple of 4 if it is immediate. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layoutJose Maria Casanova Crespo2018-02-281-14/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | 16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't guarantee that offsets are multiple of 4-bytes as required by untyped_read message. This happens for example in the case of f16mat3x3 when then VK_KHR_relaxed_block_layout is enabled. Vectors reads when we have non-constant offsets are implemented with multiple byte_scattered_read messages that not require 32-bit aligned offsets. Now for all constant offsets we can use the untyped_read_surface message. In the case of constant offsets not aligned to 32-bits, we calculate a start offset 32-bit aligned and use the shuffle_32bit_load_result_to_16bit_data function and the first_component parameter to skip the copy of the unneeded component. v2: (Jason Ekstrand) Use untyped_read_surface messages always we have constant offsets. v3: (Jason Ekstrand) Simplify loop for reads with non constant offsets. Use end - start to calculate the number of 32-bit components to read with constant offsets. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: shuffle_32bit_load_result_to_16bit_data now skips componentsJose Maria Casanova Crespo2018-02-283-3/+6
| | | | | | | | | | | | | | | | This helper used to load 16bit components from 32-bits read now allows skipping components with the new parameter first_component. The semantics now skip components until we reach the first_component, and then reads the number of components passed to the function. All previous uses of the helper are updated to use 0 as first_component. This will allow read 16-bit components when the first one is not aligned 32-bit. Enabling more usages of untyped_reads with 16-bit types. v2: (Jason Ektrand) Change parameters order to first_component, num_components Reviewed-by: Jason Ekstrand <[email protected]>
* isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bitJose Maria Casanova Crespo2018-02-283-2/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The surfaces that backup the GPU buffers have a boundary check that considers that access to partial dwords are considered out-of-bounds. For example, buffers with 1,3 16-bit elements has size 2 or 6 and the last two bytes would always be read as 0 or its writting ignored. The introduction of 16-bit types implies that we need to align the size to 4-bytew multiples so that partial dwords could be read/written. Adding an inconditional +2 size to buffers not being multiple of 2 solves this issue for the general cases of UBO or SSBO. But, when unsized arrays of 16-bit elements are used it is not possible to know if the size was padded or not. To solve this issue the implementation calculates the needed size of the buffer surfaces, as suggested by Jason: surface_size = isl_align(buffer_size, 4) + (isl_align(buffer_size, 4) - buffer_size) So when we calculate backwards the buffer_size in the backend we update the resinfo return value with: buffer_size = (surface_size & ~3) - (surface_size & 3) It is also exposed this buffer requirements when robust buffer access is enabled so these buffer sizes recommend being multiple of 4. v2: (Jason Ekstrand) Move padding logic fron anv to isl_surface_state. Move calculus of original size from spirv to driver backend. v3: (Jason Ekstrand) Rename some variables and use a similar expresion when calculating. padding than when obtaining the original buffer size. Avoid use of unnecesary component call at brw_fs_nir. v4: (Jason Ekstrand) Complete comment with buffer size calculus explanation in brw_fs_nir. Reviewed-by: Jason Ekstrand <[email protected]>
* vbo: Remove vbo_save_vertex_list::vertex_size.Mathias Fröhlich2018-03-012-9/+6
| | | | | | | | Like before use local variables from compile_vertex_list instead. Remove vertex_size from struct vbo_save_vertex_list. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Remove vbo_save_vertex_list::buffer_offset.Mathias Fröhlich2018-03-012-29/+13
| | | | | | | | | | The buffer_offset is used in aligned_vertex_buffer_offset. But now that most of these decisions are done in compile_vertex_list we can work on local variables instead of struct members in the display list code. Clean that up and remove buffer_offset. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Remove vbo_save_vertex_list::start_vertex.Mathias Fröhlich2018-03-013-6/+2
| | | | | | | | Replace last use on replay with _vbo_save_get_{min,max}_index. Appart from that it is not used anymore. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Remove vbo_save_vertex_list::attrsz.Mathias Fröhlich2018-03-012-7/+4
| | | | | | | | Is not used anymore on replay, move the last use in display list compilation to the original array in the display list compiler. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Remove vbo_save_vertex_list::attrtype.Mathias Fröhlich2018-03-012-4/+1
| | | | | | | | Is not used anymore on replay, move the last use in display list compilation to the original array in the display list compiler. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Remove vbo_save_vertex_list::enabled.Mathias Fröhlich2018-03-012-3/+1
| | | | | | | Is not used anymore on replay. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Remove reference to the vertex_store from the dlist node.Mathias Fröhlich2018-03-013-21/+10
| | | | | | | | Since we now store a set of VAOs in the display list, use these object to get the reference to the VBO in several places. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Implement current values update in terms of the VAO.Mathias Fröhlich2018-03-013-62/+47
| | | | | | | | | Use the information already present in the VAO to update the current values after display list replay. Set GL_OUT_OF_MEMORY on allocation failure for the current value update storage. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Implement vbo_loopback_vertex_list in terms of the VAO.Mathias Fröhlich2018-03-014-93/+151
| | | | | | | | | | | Use the information already present in the VAO to replay a display list node using immediate mode draw commands. Use a hand full of helper methods that will be useful for the next patches also. v2: Insert asserts, constify local variables. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Use a local variable for the dlist offsets.Mathias Fröhlich2018-03-012-9/+7
| | | | | | | | The master value is now stored inside the VAO already present in struct vbo_save_vertex_list. Remove the unneeded copy from dlist storage. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Remove unused vbo_save_context::wrap_count.Mathias Fröhlich2018-03-011-1/+0
| | | | | Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* vbo: Remove unused vbo_save_vertex_list::dangling_attr_ref.Mathias Fröhlich2018-03-012-3/+0
| | | | | Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Mathias Fröhlich <[email protected]>
* anv: Always set has_context_priorityJason Ekstrand2018-02-281-3/+1
| | | | | | | | | | We don't zalloc the physical device so we need to unconditionally set everything. Crucible helpfully initializes all allocations to 139 so it was getting true regardless of whether or not the kernel actually supports context priorities. Fixes: 6d8ab53303331 "anv: implement VK_EXT_global_priority extension" Reviewed-by: Kenneth Graunke <[email protected]>
* Revert "i965: Only emit 3DSTATE_DRAWING_RECTANGLE once on gen8+"Mark Janes2018-02-283-13/+2
| | | | | | | | | | | | | This reverts commit a2c1e48f15995a826dc759e064c2603882a37e0c. On BDWGT3e and KBLGT3e systems, this commit regressed the following tests: piglit.spec.ext_framebuffer_multisample.accuracy 2 stencil_resolve small depthstencil piglit.spec.ext_framebuffer_multisample.accuracy 4 stencil_resolve small depthstencil piglit.spec.ext_framebuffer_multisample.accuracy 6 stencil_resolve small depthstencil piglit.spec.ext_framebuffer_multisample.accuracy 8 stencil_resolve small depthstencil piglit.spec.ext_framebuffer_multisample.accuracy all_samples stencil_resolve small depthstencil
* radeonsi/nir: increase values to 8 for gs fetch.Dave Airlie2018-03-011-1/+1
| | | | | | | | This stops a crash when running (still fails): tests/spec/arb_gpu_shader_fp64/execution/explicit-location-gs-fs-vs.shader_test Reviewed-by: Timothy Arceri <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: Use the syncobj wait ioctl to wait on fences if possible.Bas Nieuwenhuizen2018-03-013-9/+26
| | | | | | Handles the !waitAll and signal after the start of the wait cases correctly. Reviewed-by: Dave Airlie <[email protected]>
* radv: Implement more efficient !waitAll fence waiting.Bas Nieuwenhuizen2018-03-013-0/+75
| | | | Reviewed-by: Dave Airlie <[email protected]>
* radv: Implement waiting on non-submitted fences.Bas Nieuwenhuizen2018-03-011-2/+11
| | | | | Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver" Reviewed-by: Dave Airlie <[email protected]>
* radv: Implement WaitForFences with !waitAll.Bas Nieuwenhuizen2018-03-011-5/+15
| | | | | | | | | | Nothing to do except using a busy wait loop. At least for old kernels. A better implementation for newer kernels to come later. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105255 Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver" Reviewed-by: Dave Airlie <[email protected]>
* ac/nir: fix shared atomic operations.Dave Airlie2018-03-011-5/+5
| | | | | | | | | | | The nir->llvm conversion was using the wrong srcs. Fixes: tests/spec/arb_compute_shader/execution/shared-atomics.shader_test Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac/nir: don't apply slice rounding on txf_msDave Airlie2018-03-011-1/+1
| | | | | | | | | | | This matches the tgsi code. Fixes arb_texture_multisample texelFetch piglit tests. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Fixes: f4e499ec7914 (radv: add initial non-conformant radv vulkan driver) Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: set some context vars for nir pathTimothy Arceri2018-03-011-6/+10
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium: remove llvm from ir structTimothy Arceri2018-03-011-1/+0
| | | | | | | This was added in 425dc4c4b366 but never used. Also since 100796c15c3a native has superseded llvm. Acked-by: Dave Airlie <[email protected]>
* i965: Don't emit MOVs with undefined registers for Gen4 point clipping.Kenneth Graunke2018-02-281-1/+1
| | | | | | | | | | | | Gen4 point clipping calls brw_clip_tri_alloc_regs with nr_verts == 0, which means that c->reg.vertex[] isn't initialized. It then emits MOVs to stomp components of those uninitialized registers to 0. This started causing assertions after Matt's recent series, when those uninitialized registers started getting BRW_REGISTER_TYPE_NF, which definitely doesn't exist on Gen4-5. Reviewed-by: Matt Turner <[email protected]>
* broadcom/vc5: Fix regression in the page-cache slice size alignment.Eric Anholt2018-02-281-3/+6
| | | | | | | We need to align the size of the slice, not the offset of the next slice. Fixes KHR-GLES3.texture_repeat_mode.rgba32ui_11x131_2_clamp_to_edge. Fixes: b4b4ada7616d ("broadcom/vc5: Fix layout of 3D textures.")
* i965: Only emit 3DSTATE_DRAWING_RECTANGLE once on gen8+Jason Ekstrand2018-02-283-2/+13
| | | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Be more clever about setting up our viewport clipJason Ekstrand2018-02-281-8/+12
| | | | | | | | | | | | Before, we were trusting in the hardware to take the intersection of the viewport clip with the drawing rectangle. Unfortunately, 3DSTATE_DRAWING_RECTANGLE is fairly expensive because it implicitly does a full pipeline stall. If we're a bit more careful with our viewport clipping, we can just re-emit it once at context creation time. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Re-add .vs_inputs_dual_locations = trueMatt Turner2018-02-281-0/+1
| | | | | | Looks like a rebase mistake. Fixes: 89fe5190a256 ("intel/compiler: Lower flrp32 on Gen11+")
* r600/shader: when using images always load thread id gpr at start (v2)Dave Airlie2018-02-281-15/+7
| | | | | | | | | | | | The delayed loading code was fail if we had control flow. This fixes: tests/spec/arb_shader_image_load_store/execution/image_checkerboard.shader_test v2: don't use temp_reg before setting temp_reg up. Tested-by: Gert Wollny <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: fix whitespace in recent 1d texture commit.Dave Airlie2018-02-281-1/+1
| | | | trivial fix.
* intel/compiler: Add ICL to test_eu_validate.cppMatt Turner2018-02-281-0/+1
| | | | | | | With the Align16 tests now disabled, we can run the rest of the tests in ICL mode (and see them pass!) Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Disable Align16 tests on Gen11+Matt Turner2018-02-281-0/+16
| | | | | | Align16 is no more. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Add instruction compaction support on Gen11Matt Turner2018-02-281-0/+42
| | | | | | Gen11 only differs from SKL+ in that it uses a new datatype index table. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Mark line, pln, and lrp as removed on Gen11+Matt Turner2018-02-281-4/+6
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Lower flrp32 on Gen11+Matt Turner2018-02-285-17/+26
| | | | | | The LRP instruction is no more. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/fs: Implement ddy without using align16 for Gen11+Matt Turner2018-02-281-8/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Align16 is no more. We previously generated an align16 ADD instruction to calculate DDY: add(16) g25<1>F -g23<4>.xyxyF g23<4>.zwzwF { align16 1H }; Without align16, we now implement it as: add(4) g25<1>F -g23<0,2,1>F g23.2<0,2,1>F { align1 1N }; add(4) g25.4<1>F -g23.4<0,2,1>F g23.6<0,2,1>F { align1 1N }; add(4) g26<1>F -g24<0,2,1>F g24.2<0,2,1>F { align1 1N }; add(4) g26.4<1>F -g24.4<0,2,1>F g24.6<0,2,1>F { align1 1N }; where only the first two instructions are needed in SIMD8 mode. Note: an earlier version of the patch implemented this in two instructions in SIMD16: add(8) g25<2>F -g23<4,2,0>F g23.2<4,2,0>F { align1 1N }; add(8) g25.1<2>F -g23.1<4,2,0>F g23.3<4,2,0>F { align1 1N }; but I realized that the channel enable bits will not be correct. If we knew we were under uniform control flow, we could emit only those two instructions however. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/fs: Simplify ddx/ddy code generationMatt Turner2018-02-281-42/+21
| | | | | | The brw_reg() constructor just obfuscates things here, in my opinion. Reviewed-by: Kenneth Graunke <[email protected]>