path: root/src/intel
Commit message (Author, Age, Files, Lines)
* anv: only expose up to 28 vertex attributes (Iago Toral Quiroga, 2017-07-26, 1 file, -1/+1)
  The EU limit of 128 GRFs should allow 32 vertex elements of 4 GRFs. However, the maximum allowed value of "Vertex URB Entry Read Length" in SIMD8 is 15. And 15 * 8 = 120 gives us a limit of 30 vertex elements. Because we also need to reserve a vertex buffer to upload VertexIndex/InstanceIndex and another to upload DrawID when needed, we can only expose 28.
  Cc: "17.2" <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
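The arithmetic in this commit message can be checked with a short C sketch. The macro and function names below are hypothetical and are not identifiers from the anv driver; only the numbers (15, 8, 4 GRFs per element, two reserved buffers) come from the message.

```c
#include <assert.h>

/* Hypothetical names for illustration; the values are the ones quoted in
 * the commit message, not identifiers from the anv driver. */
#define SKETCH_MAX_READ_LENGTH   15u /* max "Vertex URB Entry Read Length" in SIMD8 */
#define SKETCH_READ_LENGTH_SCALE  8u /* multiplier used in the message: 15 * 8 = 120 */
#define SKETCH_GRFS_PER_ELEMENT   4u /* one vertex element occupies 4 GRFs */
#define SKETCH_RESERVED_BUFFERS   2u /* VertexIndex/InstanceIndex and DrawID */

/* Number of vertex attributes that can be exposed under those limits. */
static unsigned
sketch_max_vertex_attribs(void)
{
   unsigned grfs = SKETCH_MAX_READ_LENGTH * SKETCH_READ_LENGTH_SCALE; /* 120 */
   unsigned elements = grfs / SKETCH_GRFS_PER_ELEMENT;                /* 30 */

   assert(elements >= SKETCH_RESERVED_BUFFERS);
   return elements - SKETCH_RESERVED_BUFFERS;                         /* 28 */
}
```

Calling `sketch_max_vertex_attribs()` reproduces the limit of 28 named in the commit title.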
* anv/cmd_buffer: fix off by one error in assertion (Iago Toral Quiroga, 2017-07-26, 1 file, -1/+1)
  Cc: "17.2" <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Reuse the gen_make_gen() helper. (Eric Anholt, 2017-07-25, 1 file, -3/+1)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Reuse the MAX2 macro instead of defining another one. (Eric Anholt, 2017-07-25, 1 file, -3/+1)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/blorp: ship blorp_genX_exec.h within the tarball (Emil Velikov, 2017-07-24, 1 file, -0/+1)
  Fixes: c9cb37b2a6c ("intel/blorp: Add a partial resolve pass for MCS")
  Signed-off-by: Emil Velikov <[email protected]>
* anv/image: zalloc image views (Jason Ekstrand, 2017-07-22, 1 file, -7/+1)
  This allows us to avoid some extra zeroing.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/image: Use vk_zalloc instead of an explicit memset (Jason Ekstrand, 2017-07-22, 1 file, -3/+2)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Separate surface states by layout instead of aux_usage (Jason Ekstrand, 2017-07-22, 4 files, -53/+58)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/isl: Add some sanity checks for compressed surfaces (Jason Ekstrand, 2017-07-22, 1 file, -0/+18)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/isl: Add a helper to get a subimage surface (Jason Ekstrand, 2017-07-22, 3 files, -30/+76)
  We already have a helper for doing this in BLORP; this just moves the logic into ISL where we can share it with other components.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Get rid of some unused function declarations (Jason Ekstrand, 2017-07-22, 1 file, -7/+0)
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/isl: Add a helper for determining if a color is 0/1 (Jason Ekstrand, 2017-07-22, 2 files, -0/+30)
  Reviewed-by: Topi Pohjolainen <[email protected]>
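The commit title only names the check, so the following is a minimal sketch of one plausible reading: a color counts as "0/1" when every channel is exactly 0 or exactly 1. The function name is hypothetical and the sketch handles float channels only; the actual ISL helper may take different arguments and cover integer formats as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, float channels only. Returns true when every one of
 * the four channels is exactly 0.0 or exactly 1.0. */
static bool
sketch_color_is_zero_one(const float rgba[4])
{
   assert(rgba);

   for (int i = 0; i < 4; i++) {
      if (rgba[i] != 0.0f && rgba[i] != 1.0f)
         return false;
   }
   return true;
}
```

Under this reading, opaque black `{0, 0, 0, 1}` passes the check while a mid-gray such as `{0.5, 0.5, 0.5, 1}` does not.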
* intel/blorp: Allow blorp_copy on sRGB formats (Jason Ekstrand, 2017-07-22, 1 file, -2/+16)
  Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/isl/format: Add an srgb_to_linear helper (Jason Ekstrand, 2017-07-22, 2 files, -1/+53)
  Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/isl/format: Dedent the template in gen_format_layout.py (Jason Ekstrand, 2017-07-22, 1 file, -58/+57)
  This makes it much easier to edit the template and doesn't really dirty the python all that much.
  Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/isl: Add an aux state for "partial clear" (Jason Ekstrand, 2017-07-22, 1 file, -35/+53)
  Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/blorp: Add a partial resolve pass for MCS (Jason Ekstrand, 2017-07-22, 4 files, -1/+213)
  Reviewed-by: Topi Pohjolainen <[email protected]>
* anv: Predicate fast-clear resolves (Nanley Chery, 2017-07-22, 3 files, -16/+120)
  Image layouts only let us know that an image *may* be fast-cleared. For this reason we can end up with redundant resolves. Testing has shown that such resolves can measurably hurt performance and that predicating them can avoid the penalty.
  v2:
    - Introduce additional resolve state management function (Jason Ekstrand).
    - Enable easy retrieval of fast clear state fields.
  v3: Use more descriptive field enums (Jason)
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/blorp: Allow BLORP calls to be predicated (Nanley Chery, 2017-07-22, 2 files, -0/+6)
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: Skip some input attachment transitions (Nanley Chery, 2017-07-22, 1 file, -5/+26)
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Stop resolving CCS implicitly (Nanley Chery, 2017-07-22, 3 files, -169/+5)
  With an earlier patch from this series, resolves are additionally performed on layout transitions. Remove the now unnecessary implicit resolves within render passes.
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Transition more color buffer layouts (Nanley Chery, 2017-07-22, 2 files, -28/+169)
  v2: Expound on comment for the pipe controls (Jason Ekstrand).
  v3:
    - Cast base_layer to uint64_t to avoid overflow.
    - Remove "seems" from the pipe control comment.
    - Fix clamp of layer_count (Jason Ekstrand).
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
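The v3 notes mention casting base_layer to uint64_t and fixing a clamp of layer_count. The sketch below illustrates the general pattern only; the function name, parameters, and exact clamping rule are hypothetical and are not taken from the anv source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the anv implementation. Returns how many layers
 * of the requested range [base_layer, base_layer + layer_count) actually
 * exist in an image that has image_layers layers. */
static uint32_t
sketch_clamp_layer_count(uint32_t base_layer, uint32_t layer_count,
                         uint32_t image_layers)
{
   if (base_layer >= image_layers)
      return 0;

   /* Widen before adding: in 32 bits, base_layer + layer_count wraps
    * around when layer_count is a very large value such as UINT32_MAX. */
   uint64_t end = (uint64_t)base_layer + layer_count;
   if (end > image_layers)
      end = image_layers;

   assert(end >= base_layer);
   return (uint32_t)(end - base_layer);
}
```

Without the cast, a request such as base layer 2 with a count of UINT32_MAX would wrap to an end of 1 and slip past the upper-bound check; with the 64-bit sum it clamps to the layers that remain.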
* anv/cmd_buffer: Warn about not enabling CCS_E (Nanley Chery, 2017-07-22, 1 file, -5/+7)
  Use the performance warning infrastructure to provide helpful information when testing applications.
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: Move aux_usage assignment up (Nanley Chery, 2017-07-22, 1 file, -32/+30)
  For readability, bring the assignment of CCS closer to the assignment of NONE and MCS.
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: Always enable CCS_D in render passes (Nanley Chery, 2017-07-22, 2 files, -11/+20)
  The lifespan of the fast-clear data will surpass the render pass scope. We need CCS_D to be enabled in order to invalidate blocks previously marked as cleared and to sample cleared data correctly.
  v2: Avoid refactoring.
  v3: Allow CCS_D for subpass resolves.
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: Disable CCS on gen7 color attachments upfront (Nanley Chery, 2017-07-22, 1 file, -11/+5)
  The next patch enables the use of CCS_D even when the color attachment will not be fast-cleared. Catch the gen7 case early to simplify the changes required.
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: Ensure fast-clear values are current (Nanley Chery, 2017-07-22, 1 file, -0/+114)
  v2: Rewrite functions, change location of synchronization.
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/gpu_memcpy: Add a lighter-weight GPU memcpy function (Nanley Chery, 2017-07-22, 2 files, -0/+45)
  We'll be performing a GPU memcpy in more places to copy small amounts of data. Add an alternate function that thrashes less state.
  v2:
    - Make a new function (Jason Ekstrand).
    - Move the #define into the function.
  v3:
    - Update the function name (Jason).
    - Update comments.
  v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand).
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: Restrict fast clears in the GENERAL layout (Nanley Chery, 2017-07-22, 3 files, -0/+40)
  v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
  v3: Allow some fast clears in the GENERAL layout.
  v4: Remove extra '||' and adjust line break (Jason Ekstrand).
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: Don't partially fast clear image layers (Nanley Chery, 2017-07-22, 1 file, -8/+23)
  v2: Don't pass in the command buffer (Jason Ekstrand).
  v3: Remove an incorrect assertion and an if condition for gen7.
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/cmd_buffer: Initialize the clear values buffer (Nanley Chery, 2017-07-22, 1 file, -1/+78)
  v2: Rewrite functions.
  v3 (Jason Ekstrand):
    - Don't set ResourceMinLOD.
    - Fix clamp of level_count.
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/image: Append CCS/MCS with a fast-clear state buffer (Nanley Chery, 2017-07-22, 2 files, -0/+90)
  v2: Update comments, function signatures, and add assertions.
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv/image: Disable CCS if the image doesn't support rendering (Nanley Chery, 2017-07-22, 1 file, -0/+15)
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/isl: Add surface state clear value information (Nanley Chery, 2017-07-22, 2 files, -0/+13)
  This will be used to load and store clear values from surface state objects.
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Transition MCS buffers from the undefined layout (Nanley Chery, 2017-07-22, 3 files, -18/+35)
  v2: Define MCS buffers with any sample count (Jason)
  Cc: <[email protected]>
  Suggested-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Nanley Chery <[email protected]>
* intel/isl: Tighten up restrictions for CCS on gen7 (Jason Ekstrand, 2017-07-22, 1 file, -7/+23)
  It may technically be possible to enable some sort of fast-clear support for at least the base slice of a 2D array texture on gen7. However, it's not documented to work, we've never tried to do it in GL, and we have no idea what the hardware does if you turn on CCS_D with arrayed rendering. Let's just play it safe and disallow it for now. If someone really cares that much about gen7 performance, they can come along and try to get it working later.
* anv/blorp: Assert isl_surf_init success in do_buffer_copy (Jason Ekstrand, 2017-07-22, 1 file, -13/+15)
  Reviewed-by: Topi Pohjolainen <[email protected]>
* anv/blorp: Explicitly set row_pitch in do_buffer_copy (Jason Ekstrand, 2017-07-22, 1 file, -1/+1)
  We have a very specific row pitch that we want, and we don't want ISL to be changing it on us, so just be explicit about it.
  Fixes: a40f0430347c07bf2d5794642fe02f5dd248a473
  Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Set lower_vote_trivial in vector_nir_options_gen6 too. (Kenneth Graunke, 2017-07-21, 1 file, -0/+1)
  There's a second struct for Gen6+.
  Reviewed-by: Matt Turner <[email protected]>
* intel/isl/gen7: Don't allow multisampled surfaces with valign2 (Topi Pohjolainen, 2017-07-22, 1 file, -19/+23)
  The same constraint appears later on as an assert in isl_gen7_choose_image_alignment_el(), so catch it earlier in order to return an error instead of crashing.
  Needed to avoid crashes with piglits on IVB and HSW:
    arb_internalformat_query2.image_format_compatibility_type pname checks
    arb_internalformat_query2.all internalformat_<x>_type pname checks
    arb_internalformat_query2.max dimensions related pname checks
    arb_copy_image.arb_copy_image-formats --samples=2/4/6/8
    arb_texture_float.multisample-fast-clear gl_arb_texture_float
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/isl/gen7: Allow msaa with signed integer formats (Topi Pohjolainen, 2017-07-22, 1 file, -2/+3)
  These formats are already allowed by the i965 GL driver, and the feature seems to work just fine. There are tests for multisampled rendering in piglit:
    tests/spec/ext_framebuffer_multisample
  which can be patched to try 16I/32I in addition to GL_RGBA8I. IvyBridge passed all tests with all sample numbers.
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/isl/gen7: Allow msaa with 128-bit formats (Topi Pohjolainen, 2017-07-22, 1 file, -4/+7)
  These formats are already allowed by the i965 GL driver, and the feature seems to work just fine. There are tests for multisampled rendering in piglit:
    tests/spec/ext_framebuffer_multisample
  which can be patched to try GL_RGBA16F/32F/16I/16UI/32I/32UI in addition to GL_RGBA/8I. IvyBridge passed all tests with all sample numbers and even with 128-bit formats.
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/isl: Allow 1D surfaces with compressed formats (Topi Pohjolainen, 2017-07-22, 1 file, -1/+1)
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
* intel/isl: Align non-tiled horizontally by cache line (Topi Pohjolainen, 2017-07-22, 1 file, -1/+15)
  in order to support blit engine.
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
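One way to read "align horizontally by cache line" is that the row pitch of a linear (non-tiled) surface is rounded up to a multiple of the cache line size. The sketch below assumes that reading and a 64-byte cache line; neither the size nor the function names come from the commit, and the actual ISL change may apply the alignment differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: a 64-byte cache line. */
#define SKETCH_CACHE_LINE_BYTES 64u

/* Round value up to the next multiple of a power-of-two alignment. */
static uint32_t
sketch_align_u32(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical row pitch, in bytes, for a linear surface of the given
 * width whose rows are padded to whole cache lines. */
static uint32_t
sketch_linear_row_pitch(uint32_t width_px, uint32_t bytes_per_px)
{
   return sketch_align_u32(width_px * bytes_per_px, SKETCH_CACHE_LINE_BYTES);
}
```

For example, a row of 100 pixels at 4 bytes per pixel needs 400 bytes, which this sketch pads to 448 bytes (seven cache lines).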
* i965/fs: Match destination type to size for ballot (Matt Turner, 2017-07-20, 2 files, -2/+6)
  No use in taking a 64-bit value when we know the high 32-bits are zero.
* nir: Reduce destination size of ballot intrinsic when possible (Matt Turner, 2017-07-20, 1 file, -0/+1)
  Some hardware, like i965, doesn't support group sizes greater than 32. In that case, we can reduce the destination size of the ballot intrinsic, which will simplify our code generation.
  Reviewed-by: Connor Abbott <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
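The reasoning here can be illustrated with a CPU-side model of a ballot: each enabled invocation whose condition is true sets one bit in the result, so with at most 32 invocations the mask fits in 32 bits. The function names below are hypothetical, and this is a sketch of the idea, not of the NIR or i965 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a ballot over a subgroup of at most 32 invocations.
 * Bit i of the result is set when invocation i is enabled and its
 * condition is true. */
static uint32_t
sketch_ballot32(const bool *condition, const bool *enabled,
                unsigned subgroup_size)
{
   assert(subgroup_size <= 32);

   uint32_t mask = 0;
   for (unsigned i = 0; i < subgroup_size; i++) {
      if (enabled[i] && condition[i])
         mask |= UINT32_C(1) << i;
   }
   return mask;
}

/* The 64-bit form of the same result: a plain zero-extension, so the high
 * 32 bits are always zero and carry no information. */
static uint64_t
sketch_ballot64(const bool *condition, const bool *enabled,
                unsigned subgroup_size)
{
   return sketch_ballot32(condition, enabled, subgroup_size);
}
```

Because the 64-bit result is only a zero-extension of the 32-bit one, a backend limited to 32 invocations can compute the narrow value and widen it afterwards.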
* i965/fs: Implement ARB_shader_ballot operations (Matt Turner, 2017-07-20, 3 files, -0/+48)
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Do not move MOVs writing the flag outside of control flow (Matt Turner, 2017-07-20, 1 file, -2/+4)
  The implementation of ballotARB() will start by zeroing the flags register. So, doing something like

      if (gl_SubGroupInvocationARB % 2u == 0u) {
          ... = ballotARB(true);
          [...]
      } else {
          ... = ballotARB(true);
          [...]
      }

  (like fs-ballot-if-else.shader_test does) would generate identical MOVs to the same destination (the flag register!), and we definitely do not want to pull that out of the control flow.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Handle explicit flag sources in flags_read() (Francisco Jerez, 2017-07-20, 1 file, -4/+5)
  The implementations of the ARB_shader_ballot intrinsics will explicitly read the flag as a source register.
  Reviewed-by: Matt Turner <[email protected]>
* nir: Add system values from ARB_shader_ballot (Matt Turner, 2017-07-20, 2 files, -3/+3)
  We already had a channel_num system value, which I'm renaming to subgroup_invocation to match the rest of the new system values.
  Note that while ballotARB(true) will return zeros in the high 32-bits on systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB variables do not consider whether channels are enabled. See issue (1) of ARB_shader_ballot.
  Reviewed-by: Connor Abbott <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>