path: root/src/mesa
Commit message, Author, Age, Files, Lines
* i965/vec4: support packing tcs inputsTimothy Arceri2016-07-212-2/+7
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965/vec4: add component packing for gsTimothy Arceri2016-07-211-0/+2
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965/vec4: add support for packing vs/gs/tes outputsTimothy Arceri2016-07-213-4/+45
| | | | | | | | | | | Here we create a new output_generic_reg array with the ability to store the dst_reg for each component of user defined varyings. This is needed as the previous code only stored the dst_reg based on the varying location which meant packed varyings would overwrite each other. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* i965/vec4: add support for packing inputsTimothy Arceri2016-07-211-0/+2
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965: add helper for creating packing writemaskTimothy Arceri2016-07-211-0/+7
| | | | | | | | | | | For example where n=3 first_component=1 this will give us 0xE (WRITEMASK_YZW). V2: Add assert to check first component is <= 4 (Suggested by Ken) Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
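The writemask arithmetic described above can be sketched as follows. This is a minimal illustration, not the helper from the commit: the function name `packing_writemask`, the `MASK_*` constants, and the exact form of the assertion are assumptions made for the example. It shifts a run of `n` set bits up by `first_component`.

```c
#include <assert.h>

/* Hypothetical stand-ins for the per-channel writemask bits
 * (X is bit 0, Y is bit 1, Z is bit 2, W is bit 3). */
enum { MASK_X = 0x1, MASK_Y = 0x2, MASK_Z = 0x4, MASK_W = 0x8 };

/* Build a writemask covering n consecutive components starting at
 * first_component. The bounds check is an assumption for this sketch:
 * the packed components must fit inside one vec4. */
static unsigned
packing_writemask(unsigned n, unsigned first_component)
{
   assert(first_component + n <= 4);
   return ((1u << n) - 1) << first_component;
}
```

With n=3 and first_component=1 this yields `((1 << 3) - 1) << 1`, which is 0xE, the Y, Z and W channels, matching the example in the commit message.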
* i965: add helpers for creating component layout swizzleTimothy Arceri2016-07-211-0/+3
| | | | | | | | | This will be used to swizzle components to the beginning or end of the vector based on the component layout qualifier and whether we are doing a load or store. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965: enable ARB_enhanced_layouts for gen8+Timothy Arceri2016-07-211-0/+1
| | | | Acked-by: Edward O'Callaghan <[email protected]>
* i965: add component packing support for load_output intrinsicsTimothy Arceri2016-07-211-5/+33
| | | | | | | | Here we use the component qualifier (which is the first component) as an offset when loading output varyings. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: enable component packing for vs and fsTimothy Arceri2016-07-214-25/+16
| | | | | | | | | | | Rather than trying to work out the total number of components used at a location we simply treat all outputs as vec4s. This removes the need for complex code looping over varyings to match packed locations and the need for storing the total number of components used at each location. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: bring back type_size_vec4_times_4()Timothy Arceri2016-07-212-0/+14
| | | | | | | | We will use this for output varyings. To make component packing simpler we will just treat all varyings as vec4s. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move VS load_input handling to nir_emit_vs_intrinsic().Kenneth Graunke2016-07-201-31/+30
| | | | | | | | | | TCS/TES/GS and now FS all handle these in stage-specific functions. CS don't have inputs, so VS was the only one left using this code. Move it to the VS-specific function for clarity. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Delete the FS_OPCODE_INTERPOLATE_AT_CENTROID virtual opcode.Kenneth Graunke2016-07-204-10/+0
| | | | | | | | | We no longer use this message. As far as I can tell, it's fairly useless - the equivalent information is provided in the payload. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Rewrite FS input handling to use the new NIR intrinsics.Kenneth Graunke2016-07-205-341/+270
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This eliminates the need to walk the list of input variables, recurse into their types (via logic largely redundant with nir_lower_io), and interpolate all possible inputs up front. The backend no longer has to care about variables at all, which eliminates complications from trying to pack multiple variables into the same location. Instead, each intrinsic specifies exactly what's needed. This should unblock Timothy's work on GL_ARB_enhanced_layouts. Each load_interpolated_input intrinsic corresponds to PLN instructions, while load_barycentric_at_* intrinsics correspond to pixel interpolator messages. The pixel/centroid/sample barycentric intrinsics simply refer to payload fields (delta_xy[]), and don't actually generate any code. Because we use a single intrinsic for both centroid-qualified variables and interpolateAtCentroid(), they become indistinguishable. We stop sending pixel interpolator messages for those, and instead use the payload provided data, which should be considerably faster. On Broadwell: total instructions in shared programs: 9067751 -> 9067570 (-0.00%) instructions in affected programs: 145902 -> 145721 (-0.12%) helped: 422 HURT: 209 total spills in shared programs: 2849 -> 2899 (1.76%) spills in affected programs: 760 -> 810 (6.58%) helped: 0 HURT: 10 total fills in shared programs: 3910 -> 3950 (1.02%) fills in affected programs: 617 -> 657 (6.48%) helped: 0 HURT: 10 LOST: 3 GAINED: 3 The differences mostly appear to be slight changes in MOVs. v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics flag rather than passing it directly to nir_lower_io. Use the unreachable() macro rather than assert in one place. (Review feedback from Chris Forbes.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
* i965: Move load_interpolated_input/barycentric_* intrinsics to the top.Kenneth Graunke2016-07-201-0/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, i965 interpolates all FS inputs at the top of the program. This has advantages and disadvantages, but I'd like to keep that policy while reworking this code. We can consider changing it independently. The next patch will make the compiler generate PLN instructions "on the fly", when it encounters an input load intrinsic, rather than doing it for all inputs at the start of the program. To emulate this behavior, we introduce an ugly pass to move all NIR load_interpolated_input and payload-based (not interpolator message) load_barycentric_* intrinsics to the shader's start block. This helps avoid regressions in shader-db for cases such as: if (...) { ...load some input... } else { ...load that same input... } which CSE can't handle, because there's no dominance relationship between the two loads. Because the start block dominates all others, we can CSE all inputs and emit PLNs exactly once, as we did before. Ideally, global value numbering would eliminate these redundant loads, while not forcing them all the way to the start block. When that lands, we should consider dropping this hacky pass. Again, this pass currently does nothing, as i965 doesn't generate these intrinsics yet. But it will shortly, and I figured I'd separate this code as it's relatively self-contained. v2: Dramatically simplify pass - instead of creating new instructions, just remove/re-insert their list nodes (suggested by Jason Ekstrand). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> [v1] Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add a pass to demote sample interpolation intrinsics.Kenneth Graunke2016-07-201-0/+44
| | | | | | | | | | | | | | | | | | | When working with a non-multisampled render target, asking for "sample" interpolation locations doesn't make sense. We demote them to centroid. In a couple of patches, brw_compute_barycentric_modes will begin looking at these intrinsics to determine the barycentric modes. fs_visitor also will use them to code-generate pixel interpolator messages or payload references. Handling the "but what if it's not MSAA?" logic ahead of time in a NIR pass simplifies things and prevents duplicated logic. This patch doesn't actually do anything useful yet as we don't generate these intrinsics. I decided to keep it separate as it's self-contained, in the hopes of shrinking the "convert everything" patch for reviewers. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Stop munging cube array lengths by 6Jason Ekstrand2016-07-205-38/+11
| | | | | | | | | | | | | | | | | | | | | | | | From the Sky Lake PRM: "For SURFTYPE_CUBE: For Sampling Engine Surfaces and Typed Data Port Surfaces, the range of this field is [0,340], indicating the number of cube array elements (equal to the number of underlying 2D array elements divided by 6). For other surfaces, this field must be zero." In other words, the depth field for cube maps is in number of cubes not number of 2-D slices so we need to divide by 6. ISL will do this correctly for us assuming that we provide it with the correct array bounds which it expects to be in 2-D slices. It appears as if we've been doing this wrong ever since we first added cube map arrays for Sandy Bridge and the change to ISL made things slightly worse. While we're at it, we now need to remove the shader hacks we've always done since they were only needed because we were setting the depth field six times too large. v2: Fix the vec4 backend as well (not sure how I missed this). Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
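The unit conversion quoted from the PRM can be sketched as below. The function name `cube_count_from_slices` is hypothetical and this is not the ISL or i965 code; it only illustrates that a depth expressed in 2-D face slices becomes a depth in cubes after dividing by 6.

```c
#include <assert.h>

/* Hypothetical helper: convert an array length given in 2-D slices
 * (six faces per cube) into the number of whole cubes. */
static unsigned
cube_count_from_slices(unsigned num_2d_slices)
{
   /* A cube (array) surface always holds a whole number of cubes. */
   assert(num_2d_slices % 6 == 0);
   return num_2d_slices / 6;
}
```

A single cube map has 6 slices and therefore a depth of 1 in this unit; a cube array with 12 slices has a depth of 2.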
* i965/miptree: Set logical_depth0 == 6 for cube mapsJason Ekstrand2016-07-201-4/+11
| | | | | | | | | | | | | This matches what we do for cube maps where logical_depth0 is in number of face-layers rather than number of cubes. This does mean that we will temporarily be setting the surface bounds too loose for cube map textures but we are already setting them too loose for cube arrays and we will be fixing that in the next commit anyway. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: "12.0 11.2 11.1" <[email protected]>
* i965/miptree: Enforce that height == 1 for 1-D array texturesJason Ekstrand2016-07-202-19/+5
| | | | | | | | | | | | | | | | | | | The GL API and mesa internals do this differently than we do. In GL, there is no depth parameter for 1-D arrays and height is used. In the i965 miptree code we do the sane thing and make height == 1 and use depth for number of slices. This makes for a mismatch every time we create a 1-D array texture from GL. Instead of actually solving this problem, we just said "1-D is hard, let's make sure it works no matter which way we pass the parameters" and called it a day. This commit fixes the one GL -> i965 transition point where we weren't already handling 1-D array textures to do the right thing and then replaces the magic fixup code with an assert that you're doing the right thing. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: "12.0 11.2 11.1" <[email protected]>
* i965: store reference to the context within struct brw_fence (v2)Emil Velikov2016-07-201-11/+44
| | | | | | | | | | | | The spec allows {server,client}_wait_sync to be called without a currently bound context, while our implementation requires a context pointer, so store a reference to the context in the fence. v2: Add a mutex and acquire it for the duration of brw_fence_client_wait() and brw_fence_is_completed() as suggested by Chad. Cc: "11.2 12.0" <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Signed-off-by: Tomasz Figa <[email protected]>
* mesa: scons: remove left over src/glsl includeEmil Velikov2016-07-201-1/+0
| | | | | | The path no longer exists. Signed-off-by: Emil Velikov <[email protected]>
* mesa: scons: list builddir before srcdirEmil Velikov2016-07-201-4/+4
| | | | | | | | | | | Analogous to previous commit. Note: scons always uses OOT builds, while the in-tree generated files could be created either manually or by the autoconf build. Cc: "11.2 12.0" <[email protected]> Cc: Alexander von Gluck IV <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* mesa: automake: list builddir before srcdirEmil Velikov2016-07-201-3/+3
| | | | | | | | | | In the case of building in out-of-tree fashion, while having generated in-tree sources, the latter [likely stale] files will be used. Flip the order to prevent that. Cc: "11.2 12.0" <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* gallium: add a cap for VIEWPORT_SUBPIXEL_BITS (v2)Józef Kucia2016-07-201-0/+3
| | | | | | | | | | | | This allows Gallium drivers to advertise the subpixel precision for floating point viewports bounds. v2: - Set ViewportSubpixelBits in st_init_limits. Signed-off-by: Józef Kucia <[email protected]> Signed-off-by: Marek Olšák <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* i965: Use tex_mocs instead of rb_mocs for GL images.Kenneth Graunke2016-07-191-1/+1
| | | | | | | | | | | | | | Fixes a 10-20% performance regression in OglCSDof caused by commit 5a8c89038abab0184ea72664ab390ec6ca58b4d6, which made images (in the image load/store sense) use BDW_MOCS_PTE instead of BDW_MOCS_WB. This seems sketchy, as the default PTE value is supposed to be WB LLC eLLC, which is the same as our MOCS WB setting. It's only supposed to change when using a surface for display, which won't ever happen for images. Something may be wrong in the kernel... Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* st/mesa: Enable MESA_shader_integer_functions on all GLSL 1.30 platformsIan Romanick2016-07-192-1/+16
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Enable MESA_shader_integer_functions on all GLSL 1.30 platformsIan Romanick2016-07-192-6/+15
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Don't lower uaddCarry and usubBorrow in both GLSL IR and NIRIan Romanick2016-07-191-3/+1
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Update assertion to account for Gen < 7Ian Romanick2016-07-191-1/+4
| | | | | | | | | | | | Previously SHADER_OPCODE_MULH could only exist on Gen7+, so the assertion assumed the Gen7+ accumulator rules. A future patch will allow this instruction on at least Gen6, so update the assertion. v2: Use get_lowered_simd_width instead of open coding it. Suggested by Curro. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> [v1]
* i965: Use LZD to implement nir_op_find_lsb on Gen < 7Ian Romanick2016-07-192-3/+45
| | | | | | | v2: Rebase on changes to previous two patches. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
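One common way to get a find-lowest-set-bit result out of a leading-zero count is sketched below. This is an assumption about the general technique, not a transcription of the commit: `soft_lzd` is a software stand-in for the hardware LZD instruction (assumed to return 32 for an input of 0), and `find_lsb_via_lzd` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for LZD: count leading zero bits, 32 for x == 0. */
static uint32_t
soft_lzd(uint32_t x)
{
   uint32_t count = 0;
   if (x == 0)
      return 32;
   while ((x & 0x80000000u) == 0) {
      x <<= 1;
      count++;
   }
   return count;
}

/* Isolate the lowest set bit with x & -x, then convert its
 * leading-zero count into a bit index counted from the LSB. */
static int32_t
find_lsb_via_lzd(uint32_t x)
{
   uint32_t lowest = x & (0u - x);
   /* For x == 0, lowest is 0, LZD gives 32, and 31 - 32 = -1. */
   return 31 - (int32_t)soft_lzd(lowest);
}
```

For example, an input of 12 (binary 1100) isolates to 4, whose leading-zero count is 29, giving index 2.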
* i965: Use LZD to implement nir_op_ifind_msb on Gen < 7Ian Romanick2016-07-192-21/+90
| | | | | | | | | v2: Retype LZD source as UD to avoid potential problems with 0x80000000. Suggested by Matt. Also update comment about problem values with LZD(abs(x)). Suggested by Curro. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
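The signed case can be sketched in the same style. This is a hedged illustration of one known approach (invert negative inputs, then count leading zeros on the unsigned bit pattern), not the code from the commit; `lzd_u32` and `ifind_msb_via_lzd` are hypothetical names, and LZD is assumed to return 32 for 0. Treating the value as unsigned before counting mirrors the v2 note about retyping the LZD source as UD.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for LZD on an unsigned dword: 32 for x == 0. */
static uint32_t
lzd_u32(uint32_t x)
{
   uint32_t count = 0;
   if (x == 0)
      return 32;
   while ((x & 0x80000000u) == 0) {
      x <<= 1;
      count++;
   }
   return count;
}

/* GLSL-style findMSB for signed input: for negative values the result
 * is the position of the most significant 0 bit, so invert first. */
static int32_t
ifind_msb_via_lzd(int32_t x)
{
   uint32_t bits = (uint32_t)x;
   if (x < 0)
      bits = ~bits;
   /* Both 0 and -1 end up as 0 here, giving 31 - 32 = -1. */
   return 31 - (int32_t)lzd_u32(bits);
}
```

Inverting rather than taking the absolute value sidesteps the inputs that a LZD(abs(x)) formulation gets wrong, such as 0x80000000.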
* i965: Use LZD to implement nir_op_ufind_msbIan Romanick2016-07-194-1/+54
| | | | | | | | | | This uses one less instruction. v2: Move emit_find_msb_using_lzd out of the visitor classes. Suggested by Curro. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
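The unsigned case reduces to a single subtraction after the leading-zero count. The sketch below is illustrative only: `lzd_ref` is a software stand-in for the LZD instruction (assumed to return 32 for 0), and `ufind_msb_via_lzd` is a hypothetical name rather than the `emit_find_msb_using_lzd` helper mentioned in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for LZD: count leading zero bits, 32 for x == 0. */
static uint32_t
lzd_ref(uint32_t x)
{
   uint32_t count = 0;
   if (x == 0)
      return 32;
   while ((x & 0x80000000u) == 0) {
      x <<= 1;
      count++;
   }
   return count;
}

/* LZD counts from the MSB side while findMSB wants an index from the
 * LSB side, so subtract from 31. An input of 0 yields -1. */
static int32_t
ufind_msb_via_lzd(uint32_t x)
{
   return 31 - (int32_t)lzd_ref(x);
}
```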
* i965: Always enable GL_ARB_shading_language_packingIan Romanick2016-07-191-1/+1
| | | | | | | | | With the existing lowering passes, the functions from this extension become a bunch of bit twiddling operations that have always been supported. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Move enable of EXT_shader_integer_mixIan Romanick2016-07-191-1/+2
| | | | | | | | This extension does not depend on the Gen. It only depends on the availability of GLSL 1.30. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* MESA_shader_integer_functions: Boiler plate extension trackingIan Romanick2016-07-192-0/+2
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Skip update_texture_surface when the plane doesn't existJason Ekstrand2016-07-181-8/+10
| | | | | | | | | Thanks to rebase fail, recent surface state changes (commits 7e951cd56, 8521ce1a7, and 69c0dc5c53) effectively reverted 727a9b24933 and 367cf3a2e3e which was unintentional. This should bring it back. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/gen9: Enable KHR_texture_compression_astc_sliced_3dAnuj Phogat2016-07-181-0/+1
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* mesa: Add the infrastructure for KHR_texture_compression_astc_sliced_3dAnuj Phogat2016-07-183-3/+6
| | | | | | | V2: Drop the changes to gl.xml. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* i965/tes/scalar: fix 64-bit indirect input loadsIago Toral Quiroga2016-07-181-22/+64
| | | | | | | We totally ignored this before because there were no piglit tests for indirect loads in tessellation stages with doubles. Reviewed-by: Timothy Arceri <[email protected]>
* i965/tcs/scalar: only update imm_offset for second message in 64bit input loadsIago Toral Quiroga2016-07-181-7/+1
| | | | | | Our indirect URB read messages take both a direct and an indirect offset so when we emit the second message for a 64-bit input load we can just always increment the immediate offset, even for the indirect case. Reviewed-by: Timothy Arceri <[email protected]>
* i965: Move pulls_bary setting to emit_pixel_interpolator_send().Kenneth Graunke2016-07-171-4/+4
| | | | | | | | | pulls_bary should be set when the shader uses a pixel interpolator message. So, setting it from the function that emits pixel interpolator messages makes a lot of sense. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Write gl_FragCoord directly to the destination.Kenneth Graunke2016-07-173-10/+4
| | | | | | | | | | | This patch makes emit_general_interpolation take a destination register as an argument, and write directly to that. This is simpler than the old approach of ralloc'ing a register, writing to that temporary, and then making the caller emit per-component MOVs to copy it to the actual destination. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Drop has_pln checks in unlit centroid workaround.Kenneth Graunke2016-07-171-5/+2
| | | | | | | | | The unlit centroid workaround starts being necessary on Gen6, which is the first platform with multisampling. PLN exists on G45+, so all platforms which need this workaround have PLN. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Drop VARYING_SLOT_FACE special case in barycentric setup.Kenneth Graunke2016-07-171-3/+2
| | | | | | | | | glsl_to_nir always produces a system value for gl_FrontFacing, rather than an input. So there should never be an input with this slot, making this code dead. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*.Kenneth Graunke2016-07-1712-54/+54
| | | | | | | | | | | | | | | | | Likewise, rename the enum type to glsl_interp_mode. Beyond the GLSL front-end, talking about "interpolation modes" seems more natural than "interpolation qualifiers" - in the IR, we're removed from how exactly the source language specifies how to interpolate an input. Also, SPIR-V calls these "decorations" rather than "qualifiers". Generated by: $ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \ -e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \ -e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \; Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Dave Airlie <[email protected]>
* mesa/st: reduce size of state->st bitmaskRob Clark2016-07-161-1/+1
| | | | | | | | | | In d035d50 this changed to 64b.. which I'm pretty sure was unintentional. Revert it back to 32b so the entire state struct is a nice round 64b. (Not sure that it would actually be measurable, but I did notice that check_state() was hot in some benchmarks.) Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa: handle numSamples=0 in _mesa_test_proxy_teximage()Brian Paul2016-07-151-3/+1
| | | | | | | Should fix the regressions reported in bug 96949. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96949 Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove the emit_linterp() helper.Kenneth Graunke2016-07-152-21/+8
| | | | | | | | | | | | | Rather than computing the barycentric mode each time we emit a LINTERP, we can simply compute it once, as soon as we know we're doing non-flat interpolation. At that point, emit_linterp() doesn't do much, so fold it into the call sites and drop it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Reduce the number of fs_reg(brw_reg) calls in LINTERP handling.Kenneth Graunke2016-07-151-4/+4
| | | | | | | | A bit tidier. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Make a barycentric_mode() helper function.Kenneth Graunke2016-07-151-51/+49
| | | | | | | This combines two copies of basically the same code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Rename brw_wm_barycentric_interp_mode to brw_barycentric_mode.Kenneth Graunke2016-07-156-38/+38
| | | | | | | | | | | | | | | | | | | | | | | brw_wm_barycentric_interp_mode is wordy, brw_barycentric_mode is less typing and suffers from fewer line wrapping problems. The enum values themselves don't really benefit from "WM" in the name, either. Put "BARYCENTRIC" first instead of at the end and drop "WM". Generated by: for file in *.c *.cpp *.h; do sed -i \ -e 's/brw_wm_barycentric_interp_mode/brw_barycentric_mode/g' \ -e 's/BRW_WM_\([A-Z_]*\)_BARYCENTRIC/BRW_BARYCENTRIC_\1/g' \ -e 's/BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT/BRW_BARYCENTRIC_MODE_COUNT/g' \ $file; done with a few whitespace changes. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>