mesa.git

	Commit message	Author	Age	Files	Lines
*	intel/compiler: Add a flag for pull constant support	Jason Ekstrand	2017-10-12	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	The Vulkan driver does not support pull constants. It simply limits things such that we can always push everything. Previously, we were determining whether or not to push things based on whether or not the prog_data::pull_param array is non-null. This is rather hackish and about to stop working. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Store image_param in brw_context instead of prog_data	Jason Ekstrand	2017-10-12	12	-41/+15
\| \| \| \| \| \| \| \| \| \|	This burns an extra 10k of memory or so in the case where you don't have any images. However, if you have several shaders which use images, this should be much less memory. It also gets rid of a part of prog_data that really has nothing to do with the compiler. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Use prog->info.num_images for needs_dc computation	Jason Ekstrand	2017-10-12	1	-2/+3
\| \| \| \| \| \| \| \|	This should be just as good as looking in prog_data but removes our one state setup dependency on brw_stage_prog_data::nr_image_param. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	intel: Rewrite the world of push/pull params	Jason Ekstrand	2017-10-12	10	-82/+155
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves us away from the array-of-pointers model and onto a model where each param is represented by a generic uint32_t handle. We reserve 2^16 of these handles for builtins that get generated somewhere inside the compiler and have well-defined meanings. Generic params have handles whose meanings are defined by the driver. The primary downside to this new approach is that it moves a little bit of the work that we would normally do at compile time to draw time. On my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to have any measurable effect on OglBatch7. So, while this may come back to bite us, it doesn't look too bad. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
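The handle model described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `PARAM_BUILTIN_BASE`, `PARAM_BUILTIN_ZERO`, `param_is_builtin` and `resolve_params` are hypothetical, the choice of the top 2^16 values as the builtin range is an assumption, and generic handles are treated as plain array indices only for the sake of the example (the commit leaves their meaning to the driver).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: the top 2^16 handle values are builtins. */
#define PARAM_BUILTIN_BASE 0xffff0000u
#define PARAM_BUILTIN_ZERO (PARAM_BUILTIN_BASE + 0u) /* always reads as 0 */

static int param_is_builtin(uint32_t handle)
{
   return handle >= PARAM_BUILTIN_BASE;
}

/* Draw-time resolve: turn each handle into the 32-bit value to upload.
 * Generic handles are treated here as indices into driver_values; a
 * real driver is free to give them any meaning it likes. */
static void resolve_params(const uint32_t *handles, unsigned count,
                           const uint32_t *driver_values,
                           unsigned num_driver_values, uint32_t *out)
{
   for (unsigned i = 0; i < count; i++) {
      if (param_is_builtin(handles[i])) {
         /* Only one builtin is modelled in this sketch. */
         assert(handles[i] == PARAM_BUILTIN_ZERO);
         out[i] = 0;
      } else {
         assert(handles[i] < num_driver_values);
         out[i] = driver_values[handles[i]];
      }
   }
}
```

The loop in `resolve_params` is the kind of per-draw work the message refers to when it says some effort moves from compile time to draw time.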
*	i965: Get rid of gen7_cs_state.c	Jason Ekstrand	2017-10-12	6	-177/+145
\| \| \| \| \| \| \| \| \|	The only thing it was handling was push constants. We pull the actual constant upload code into gen6_constant_state.c and the atoms into genX_state_upload.c. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add a helper for populating constant buffers	Jason Ekstrand	2017-10-12	3	-12/+33
\| \| \| \| \|	Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Move brw_upload_pull_constants to gen6_constant_state.c	Jason Ekstrand	2017-10-12	3	-64/+65
\| \| \| \| \|	Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Share the flush for brw_blorp_miptree_download into a pbo	Chris Wilson	2017-10-12	3	-31/+24
\| \| \| \| \| \| \| \| \| \| \| \| \|	As all users of brw_blorp_miptree_download() must emit a full pipeline and cache flush when targeting a user PBO (as that PBO may then be subsequently bound or be bound anywhere and outside of the driver dirty tracking), move that flush into brw_blorp_miptree_download() itself. v2 (Ken): Rebase without userptr stuff so it can land sooner. Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	meta: Delete the PBO texture upload/download path	Jason Ekstrand	2017-10-12	1	-63/+0
\| \| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Use blorp instead of meta for PBO pixel reads	Jason Ekstrand	2017-10-12	1	-9/+51
\| \| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Use blorp instead of meta for PBO texture downloads	Jason Ekstrand	2017-10-12	1	-4/+29
\| \| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/tex: Use blorp texture upload for all CCS_E textures	Jason Ekstrand	2017-10-12	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	This improves the FillTex benchmark in GLBench 2.7 by 30% on my Broxton. On Ken's Broxton which only has single-channel ram, it improves by 210%. v2 (Ken): Check mt->aux_usage == ISL_AUX_USAGE_CCS_E rather than using intel_miptree_is_lossless_compressed(). Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Use blorp instead of meta for PBO texture uploads	Jason Ekstrand	2017-10-12	1	-4/+30
\| \| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add blorp-based texture upload and download paths	Jason Ekstrand	2017-10-12	2	-0/+362
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	v1 (Topi Pohjolainen): original patch.
	v2 (Topi Pohjolainen):
	- Fix return value (s/MESA_FORMAT_NONE/false/) (Anuj)
	- Move _mesa_tex_format_from_format_and_type() just in the end avoiding additional if-block (Anuj)
	- Explain better the array alignment restriction (Anuj)
	- Do not bail out in case of gl_pixelstore_attrib::ImageHeight, it is handled by _mesa_image_offset() automatically (Ken).
	- Support 1D_ARRAY by flipping depth, width and y, z (Ken).
	v3 (Topi Pohjolainen):
	- Contrary to v2, do not try to handle gl_pixelstore_attrib::ImageHeight. Currently there are no tests in piglit or cts for it. One could possibly copy or modify tests/texturing/texsubimage.c. There, however, seems to be number of corner cases to consider. Moreover, current meta path applies the packing height for both source and targets when determining the offset. This would probably require re-visiting also.
	v4 (Topi Pohjolainen): Rebased on top of merged drm-bacon
	v5 (Jason Ekstrand):
	- Move to brw_blorp.c
	- Significant refactoring
	- Fixed 1-D array textures
	- Simplified handling of PBOs vs. CPU data.
	- Handle gl_pixelstore_attrib::ImageHeight. It turns out there are piglit tests that cover this. The original version was failing them because of an error in the way it handled 1-D array textures.
	- Add support for texture download
	v6 (Kenneth Graunke): Rebase fixes:
	- Use intel_miptree_check_level_layer instead of deleted fields
	- Update for mesa_format_supports_render[] rename.
	- Pass 'false' (read-only) to intel_bufferobj_buffer
	v7 (Kenneth Graunke):
	- Fix brw_blorp_download_miptree to pass 'false' (not read only) for the destination buffer (caught by Chris Wilson).
	- Fix blorp_get_client_bo to pass intel_bufferobj_buffer !read_only for the 'writable' parameter instead of 'false' (caught by Jason).
	- Support GL_BGR, GL_BGRA, GL_BGRA_INTEGER, GL_BGR_INTEGER, allowing us to use this for ReadPixels on the window system buffer (caught by Chris Wilson).
	- Fix y-flipping bugs in download path (exposed by BGRA support).
	- Fix false vs. NULL return value in blorp_get_client_bo.
	Signed-off-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Refactor y-flipping coordinate transform.	Kenneth Graunke	2017-10-12	1	-7/+11
\| \| \| \| \| \|	I want to reuse it for the BLORP download path. Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965/tex: Check if there is data to upload up-front	Jason Ekstrand	2017-10-12	1	-0/+4
\| \| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/barrier: Do the correct flushes for framebuffer access	Jason Ekstrand	2017-10-12	1	-1/+1
\| \| \| \| \| \| \| \| \|	Framebuffer access includes framebuffer reads so we need to invalidate the texture cache. We do not, however, need to flush the depth cache because you cannot bind a depth texture as an image. Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/barrier: Do the correct flushes for texture updates	Jason Ekstrand	2017-10-12	1	-2/+4
\| \| \| \| \| \| \| \| \|	Texture uploads and downloads may go through the render pipe which may result in texturing from or rendering to the texture or the PBO. We need to flush accordingly. Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Ignore GL_SKIP_DECODE_EXT for textures accessed via texelFetch().	Kenneth Graunke	2017-10-12	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The GL_EXT_texture_sRGB_decode spec says: "The conversion of sRGB color space components to linear color space is always performed if the texel lookup function is one of the texelFetch builtin functions. Otherwise, if the texel lookup function is one of the texture builtin functions or one of the texture gather functions, the conversion of sRGB color space components to linear color space is controlled by the TEXTURE_SRGB_DECODE_EXT parameter. If the TEXTURE_SRGB_DECODE_EXT parameter is DECODE_EXT, the conversion of sRGB color space components to linear color space is performed. If the TEXTURE_SRGB_DECODE_EXT parameter is SKIP_DECODE_EXT, the value is returned without decoding. However, if the texture is also accessed with a texelFetch function, then the result of texture builtin functions and/or texture gather functions may be returned with decoding or without decoding." This patch makes i965 force sRGB decoding for any textures accessed via texelFetch(). If textures are accessed via texelFetch() and a regular texture access function, this will affect the other ones too - which is fine - it's undefined according to the last paragraph quoted. We could make both work, but we'd have to emit multiple SURFACE_STATEs, and have two binding table sections, like we do for texture gather hacks on older platforms. Fixes the following Android O CTS test: dEQP-GLES31.functional.srgb_texture_decode.skip_decode.srgba8.texel_fetch Reviewed-by: Jason Ekstrand <[email protected]>
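The rule this patch applies can be written as a small decision function. This is only a sketch of the logic described in the message; `srgb_decode_enabled` and its two parameters are hypothetical names, not identifiers from the driver.

```c
#include <stdbool.h>

/* Decide whether sRGB-to-linear decoding is applied to a texture.
 *
 * skip_decode_requested: TEXTURE_SRGB_DECODE_EXT is SKIP_DECODE_EXT.
 * used_with_texel_fetch: the shader accesses the texture via texelFetch().
 */
static bool srgb_decode_enabled(bool skip_decode_requested,
                                bool used_with_texel_fetch)
{
   /* texelFetch() must always decode, so it overrides the parameter.
    * Other lookups of the same texture get decoded too, which the
    * quoted spec text permits. */
   if (used_with_texel_fetch)
      return true;

   return !skip_decode_requested;
}
```

Keeping a single answer per texture is what lets the driver avoid emitting a second SURFACE_STATE for the same texture.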
*	i965: Drop brw_bo_alloc in ARB_indirect_parameters implementation.	Kenneth Graunke	2017-10-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The original implementation allocated a new BO here, but we decided to switch to intel_upload_space, which returns a reference to the current upload BO. We accidentally kept the brw_bo_alloc, even though it's no longer necessary - intel_upload_space will immediately unreference it, causing us to allocate and immediately free a buffer. Reviewed-by: Plamena Manolova <[email protected]>
*	i965: Allow mapped VBOs during drawing in non-debug contexts.	Kenneth Graunke	2017-10-11	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Section 6.3.2 of the GL 4.5 spec says: "Any GL command which attempts to read from, write to, or change the state of a buffer object may generate an INVALID_OPERATION error if all or part of the buffer object is mapped ... However, only commands which explicitly describe this error are required to do so. If an error is not generated, such commands will have undefined results and may result in GL interruption or termination." Setting this flag allows us to skip walking over the buffer bindings for every enabled vertex attribute (_mesa_all_buffers_are_unmapped). Improves performance in GFXBench4's gl_driver2_off microbenchmark by 3.05797% +/- 0.709031% (n=33) on Apollolake. This breaks KHR-*.draw_elements_base_vertex_tests.invalid_mapped_bos, but that test is invalid and has been removed from the upstream CTS. Reviewed-by: Eric Anholt <[email protected]>
*	i965: Make brw_update_texture_surface static.	Kenneth Graunke	2017-10-11	2	-5/+1
\| \| \| \|	Trivial. It's not used in other files.
*	mesa: rename various buffer bindings to one struct.	Dave Airlie	2017-10-11	2	-4/+4
\| \| \| \| \| \| \| \|	One binding to bind them all, these are all the same thing. Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
*	i965: Disable auxiliary buffers when there are self-dependencies.	Kenneth Graunke	2017-10-10	3	-25/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jason and I investigated several OpenGL CTS failures where the tests bind the same texture for rendering and texturing, at the same time. This has defined results as long as the reads happen before writes, or the regions are non-overlapping. Normally, this just works out. However, CCS can cause problems. If the shader is reading one set of pixels, and writing to different pixels that are adjacent, they may end up being covered by the same CCS block. So rendering may be writing a CCS block, while the sampler is trying to read it. Corruption ensues. Disabling CCS is unfortunate, but safe. Fixes several KHR-GL45.texture_barrier.* subtests. Reviewed-by: Nanley Chery <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
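A rough sketch of the self-dependency check, under the assumption that it is enough to compare the sampled texture against the currently bound render targets. The enum, the function name `texture_aux_usage` and the pointer-identity comparison are all illustrative choices, not the driver's actual data structures.

```c
#include <stddef.h>

enum aux_usage { AUX_USAGE_NONE, AUX_USAGE_CCS };

/* Pick the auxiliary-buffer usage for a texture that is about to be
 * sampled. If the same texture is also bound for rendering, fall back
 * to no CCS so the sampler never reads a CCS block that rendering may
 * be writing at the same time. */
static enum aux_usage
texture_aux_usage(const void *texture, enum aux_usage preferred,
                  const void *const *render_targets,
                  size_t num_render_targets)
{
   for (size_t i = 0; i < num_render_targets; i++) {
      if (render_targets[i] == texture)
         return AUX_USAGE_NONE; /* self-dependency: play it safe */
   }
   return preferred;
}
```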
*	i965: minor whitespace fix	Kenneth Graunke	2017-10-10	1	-1/+1
\|
*	mesa: Only expose GLES's EXT_texture_type_2_10_10_10_REV if supported in HW.	Eric Anholt	2017-10-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Previously, we were downconverting to 8888 automatically if the hardware didn't support it. However, with the advent of GL_OES_required_internalformat, we have to actually store the internalformats we advertise support for. And, it seems rather disingenuous to advertise the extension if we don't actually support it. v2: Throw an error when using the format on ES2 without the extension present. Reviewed-by: Nicolai Hähnle <[email protected]>
*	i965: silence coverity warning	Lionel Landwerlin	2017-10-10	1	-1/+1
\| \| \| \| \| \| \| \| \|	Also makes this statement a bit clearer. CID: 1418920 Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Antia Puentes <[email protected]>
*	meson: build classic swrast	Dylan Baker	2017-10-09	2	-0/+33
\| \| \| \| \| \| \| \|	This adds support for building the classic swrast implementation. This driver has been tested with glxinfo and glxgears. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	meson: Add support for configuring dri drivers directory.	Dylan Baker	2017-10-09	1	-1/+1
\| \| \| \| \| \| \| \|	v2: - drop with_ from dri_drivers_path variable (Eric A) v3: - Move HAVE_X11_PLATFORM to the proper patch (Eric A) Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	meson: Build i965 and dri stack	Dylan Baker	2017-10-09	3	-0/+272
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This gets pretty much the entire classic tree building, as well as i965, including the various glapis. There are some workarounds for bugs that are fixed in meson 0.43.0, which is due out on October 8th. I have tested this with piglit using glx. v2: - fix typo "vaule" -> "value" - use gtest dep instead of linking to libgtest (rebase error) - use gtest dep instead of linking against libgtest (rebase error) - copy the megadriver, then create hard links from that, then delete the megadriver. This matches the behavior of the autotools build. (Eric A) - Use host_machine instead of target_machine (Eric A) - Put a comment in the right place (Eric A) - Don't have two variables for the same information (Eric A) - Put pre_args at top of file in this patch (Eric A) - Fix glx generators in this patch instead of next (Eric A) - Remove -DMESON hack (Eric A) - add sha1_h to mesa in this patch (Eric A) - Put generators in loops when possible to reduce code in mapi/glapi/gen (Eric A) v3: - put HAVE_X11_PLATFORM in this patch Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: pass wanted format to intel_miptree_create_for_dri_image	Tapani Pälli	2017-10-06	5	-40/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change b3a44ae7a4 caused regressions on Android where DRI and renderbuffer can disagree on the format being used. This patch removes the colorspace parameter and instead we pass renderbuffer format. For non-winsys images we still do srgb/linear modification in same manner as change b3a44ae7a4 wanted but take format from renderbuffer instead of DRI image. This patch fixes regressions seen with following test sets: dEQP-EGL.functional.color_clears* dEQP-EGL.functional.render* Signed-off-by: Tapani Pälli <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102999 Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Add Atom graphics names to parse_devid_override()	Matt Turner	2017-10-04	1	-0/+3
\|
*	mesa: Remove force_s3tc_enable driconf variable	Matt Turner	2017-10-02	3	-5/+0
\| \| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
*	mesa: Drop Mesa_DXTn from gl_context	Matt Turner	2017-10-02	6	-31/+10
\| \| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
*	i965: Implement ARB_indirect_parameters.	Plamena Manolova	2017-10-02	4	-1/+124
\| \| \| \| \| \| \| \| \| \| \|	We can implement ARB_indirect_parameters for i965 by taking advantage of the conditional rendering mechanism. This works by issuing maxdrawcount draw calls and using conditional rendering to predicate each of them with "drawcount > gl_DrawID" Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
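The predication scheme can be modelled on the CPU as below. On real hardware the draw count lives in a buffer object and the comparison is evaluated by the GPU; here it is an ordinary argument so the loop can be run and checked. `draw_predicate` and `emit_indirect_draws` are hypothetical names for this sketch.

```c
#include <stdbool.h>

/* Stand-in for the GPU-side predicate "drawcount > gl_DrawID". */
static bool draw_predicate(unsigned drawcount, unsigned draw_id)
{
   return drawcount > draw_id;
}

/* Issue maxdrawcount draws, each predicated on the draw count.
 * executed[i] records whether draw i would actually run; the return
 * value is the number of draws that passed the predicate. */
static unsigned emit_indirect_draws(unsigned maxdrawcount,
                                    unsigned drawcount, bool *executed)
{
   unsigned ran = 0;

   for (unsigned draw_id = 0; draw_id < maxdrawcount; draw_id++) {
      executed[draw_id] = draw_predicate(drawcount, draw_id);
      if (executed[draw_id])
         ran++;
   }
   return ran;
}
```

With this shape the number of draws that run is the smaller of maxdrawcount and the draw count, without the CPU ever having to read the draw count back.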
*	i965: Refactor brw_try_draw_prims.	Plamena Manolova	2017-10-02	1	-117/+119
\| \| \| \| \| \| \| \| \| \| \|	In order to add our ARB_indirect_parameters implementation we need to refactor brw_try_draw_prims so that it operates on a per primitive basis and move the loop into brw_draw_prims. This commit refactors the brw_try_draw_prims function and renames it to brw_draw_single_prim. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Introduce brw_finish_drawing.	Plamena Manolova	2017-10-02	1	-7/+14
\| \| \| \| \| \| \| \| \| \| \|	In order to add our ARB_indirect_parameters implementation we need to refactor brw_try_draw_prims so that it operates on a per primitive basis and move the loop into brw_draw_prims. This commit introduces the brw_finish_drawing function where we move the code that executes once after the loop. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Introduce brw_prepare_drawing.	Plamena Manolova	2017-10-02	1	-19/+27
\| \| \| \| \| \| \| \| \| \| \|	In order to add our ARB_indirect_parameters implementation we need to refactor brw_try_draw_prims so that it operates on a per primitive basis and move the loop into brw_draw_prims. This commit introduces the brw_prepare_drawing function where we move the code that executes once before the loop. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: skip reading unused slots at the beginning of the URB for the FS	Iago Toral Quiroga	2017-10-02	1	-10/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can start reading the URB at the first offset that contains varyings that are actually read in the URB. We still need to make sure that we read at least one varying to honor hardware requirements. This helps alleviate a problem introduced with 99df02ca26f61 for separate shader objects: without separate shader objects we assign locations sequentially, however, since that commit we have changed the method for SSO so that the VUE slot assigned depends on the number of builtin slots plus the location assigned to the varying. This fixed layout is intended to help SSO programs by avoiding on-the-fly recompiles when swapping out shaders, however, it also means that if a varying uses a large location number close to the maximum allowed by the SF/FS units (31), then the offset introduced by the number of builtin slots can push the location outside the range and trigger an assertion. This problem is affecting at least the following CTS tests for enhanced layouts: KHR-GL45.enhanced_layouts.varying_array_components KHR-GL45.enhanced_layouts.varying_array_locations KHR-GL45.enhanced_layouts.varying_components KHR-GL45.enhanced_layouts.varying_locations which use SSO and the location layout qualifier to select such location numbers explicitly. This change helps these tests because for SSO we always have to include things such as VARYING_SLOT_CLIP_DIST{0,1} even if the fragment shader is very unlikely to read them, so by doing this we free builtin slots from the fixed VUE layout and keep the tests from crashing in this scenario. 
Of course, this is not a proper fix; we'd still run into problems if someone tries to use an explicit max location and read gl_ViewportIndex, gl_LayerID or gl_CullDistance in the FS, but that would be a much less common bug and we can probably wait to see if anyone actually runs into that situation in a real world scenario before making the decision that more aggressive changes are required to support this without reverting 99df02ca26f61. v2: - Add a debug message when we skip clip distances (Ilia) - we also need to account for this when we compute the urb setup for the fragment shader stage, so add a compiler util to compute the first slot that we need to read from the URB instead of replicating the logic in both places. v3: - Make the util more generic so it can account for all unused slots at the beginning of the URB, that will make it more useful (Ken). - Drop the debug message, it was not what Ilia was asking for. Suggested-by: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
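The "first slot we need to read" utility mentioned in v2 and v3 might look something like the sketch below. It assumes a slot table that maps each VUE slot to a varying number (with -1 for slots holding nothing readable) and a 64-bit mask of the varyings the fragment shader reads. The name `first_urb_slot_required` and this table layout are assumptions for illustration, and any hardware alignment rule on the read offset is deliberately not modelled.

```c
#include <stdint.h>

/* Return the index of the first VUE slot whose varying is read by the
 * fragment shader. If nothing is read, return 0 so that the read still
 * starts at the beginning of the URB entry and covers at least one slot. */
static int first_urb_slot_required(uint64_t inputs_read,
                                   const int *slot_to_varying,
                                   int num_slots)
{
   for (int i = 0; i < num_slots; i++) {
      int varying = slot_to_varying[i];

      if (varying >= 0 && varying < 64 &&
          (inputs_read & (UINT64_C(1) << varying)) != 0)
         return i;
   }
   return 0;
}
```

Skipping the leading unused slots shifts every later varying to a smaller read offset, which is what brings the high explicit locations back into range.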
*	i965/link: Use prog->nir instead of creating a temporary	Jason Ekstrand	2017-09-28	1	-4/+3
\| \| \| \| \| \| \| \| \|	This way, when NIR_PASS_V makes a clone of the shader (for testing nir_clone), the new and lowered version gets re-assigned to prog->nir. [[email protected]: Tested NIR_TEST_CLONE=1 with valgrind] Tested-by: Jordan Justen <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965/link: Make more use of NIR_PASS	Jason Ekstrand	2017-09-28	1	-6/+6
\| \| \| \| \| \|	[[email protected]: Tested NIR_TEST_CLONE=1 with valgrind] Tested-by: Jordan Justen <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965/link: Make better use of temporary variables	Jason Ekstrand	2017-09-28	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \|	The way NIR_PASS works (and, by extension, nir_optimize) is that they may clone the shader and throw the old one away. (We use this for testing nir_clone.) It's better if we just make a temporary variable, use it for everything, and re-assign to the gl_program at the end. [[email protected]: Tested NIR_TEST_CLONE=1 with valgrind] Tested-by: Jordan Justen <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965: enable up to 32 inputs for geometry shaders in gen8+	Iago Toral Quiroga	2017-09-28	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have been exposing only 16 since 1e3e72e3054de with arguments based on register pressure and the number of available GRFs, however, our scalar backend will always limit the number of push registers for GS threads to 24 and fall back to the pull model for anything else, so there is really no reason to lower the number under those arguments. By bumping this up to 32 we make it the same as all the other stages, which is a nice feature to have that can help applications in some cases (I recently fixed a bug in CTS that assumed that the number of input locations in a stage matches the number of output locations in the previous stage for example). Pre-gen8, we use the vector backend and push model, so in that case the arguments in 1e3e72e3054de are still valid. v2: check if we have scalar GS instead of the hw gen to enable this (Ken). Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Convert brw->*_program into a brw->programs[i] array.	Kenneth Graunke	2017-09-26	22	-126/+147
\| \| \| \| \| \|	This makes it easier to loop over programs. Reviewed-by: Alejandro Piñeiro <[email protected]>
*	i965: make use of nir linking	Timothy Arceri	2017-09-26	1	-0/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For now linking is just removing unused varyings between stages.
	shader-db results BDW:
	total instructions in shared programs: 13198288 -> 13191693 (-0.05%)
	instructions in affected programs: 48325 -> 41730 (-13.65%)
	helped: 473
	HURT: 0
	total cycles in shared programs: 541184926 -> 541159260 (-0.00%)
	cycles in affected programs: 213238 -> 187572 (-12.04%)
	helped: 435
	HURT: 8
	V2: - lower indirects on demoted inputs as well as outputs.
	Reviewed-by: Kenneth Graunke <[email protected]>
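The percentages in the shader-db figures are plain relative changes, which a one-line helper reproduces. `percent_change` is a hypothetical name, not part of shader-db.

```c
/* Relative change from "before" to "after", in percent.
 * Negative values mean the count went down. */
static double percent_change(double before, double after)
{
   return (after - before) / before * 100.0;
}
```

For the affected programs, 48325 -> 41730 instructions is a drop of 6595, or about 13.65% of 48325. The same drop of 6595 measured against all 13198288 shared instructions is only about 0.05%.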
*	i965: call brw_shader_gather_info() from the callers of brw_create_nir()	Timothy Arceri	2017-09-26	2	-7/+18
\| \| \| \| \| \| \|	This will allow us to insert a nir linking step in brw_link_shader(). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
*	i965: create a brw_shader_gather_info() helper	Timothy Arceri	2017-09-26	2	-7/+16
\| \| \| \| \| \| \| \|	This will help us call gather info at a later point and allow us to do some linking in nir. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
*	i965: Rename do_flush_locked to submit_batch().	Kenneth Graunke	2017-09-25	1	-3/+4
\| \| \| \| \| \| \|	do_flush_locked isn't a great name - especially given that there's no locking going on in our code relating to execbuf. Reviewed-by: Chris Wilson <[email protected]>
*	i965: Use atomic ops in get_new_program_id().	Kenneth Graunke	2017-09-25	2	-6/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	We have a nice utility function for this, which eliminates the need for locking stuff. This isn't really performance critical, but it's less code to use the atomic. p_atomic_inc_return does pre-increment rather than post-increment, so we change screen->program_id to be initialized to 0 instead of 1. At which point, we can just delete the initialization because intel_screen is rzalloc'd. Reviewed-by: Chris Wilson <[email protected]>
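The pre-increment point can be illustrated with standard C11 atomics. This sketch does not use Mesa's p_atomic_inc_return; it emulates the same "return the new value" behaviour with atomic_fetch_add, and `struct screen` is a stand-in for the real screen structure.

```c
#include <stdatomic.h>

/* Stand-in for the driver's screen object. The counter starts at 0,
 * which is what a zero-initialising allocator would give us for free. */
struct screen {
   atomic_uint program_id;
};

/* Hand out a new ID without taking a lock. atomic_fetch_add returns
 * the old value, so adding 1 gives pre-increment semantics: the first
 * ID handed out is 1 even though the counter starts at 0. */
static unsigned get_new_program_id(struct screen *screen)
{
   return atomic_fetch_add(&screen->program_id, 1) + 1;
}
```

Starting the counter at 0 with pre-increment yields the same ID sequence as starting at 1 with post-increment, which is why the explicit initialization could be dropped.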
*	i965: Convert brw_bufmgr to use C11 mutexes instead of pthreads.	Kenneth Graunke	2017-09-25	1	-18/+17
\| \| \| \| \| \| \|	There's no real advantage or disadvantage here, it's just for stylistic consistency with the rest of the codebase. Reviewed-by: Chris Wilson <[email protected]>