mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	i965: Remove early release of DRI2 miptree	Chris Wilson	2015-09-30	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	intel_update_winsys_renderbuffer_miptree() will release the existing miptree when wrapping a new DRI2 buffer, so we can remove the early release and so prevent a NULL mt dereference should importing the new DRI2 name fail for any reason. (Reusing the old DRI2 name will result in the rendering going astray, to a stale buffer, and not shown on the screen, but it allows us to issue a warning and not crash much later in innocent code.) Signed-off-by: Chris Wilson <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86281 Reviewed-by: Martin Peres <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	i965: Set MaxShaderStorageBuffers for compute shaders	Iago Toral Quiroga	2015-09-25	1	-0/+3
\| \| \| \| \| \| \| \|	v2: - Set it after the driver's MaxShaderStorageBuffers value assignment. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: set ARB_shader_storage_buffer_object related constant values	Samuel Iglesias Gonsalvez	2015-09-25	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \|	v2: - Add tessellation shader constants assignment v3: - Set MaxShaderStorageBufferBindings to 36. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: Use 64-byte offset alignment for shader storage buffers	Iago Toral Quiroga	2015-09-25	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This should be a cacheline (64 bytes) so that we can safely have the CPU and GPU writing the same SSBO on non-cachecoherent systems (our Atom CPUs). With UBOs, the GPU never writes, so there's no problem. For an SSBO, the GPU and the CPU can be updating disjoint regions of the buffer simultaneously and that will break if the regions overlap the same cacheline. v2: - Use cacheline size (64 bytes) instead of 16 bytes (Kristian). - Update commit log and add a comment in the code explaining why we use cacheline size (Ben). Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: Advertise 65536 for GL_MAX_UNIFORM_BLOCK_SIZE.	Kenneth Graunke	2015-09-10	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Our old value of 16384 is the minimum value. DirectX apparently requires 65536 at a minimum; that's also what nVidia and the Intel Windows driver advertise. AMD advertises MAX_INT. Ilia Mirkin noticed that "Shadow Warrior" uses UBOs larger than 16k on Nouveau, which advertises 65536 bytes for this limit. Traces captured on Nouveau don't work on i965 because our lower limit causes the GLSL linker to reject the captured shaders. While this isn't important in and of itself, it does suggest that raising the limit would be beneficial. We can read linear buffers up to 2^27 bytes in size, so raising this should be safe; we could probably even go larger. For now, matching nVidia and Intel/Windows seems like a good plan. We have to reinitialize MaxCombinedUniformComponents as core Mesa will have set it based on a stale value for MaxUniformBlockSize. According to Tapani, there's an unreleased game that asserts on this. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Cc: "11.0" <[email protected]>
*	mesa: Rename MaxCombinedImageUnitsAndFragmentOutputs to ↵	Francisco Jerez	2015-08-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	MaxCombinedShaderOutputResources. The name of both the GLSL built-in variable and the glGetInteger param with the same value changed in GLSL ES 3.1 and GL 4.5. Its semantics also changed slightly, since the limit now also takes into account the number of SSBs in use. Switch our internal data structures to the up-to-date name. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Define implementation constants for ARB_shader_image_load_store.	Francisco Jerez	2015-08-11	1	-0/+12
\| \| \| \| \| \| \| \|	Reviewed-by: Paul Berry <[email protected]> v2: Drop VS support pre-Gen8, drop GS support. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Enable hardware-generated binding tables on render path.	Abdiel Janulgue	2015-07-18	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements the binding table enable command which is also used to allocate a binding table pool where where hardware-generated binding table entries are flushed into. Each binding table offset in the binding table pool is unique per each shader stage that are enabled within a batch. Also insert the required brw_tracked_state objects to enable hw-generated binding tables in normal render path. v2: - Use MOCS in binding table pool alloc for GEN8 - Fix spurious offset when allocating binding table pool entry and start from zero instead. v3: - Include GEN8 fix for spurious offset above. v4: - Fixup wrong packet length in enable/disable hw-binding table for GEN8 (Ville). - Don't invoke HW-binding table disable command when we dont have resource streamer (Chris). v5: - Reorder the state cache invalidate flush so it happens in-between enabling hw-generated binding tables and the previous sw-binding table GPU state (Chris). v6: - Do the same fix in v5 for gen7_disable_hw_binding_tables(). - Adhere to coding guidelines and make comments more informative. Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
*	i965: Enable resource streamer for the batchbuffer	Abdiel Janulgue	2015-07-18	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check first if the hardware and kernel supports resource streamer. If this is allowed, tell the kernel to enable the resource streamer enable bit on MI_BATCHBUFFER_START by specifying I915_EXEC_RESOURCE_STREAMER execbuffer flags. v2: - Use new I915_PARAM_HAS_RESOURCE_STREAMER ioctl to check if kernel supports RS (Ken). - Add brw_device_info::has_resource_streamer and toggle it for Haswell, Broadwell, Cherryview, Skylake, and Broxton (Ken). v3: - Update I915_PARAM_HAS_RESOURCE_STREAMER to match updated kernel. v4: - Always inspect the getparam.value (Chris Wilson). v5: - Fold redundant devinfo->has_resource_streamer check in context create into init screen. Cc: [email protected] Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
*	i965: Move pipecontrol workaround bo to brw_pipe_control	Chris Wilson	2015-07-08	1	-0/+7
\| \| \| \| \| \| \| \| \| \|	With the exception of gen8, the sole user of the workaround bo are for emitting pipe controls. Move it out of the purview of the batchbuffer and into the pipecontrol. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Martin Peres <[email protected]>
*	i965/bxt: Add basic Broxton infrastructure	Ben Widawsky	2015-06-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The thread counts and URB information are all speculative numbers that were based on some CHV numbers at the time. v2: Originally this patch had PCI IDs. I've moved that to a new patch at the end of the series. Remove is_cherryview hack. Add PCI ids. These match the ones defined in the kernel. The only one tested by us is 0x0a84. Capitalize the hex string (Mark) Signed-off-by: Ben Widawsky <[email protected]> Tested-by: "Lecluse, Philippe" <[email protected]> Reviewed-by: Mark Janes <[email protected]>
*	i965: Add compiler options to brw_compiler	Jason Ekstrand	2015-06-23	1	-43/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	This creates the options at screen cration time and then we just copy them into the context at context creation time. We also move is_scalar to the brw_compiler structure. We also end up manually setting some values that the core would have set by default for us. Fortunately, there are only two non-zero shader compiler option defaults that we aren't overriding anyway so this isn't a big deal. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
*	i965: Move INTEL_DEBUG variable parsing to screen creation time	Jason Ekstrand	2015-06-23	1	-1/+3
\| \| \| \| \| \|	v2: Do bufmgr set_debug and set_aub_dump at screen time as well. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: enable ARB_framebuffer_no_attachments for Gen7+	Kevin Rogovin	2015-06-17	1	-0/+6
\| \| \| \| \| \| \|	Enable GL_ARB_framebuffer_no_attachments in i965 for Gen7 and higher. Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Kevin Rogovin <[email protected]>
*	Revert "i965: Advertise a line width of 40.0 on Cherryview and Skylake."	Kenneth Graunke	2015-06-11	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit f3b709c0ac073cd0ec90a3a0d91d1ee94668e043. The "dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_4. interpolation.lines_wide" test appears to be broken on Cherryview when we expose line widths greater than 12.0. I'm not sure why. For now, just go back to the limits we used on older platforms. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90902 Acked-by: Matt Turner <[email protected]>
*	i965: do not round line width when multisampling or antialiaing are enabled	Iago Toral Quiroga	2015-06-11	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit fe74fee8fa721a we rounded the line width to the nearest integer to match the GLES3 spec requirements stated in section 13.4.2.1, but that seems to break a dEQP test that renders wide lines in some multisampling scenarios. Ian noted that the Open 4.4 spec has the following similar text: "The actual width of non-antialiased lines is determined by rounding the supplied width to the nearest integer, then clamping it to the implementation-dependent maximum non-antialiased line width." and suggested that when ES removed antialiased lines, they removed "non-antialised" from that paragraph but probably should not have. Going by that note, this patch restricts the quantization implemented in fe74fee8fa721a only to regular aliased lines. This seems to keep the tests fixed with that commit passing while fixing the broken test. v2: - Drop one of the clamps (Ken, Marius) - Add a rule to prevent advertising line widths that when rounded go beyond the limits allowed by the hardware (Ken) - Update comments in the code accordingly (Ian) - Put the code in a utility function (Ian) Fixes: dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90749 Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: "10.6" <[email protected]>
*	i965: Set max texture buffer size to hardware limit	Chris Forbes	2015-06-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Previously we were leaving this at the default of 64K, which meets the spec but is too small for some real uses. The hardware can handle up to 128M. User was complaining about this on freenode ##OpenGL today. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Make NIR non-optional for scalar shaders	Jason Ekstrand	2015-05-28	1	-5/+2
\| \| \| \| \|	Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Use NIR by default for vertex shaders on GEN8+	Jason Ekstrand	2015-05-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell: total instructions in shared programs: 2742062 -> 2681339 (-2.21%) instructions in affected programs: 1514770 -> 1454047 (-4.01%) helped: 5813 HURT: 1120 The gained programs are ARB vertext programs that were previously going through the vec4 backend. Now that we have prog_to_nir, ARB vertex programs can go through the scalar backend so they show up as "gained" in the shader-db results. Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Acked-by: Matt Turner <[email protected]>
*	i965: Use predicate enable bit for conditional rendering w/o stalling	Neil Roberts	2015-05-12	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously whenever a primitive is drawn the driver would call _mesa_check_conditional_render which blocks waiting for the result of the query to determine whether to render. On Gen7+ there is a bit in the 3DPRIMITIVE command which can be used to disable the primitive based on the value of a state bit. This state bit can be set based on whether two registers have different values using the MI_PREDICATE command. We can load these two registers with the pixel count values stored in the query begin and end to implement conditional rendering without stalling. Unfortunately these two source registers were not in the whitelist of available registers in the kernel driver until v3.19. This patch uses the command parser version from intel_screen to detect whether to attempt to set the predicate data registers. The predicate enable bit is currently only used for drawing 3D primitives. For blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For most of these it would probably just work to call the new brw_check_conditional_render function instead of _mesa_check_conditional_render because they already work in terms of rendering primitives. However it's a bit trickier for blits because it can use the BLT ring or the blorp codepath. I think these operations are less useful for conditional rendering than rendering primitives so it might be best to leave it for a later patch. v2: Use the command parser version to detect whether we can write to the predicate data registers instead of trying to execute a register load command. v3: Simple rebase v4: Changes suggested by Kenneth Graunke: Split the load_64bit_register function out to a separate patch so it can be a shared public function. Avoid calling _mesa_check_conditional_render if we've already determined that there's no query object. Some styling fixes. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/gen6: setup limits for ARB_viewport_array	Chris Forbes	2015-05-06	1	-2/+2
\| \| \| \| \|	Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Fix missing type in local variable declaration.	Kenneth Graunke	2015-05-05	1	-1/+1
\| \| \| \| \| \| \| \| \|	Trivial. Fixes the following compiler warning (from GCC 5.1.0): brw_context.c:629:10: warning: type defaults to ‘int’ in declaration of ‘simd_size’ [-Wimplicit-int] Signed-off-by: Kenneth Graunke <[email protected]>
*	i965: Implement DispatchCompute() back-end	Paul Berry	2015-05-02	1	-0/+1
\| \| \| \| \| \| \|	brw_emit_gpgpu_walker will be implemented in a subsequent patch. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/cs: Set invocation counts based on max_cs_threads	Jordan Justen	2015-05-02	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	For ES, we set the max counts based on SIMD8, which is currently accurate. For desktop GL, we set the max counts based on SIMD16, which can fail in some cases where a SIMD16 program is not currently supported. Therefore, this value is not currently accurate, but will work fine in many cases, and lets us run more test cases. Eventually we want to always be able to generate a SIMD16 program. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/cs: Add max_cs_threads	Jordan Justen	2015-05-02	1	-0/+1
\| \| \| \| \| \| \| \|	Add values for gen7 & gen8. These are the number threads in a subslice. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Support compute programs in fs_visitor	Jordan Justen	2015-05-02	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	v2: * Clean out some unneeded code copied from run_fs (krh) * Always use NIR * Split shader time out into a separate commit Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	Fix a few typos	Zoë Blade	2015-04-27	1	-2/+2
\| \| \| \|	Reviewed-by: Francisco Jerez <[email protected]>
*	i965/device_info: Add a supports_simd16_3src flag	Jason Ekstrand	2015-04-22	1	-24/+0
\| \| \| \| \| \| \| \| \| \|	This also involves moving revision checking to screen creation time and passing that into brw_get_device_info so that we can get the right device_info for early versions of SKL. Since the only place we used revision was to check for SIMD16 3-src instruction support, it's safe to remove the revision field from brw_context. Reviewed-by: Matt Turner <[email protected]>
*	i965: replace __FUNCTION__ with __func__	Marius Predut	2015-04-14	1	-2/+2
\| \| \| \| \| \| \| \|	Consistently just use C99's __func__ everywhere. No functional changes. Acked-by: Matt Turner <[email protected]> Signed-off-by: Marius Predut <[email protected]>
*	i965: Remove useless null check.	Matt Turner	2015-04-11	1	-4/+0
\| \| \| \|	If it were null, we'd have just derefernced it two lines above.
*	nir: split out lower_sub from lower_negate	Rob Clark	2015-04-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Originally you had to have one or the other. But actually I don't want either. (Or rather I want whatever is the minimum # of instructions.) TODO: not sure where the best place to insert a check that driver hasn't set both lower_negate and lower_sub? Signed-off-by: Rob Clark <[email protected]>
*	i965: Use NIR by default for fragment shaders	Jason Ekstrand	2015-04-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GLSL IR vs. NIR shader-db results on i965: total instructions in shared programs: 2889747 -> 2890782 (0.04%) instructions in affected programs: 2425446 -> 2426481 (0.04%) helped: 3698 HURT: 5341 GLSL IR vs. NIR shader-db results on g4x: total instructions in shared programs: 2547252 -> 2550440 (0.13%) instructions in affected programs: 1984482 -> 1987670 (0.16%) helped: 2844 HURT: 4776 GLSL IR vs. NIR shader-db results on Iron Lake: total instructions in shared programs: 4053381 -> 4063828 (0.26%) instructions in affected programs: 3026601 -> 3037048 (0.35%) helped: 4110 HURT: 8331 GAINED: 1287 LOST: 9 GLSL IR vs. NIR shader-db results on Sandy Bridge: total instructions in shared programs: 5307041 -> 5236666 (-1.33%) instructions in affected programs: 3442908 -> 3372533 (-2.04%) helped: 11829 HURT: 5604 GAINED: 33 LOST: 18 GLSL IR vs. NIR shader-db results on Ivy Bridge: total instructions in shared programs: 4926333 -> 4857017 (-1.41%) instructions in affected programs: 3144042 -> 3074726 (-2.20%) helped: 11559 HURT: 4774 GAINED: 46 LOST: 25 GLSL IR vs. NIR shader-db results on Bay Trail: total instructions in shared programs: 4926333 -> 4857017 (-1.41%) instructions in affected programs: 3144042 -> 3074726 (-2.20%) helped: 11559 HURT: 4774 GAINED: 46 LOST: 25 GLSL IR vs. NIR shader-db results on Haswell: total instructions in shared programs: 4392487 -> 4293476 (-2.25%) instructions in affected programs: 2800180 -> 2701169 (-3.54%) helped: 13073 HURT: 3383 GAINED: 46 LOST: 23 GLSL IR vs. NIR shader-db results on Broadwell (FS only): total instructions in shared programs: 4378113 -> 4283025 (-2.17%) instructions in affected programs: 2743209 -> 2648121 (-3.47%) helped: 12470 HURT: 3609 GAINED: 64 LOST: 27 Signed-off-by: Jason Ekstrand <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	i965: Don't set NirOptions for stages that will use the vec4 backend.cros-mesa-10.6-vanilla chadv/cros-mesa-10.6-vanilla chadv/cros-gerrit-262788-base	Kenneth Graunke	2015-04-10	1	-9/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've started using NirOptions != NULL to mean "we're using NIR for this stage." However, when INTEL_USE_NIR=1, we set it for a bunch of stages that still use the vec4 backend, and thus definitely aren't using NIR. For example, if INTEL_USE_NIR=1 we disable the GLSL IR cubemap normalization pass, even for vertex shaders and geometry shaders. This is wrong, but breaks a very uncommon case. When I started deleting GLSL IR for stages where we claimed to be using NIR, this bug quickly became apparent. For now, only set it for fragment shaders, and vertex shaders if brw->scalar_vs is set. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Check the INTEL_USE_NIR environment variable once at context creation	Jason Ekstrand	2015-04-03	1	-1/+9
\| \| \| \|	Reviewed-by: Jordan Justen <[email protected]>
*	i965: Use the same nir options for all gens	Jason Ekstrand	2015-04-01	1	-10/+2
\| \| \| \| \| \| \|	If we tell NIR to split ffma's, then we don't need seperate options anymore. Reviewed-by: Matt Turner <[email protected]>
*	i965/nir: Run the ffma peephole after the rest of the optimizations	Jason Ekstrand	2015-04-01	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The idea here is that fusing multiply-add combinations too early can reduce our ability to perform CSE and value-numbering. Instead, we split ffma opcodes up-front, hope CSE cleans up, and then fuse after-the-fact. Unless an algebraic pass does something silly where it inserts something between the multiply and the add, splitting and re-fusing should never cause a problem. We run the late algebraic optimizations after this so that things like compare-with-zero don't hurt our ability to fuse things. shader-db results for fragment shaders on Haswell: total instructions in shared programs: 4390538 -> 4379236 (-0.26%) instructions in affected programs: 989359 -> 978057 (-1.14%) helped: 5308 HURT: 97 GAINED: 78 LOST: 5 This does, unfortunately, cause some substantial hurt to a shader in Kerbal Space Program. However, the damage is caused by changing a single instruction from a ffma to an add. This, in turn, decreases register pressure in one part of the program causing it to fail to register allocate and spill. Given the overwhelmingly positive results in other shaders and the fact that the NIR for the Kerbal shaders is actually better, this should be considered a positive. Reviewed-by: Matt Turner <[email protected]>
*	i965/nir: Use NIR lowering for ffma for gen < 6	Jason Ekstrand	2015-03-23	1	-2/+10
\| \| \| \|	Reviewed-by: Matt Turner <[email protected]>
*	i965: define I915_PARAM_REVISION	Dave Airlie	2015-03-23	1	-0/+5
\| \| \| \| \| \| \|	we are broken against the libdrm 2.4.60 minimum specified, so fix it for now. Signed-off-by: Dave Airlie <[email protected]>
*	i965: Store the GPU revision number in brw_context	Neil Roberts	2015-03-20	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \|	brwContextInit now queries the GPU revision number via a new parameter for DRM_I915_GETPARAM. This new parameter requires a kernel patch and a patch to libdrm. If the kernel doesn't support it then it will continue but set the revision number to -1. The intention is to use this to implement workarounds that are only needed on certain steppings of the GPU. Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: Defer the throttle until we submit new commands	Chris Wilson	2015-03-18	1	-34/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, we throttle before the user begins preparing commands for the next frame when we acquire the draw/read buffers. However, construction of the command buffer can itself take significant time relative to the frame time. If we move the throttle from the buffer acquire to the command submit phase we can allow the user to improve concurrency between the CPU and GPU (i.e. reduce the amount of time we waste inside the throttle). v2: Whitespace + delay throttling until after the next submission for greater parallelism Signed-off-by: Chris Wilson <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Kenneth Graunke <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Kristian Høgsberg <[email protected]> Cc: Chad Versace <[email protected]> Cc: Ian Romanick <[email protected]> Reviewed-by: Chad Versace <[email protected]> [v1]
*	i965: Throttle to the previous frame	Chris Wilson	2015-03-18	1	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to facilitate the concurrency offered by triple buffering and to offset the latency induced by swapping via an external process, which may incur extra rendering itself, only throttle to the previous frame and not the last. The second issue that mostly affects swap benchmarks, but also can incur jitter in the throttling, is that the throttle bo is closer to the next SwapBuffers rather than immediately after the previous SwapBuffers. Throttling to the previous frame doubles the maximum possible latency at the benefit of improving throughput and reducing jitter. v2: Rename "first_post_swapbuffer" batches array to a plain throttle_batch[] as the pluralisation was contorting the name and not making it clear as to whether it was the first batch or first_post_swap batch. Not least of which was that not all throttle points are SwapBuffers. Signed-off-by: Chris Wilson <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Kenneth Graunke <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Kristian Høgsberg <[email protected]> Cc: Chad Versace <[email protected]> Cc: Ian Romanick <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	i965: Throttle rendering to an fbo	Chris Wilson	2015-03-18	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When rendering to an fbo, even though it may be acting as a winsys frontbuffer or just generally, we never throttle. However, when rendering to an fbo, there is no natural frame boundary. Conventionally we use SwapBuffers and glFinish, but potential callers avoid often glFinish for being too heavy handed (waiting on all outstanding rendering to complete). The kernel provides a soft-throttling option for this case that waits for rendering older than 20ms to be complete (that's a little too lax to be used for swapbuffers, but is here a useful safety net). The remaining choice is then either never to throttle, throttle after every draw call, or at after intermediate user defined point such as glFlush and thus all the implied flushes. This patch opts for the latter as that is the current method used for flushing to front buffers. v2: Defer the throttling from inside the flush to the next intel_prepare_render() and switch non-fbo frontbuffer throttling over to use the same lax method. The issuing being that glFlush()/intel_prepare_read() is just as likely to be called inside a tight loop and not at "frame" boundaries. v3: Rename from need_front_throttle to need_flush_throttle to avoid any ambiguity between front buffer rendering and fbo rendering. (Chad) v4: Whitespace Signed-off-by: Chris Wilson <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Kenneth Graunke <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Kristian Høgsberg <[email protected]> Cc: Chad Versace <[email protected]> Cc: Ian Romanick <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	nir: Add native_integers to nir_shader_compiler_options.	Kenneth Graunke	2015-03-08	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	glsl_to_nir, tgsi_to_nir, and prog_to_nir all want to know whether the driver supports native integers. Presumably other passes may as well. Adding this to nir_shader_compiler_options is an easy way to provide that information, as it's accessible via nir_shader::options. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Try to make sense of the nir_shader_compiler_options code.	Kenneth Graunke	2015-03-08	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code in glsl_to_nir is entirely dead, as we translate from GLSL to NIR at link time, when there isn't a _mesa_glsl_parse_state to pass, so every caller passes NULL. glsl_to_nir seems like the wrong place to try and create the shader compiler options structure anyway - tgsi_to_nir, prog_to_nir, and other translators all would have to duplicate that code. The driver should set this up once with whatever settings it wants, and pass it in. Eric also added a NirOptions field to ctx->Const.ShaderCompilerOptions[] and left a comment saying: "The memory for the options is expected to be kept in a single static copy by the driver." This suggests the plan was to do exactly that. That pointer was not marked const, however, and the dead code used a mix of static structures and ralloced ones. This patch deletes the dead code in glsl_to_nir, instead making it take the shader compiler options as a mandatory argument. It creates an (empty) options struct in the i965 driver, and makes NirOptions point to that. It marks the pointer const so that we can actually do so without generating "discards const qualifier" compiler warnings. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: free scratch buffers when destroying the context	Iago Toral Quiroga	2015-03-06	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If scratch space is needed for a shader stage we try to reuse the last scratch buffer bound to that stage. If we can't, we free the old scratch buffer and allocate a new one. This means we always keep the last scratch buffer for a particular shader stage around for the entire life span of the context. These buffers are being reported by Valgrind as definitely lost after destroying the OpenGL context. For example, for the geometry shader stage: ==18350== 248 bytes in 1 blocks are definitely lost in loss record 85 of 150 ==18350== at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==18350== by 0xA1B35D6: drm_intel_gem_bo_alloc_internal (intel_bufmgr_gem.c:724) ==18350== by 0xA1B383F: drm_intel_gem_bo_alloc (intel_bufmgr_gem.c:794) ==18350== by 0xA1AEFA3: drm_intel_bo_alloc (intel_bufmgr.c:52) ==18350== by 0x9D08E31: brw_get_scratch_bo (brw_program.c:226) ==18350== by 0x9D2A0F2: do_gs_prog (brw_vec4_gs.c:280) ==18350== by 0x9D2A635: brw_gs_precompile (brw_vec4_gs.c:401) ==18350== by 0x9D14F68: brw_shader_precompile(gl_context, gl_shader_program) (brw_shader.cpp:76) ==18350== by 0x9D157B8: brw_link_shader (brw_shader.cpp:269) ==18350== by 0x9B0941E: _mesa_glsl_link_shader (ir_to_mesa.cpp:3038) ==18350== by 0x99AE4ED: link_program (shaderapi.c:917) ==18350== by 0x99AF365: _mesa_LinkProgram (shaderapi.c:1385) So make sure that by the time we destroy the context we check if we have live scratch buffers for the various stages and release them if that is the case. Reviewed-by: Jordan Justen <[email protected]>
*	i965: Fix non-AA wide line rendering with fractional line widths	Iago Toral Quiroga	2015-02-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"(...)Let w be the width rounded to the nearest integer (...). If the line segment has endpoints given by (x0,y0) and (x1,y1) in window coordinates, the segment with endpoints (x0,y0-(w-1)/2) and (x1,y1-(w-1/2)) is rasterized, (...)" The hardware it not rounding the line width, so we should do it. Also, we should be careful not to go beyond the hardware limits for the line width after it gets rounded. Gen6-7 define a maximum line width slightly below 8.0, so we should advertise a maximum line width lower than 7.5 to make sure that 7.0 is the maximum integer line width that we can select. Since the line width granularity in these platforms is 0.125, we choose 7.375. Other platforms advertise rounded maximum line widths, so those are fine. Fixes the following 3 dEQP tests: dEQP-GLES3.functional.rasterization.primitives.lines_wide dEQP-GLES3.functional.rasterization.fbo.texture_2d.primitives.lines_wide dEQP-GLES3.functional.rasterization.fbo.rbo_singlesample.primitives.lines_wide Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add device limits for tess threads & URB entries	Chris Forbes	2015-02-17	1	-0/+4
\| \| \| \| \| \| \| \|	This should cover all platforms prior to Skylake. Signed-off-by: Chris Forbes <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Ben Widawsky <[email protected]>
*	i965: Sets missing vertex shader constant values for HighInt format	Eduardo Lima Mitev	2015-01-13	1	-0/+6
\| \| \| \| \| \| \| \| \|	The range's min and max, and the precision value are not set correctly for the vertex shader constants. Fixes 1 dEQP test: dEQP-GLES3.functional.state_query.shader.precision_vertex_highp_int Reviewed-by: Ian Romanick <[email protected]>
*	i965/skl: Report more accurate number of samples for format	Kristian Høgsberg	2015-01-07	1	-0/+2
\| \| \| \| \|	Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Generate vs code using scalar backend for BDW+	Kristian Høgsberg	2014-12-10	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With everything in place, we can now use the scalar backend compiler for vertex shaders on BDW+. We make scalar vertex shaders the default on BDW+ but add a new vec4vs debug option to force the vec4 backend. No piglit regressions. Performance impact is minimal, I see a ~1.5 improvement on the T-Rex GLBenchmark case, but in general it's in the noise. Some of our internal synthetic, vs bounded benchmarks show great improvement, 20%-40% in some cases, but real-world cases are mostly unaffected. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>