aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_context.c
Commit message (Collapse)AuthorAgeFilesLines
* util: move brw_env_var_as_boolean() to utilRob Clark2015-11-241-2/+3
| | | | | | | | Kind of a handy function. And I'll want it available outside of i965 for common nir-pass helpers. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* i965: Set MaxCombinedUniformBlocks properly.Kenneth Graunke2015-11-161-0/+1
| | | | | | | | | | Up until now, we've been letting core Mesa initialize it to 36 for us (which is presumably BRW_MAX_UBO (12) * (VS+GS+FS stages -> 3)). With compute and tessellation, we need to increase this. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Clean up context constant initialization code.Kenneth Graunke2015-11-161-80/+54
| | | | | | | | | | | | | | | | | | | | | | | | This was getting pretty out of hand, and with compute partially in place and tessellation on the way, it was only going to get worse. This patch makes a "stage exists?" predicate and a "number of stages" count and uses them to clean up a lot of calculations. We can just loop over shader stages and set things for the ones that exist. For combined counts, we can just multiply by the number of stages. It also tries to organize a little bit. We should probably use _mesa_has_geometry_shaders/tessellation/compute here, but we can't because ctx->Version isn't initialized yet. Perhaps that could be fixed in the future. No change in "glxinfo -l" on Broadwell. v2: Drop stray compute shader hunk. Mark stage_exists as const. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Convert scalar_* flags to a scalar_stage array.Kenneth Graunke2015-11-161-1/+1
| | | | | | | | | I was going to add scalar_tcs and scalar_tes flags, and then thought better of it and decided to convert this to an array. Simpler. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/skl+: Enable support for 16x multisamplingNeil Roberts2015-11-051-0/+6
| | | | Reviewed-by: Ben Widawsky <[email protected]>
* i965: Move the entire compiler API into a single fileJason Ekstrand2015-10-191-1/+1
| | | | | | | | | | | At this point, the compiler API has been substantially simplified. In the spirit of Kristian's making a compiler library, this commit makes a single header file that contains, more-or-less, the entire compiler API. There's still a bit of cleanup to do particularly in the area of geometry shaders. However, this gets us much closer to having a separate compiler. Reviewed-by: Topi Pohjolainen <[email protected]>
* mesa/i965: Refactor brw_is_front_buffer_{drawing,reading} to common codeIan Romanick2015-10-061-7/+8
| | | | | | | | | | There are multiple similar implementations of these functions, and a later patch was going to add another. v2: Move removing intel_framebuffer to a different patch. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: Define BRW_MAX_SSBOIago Toral Quiroga2015-10-051-7/+7
| | | | | | Instead of using hard-coded values. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Define BRW_MAX_UBOIago Toral Quiroga2015-10-051-2/+2
| | | | | | Instead of using hard-coded values. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove early release of DRI2 miptreeChris Wilson2015-09-301-1/+0
| | | | | | | | | | | | | | | intel_update_winsys_renderbuffer_miptree() will release the existing miptree when wrapping a new DRI2 buffer, so we can remove the early release and so prevent a NULL mt dereference should importing the new DRI2 name fail for any reason. (Reusing the old DRI2 name will result in the rendering going astray, to a stale buffer, and not shown on the screen, but it allows us to issue a warning and not crash much later in innocent code.) Signed-off-by: Chris Wilson <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86281 Reviewed-by: Martin Peres <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Set MaxShaderStorageBuffers for compute shadersIago Toral Quiroga2015-09-251-0/+3
| | | | | | | | v2: - Set it after the driver's MaxShaderStorageBuffers value assignment. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: set ARB_shader_storage_buffer_object related constant valuesSamuel Iglesias Gonsalvez2015-09-251-0/+12
| | | | | | | | | | | | v2: - Add tessellation shader constants assignment v3: - Set MaxShaderStorageBufferBindings to 36. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Use 64-byte offset alignment for shader storage buffersIago Toral Quiroga2015-09-251-0/+9
| | | | | | | | | | | | | | | | | This should be a cacheline (64 bytes) so that we can safely have the CPU and GPU writing the same SSBO on non-cachecoherent systems (our Atom CPUs). With UBOs, the GPU never writes, so there's no problem. For an SSBO, the GPU and the CPU can be updating disjoint regions of the buffer simultaneously and that will break if the regions overlap the same cacheline. v2: - Use cacheline size (64 bytes) instead of 16 bytes (Kristian). - Update commit log and add a comment in the code explaining why we use cacheline size (Ben). Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Advertise 65536 for GL_MAX_UNIFORM_BLOCK_SIZE.Kenneth Graunke2015-09-101-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | Our old value of 16384 is the minimum value. DirectX apparently requires 65536 at a minimum; that's also what nVidia and the Intel Windows driver advertise. AMD advertises MAX_INT. Ilia Mirkin noticed that "Shadow Warrior" uses UBOs larger than 16k on Nouveau, which advertises 65536 bytes for this limit. Traces captured on Nouveau don't work on i965 because our lower limit causes the GLSL linker to reject the captured shaders. While this isn't important in and of itself, it does suggest that raising the limit would be beneficial. We can read linear buffers up to 2^27 bytes in size, so raising this should be safe; we could probably even go larger. For now, matching nVidia and Intel/Windows seems like a good plan. We have to reinitialize MaxCombinedUniformComponents as core Mesa will have set it based on a stale value for MaxUniformBlockSize. According to Tapani, there's an unreleased game that asserts on this. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Cc: "11.0" <[email protected]>
* mesa: Rename MaxCombinedImageUnitsAndFragmentOutputs to ↵Francisco Jerez2015-08-201-1/+1
| | | | | | | | | | | | | MaxCombinedShaderOutputResources. The name of both the GLSL built-in variable and the glGetInteger param with the same value changed in GLSL ES 3.1 and GL 4.5. Its semantics also changed slightly, since the limit now also takes into account the number of SSBs in use. Switch our internal data structures to the up-to-date name. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Define implementation constants for ARB_shader_image_load_store.Francisco Jerez2015-08-111-0/+12
| | | | | | | | Reviewed-by: Paul Berry <[email protected]> v2: Drop VS support pre-Gen8, drop GS support. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable hardware-generated binding tables on render path.Abdiel Janulgue2015-07-181-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the binding table enable command which is also used to allocate a binding table pool where where hardware-generated binding table entries are flushed into. Each binding table offset in the binding table pool is unique per each shader stage that are enabled within a batch. Also insert the required brw_tracked_state objects to enable hw-generated binding tables in normal render path. v2: - Use MOCS in binding table pool alloc for GEN8 - Fix spurious offset when allocating binding table pool entry and start from zero instead. v3: - Include GEN8 fix for spurious offset above. v4: - Fixup wrong packet length in enable/disable hw-binding table for GEN8 (Ville). - Don't invoke HW-binding table disable command when we dont have resource streamer (Chris). v5: - Reorder the state cache invalidate flush so it happens in-between enabling hw-generated binding tables and the previous sw-binding table GPU state (Chris). v6: - Do the same fix in v5 for gen7_disable_hw_binding_tables(). - Adhere to coding guidelines and make comments more informative. Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965: Enable resource streamer for the batchbufferAbdiel Janulgue2015-07-181-0/+4
| | | | | | | | | | | | | | | | | | | | | Check first if the hardware and kernel supports resource streamer. If this is allowed, tell the kernel to enable the resource streamer enable bit on MI_BATCHBUFFER_START by specifying I915_EXEC_RESOURCE_STREAMER execbuffer flags. v2: - Use new I915_PARAM_HAS_RESOURCE_STREAMER ioctl to check if kernel supports RS (Ken). - Add brw_device_info::has_resource_streamer and toggle it for Haswell, Broadwell, Cherryview, Skylake, and Broxton (Ken). v3: - Update I915_PARAM_HAS_RESOURCE_STREAMER to match updated kernel. v4: - Always inspect the getparam.value (Chris Wilson). v5: - Fold redundant devinfo->has_resource_streamer check in context create into init screen. Cc: [email protected] Cc: [email protected] Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965: Move pipecontrol workaround bo to brw_pipe_controlChris Wilson2015-07-081-0/+7
| | | | | | | | | | With the exception of gen8, the sole user of the workaround bo are for emitting pipe controls. Move it out of the purview of the batchbuffer and into the pipecontrol. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Martin Peres <[email protected]>
* i965/bxt: Add basic Broxton infrastructureBen Widawsky2015-06-241-0/+1
| | | | | | | | | | | | | | | | | The thread counts and URB information are all speculative numbers that were based on some CHV numbers at the time. v2: Originally this patch had PCI IDs. I've moved that to a new patch at the end of the series. Remove is_cherryview hack. Add PCI ids. These match the ones defined in the kernel. The only one tested by us is 0x0a84. Capitalize the hex string (Mark) Signed-off-by: Ben Widawsky <[email protected]> Tested-by: "Lecluse, Philippe" <[email protected]> Reviewed-by: Mark Janes <[email protected]>
* i965: Add compiler options to brw_compilerJason Ekstrand2015-06-231-43/+3
| | | | | | | | | | | | | This creates the options at screen cration time and then we just copy them into the context at context creation time. We also move is_scalar to the brw_compiler structure. We also end up manually setting some values that the core would have set by default for us. Fortunately, there are only two non-zero shader compiler option defaults that we aren't overriding anyway so this isn't a big deal. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Move INTEL_DEBUG variable parsing to screen creation timeJason Ekstrand2015-06-231-1/+3
| | | | | | v2: Do bufmgr set_debug and set_aub_dump at screen time as well. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: enable ARB_framebuffer_no_attachments for Gen7+Kevin Rogovin2015-06-171-0/+6
| | | | | | | Enable GL_ARB_framebuffer_no_attachments in i965 for Gen7 and higher. Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Kevin Rogovin <[email protected]>
* Revert "i965: Advertise a line width of 40.0 on Cherryview and Skylake."Kenneth Graunke2015-06-111-5/+1
| | | | | | | | | | | | | This reverts commit f3b709c0ac073cd0ec90a3a0d91d1ee94668e043. The "dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_4. interpolation.lines_wide" test appears to be broken on Cherryview when we expose line widths greater than 12.0. I'm not sure why. For now, just go back to the limits we used on older platforms. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90902 Acked-by: Matt Turner <[email protected]>
* i965: do not round line width when multisampling or antialiaing are enabledIago Toral Quiroga2015-06-111-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit fe74fee8fa721a we rounded the line width to the nearest integer to match the GLES3 spec requirements stated in section 13.4.2.1, but that seems to break a dEQP test that renders wide lines in some multisampling scenarios. Ian noted that the Open 4.4 spec has the following similar text: "The actual width of non-antialiased lines is determined by rounding the supplied width to the nearest integer, then clamping it to the implementation-dependent maximum non-antialiased line width." and suggested that when ES removed antialiased lines, they removed "non-antialised" from that paragraph but probably should not have. Going by that note, this patch restricts the quantization implemented in fe74fee8fa721a only to regular aliased lines. This seems to keep the tests fixed with that commit passing while fixing the broken test. v2: - Drop one of the clamps (Ken, Marius) - Add a rule to prevent advertising line widths that when rounded go beyond the limits allowed by the hardware (Ken) - Update comments in the code accordingly (Ian) - Put the code in a utility function (Ian) Fixes: dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90749 Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: "10.6" <[email protected]>
* i965: Set max texture buffer size to hardware limitChris Forbes2015-06-061-0/+1
| | | | | | | | | | | Previously we were leaving this at the default of 64K, which meets the spec but is too small for some real uses. The hardware can handle up to 128M. User was complaining about this on freenode ##OpenGL today. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make NIR non-optional for scalar shadersJason Ekstrand2015-05-281-5/+2
| | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use NIR by default for vertex shaders on GEN8+Jason Ekstrand2015-05-181-1/+1
| | | | | | | | | | | | | | | | | | GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell: total instructions in shared programs: 2742062 -> 2681339 (-2.21%) instructions in affected programs: 1514770 -> 1454047 (-4.01%) helped: 5813 HURT: 1120 The gained programs are ARB vertext programs that were previously going through the vec4 backend. Now that we have prog_to_nir, ARB vertex programs can go through the scalar backend so they show up as "gained" in the shader-db results. Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Acked-by: Matt Turner <[email protected]>
* i965: Use predicate enable bit for conditional rendering w/o stallingNeil Roberts2015-05-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously whenever a primitive is drawn the driver would call _mesa_check_conditional_render which blocks waiting for the result of the query to determine whether to render. On Gen7+ there is a bit in the 3DPRIMITIVE command which can be used to disable the primitive based on the value of a state bit. This state bit can be set based on whether two registers have different values using the MI_PREDICATE command. We can load these two registers with the pixel count values stored in the query begin and end to implement conditional rendering without stalling. Unfortunately these two source registers were not in the whitelist of available registers in the kernel driver until v3.19. This patch uses the command parser version from intel_screen to detect whether to attempt to set the predicate data registers. The predicate enable bit is currently only used for drawing 3D primitives. For blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For most of these it would probably just work to call the new brw_check_conditional_render function instead of _mesa_check_conditional_render because they already work in terms of rendering primitives. However it's a bit trickier for blits because it can use the BLT ring or the blorp codepath. I think these operations are less useful for conditional rendering than rendering primitives so it might be best to leave it for a later patch. v2: Use the command parser version to detect whether we can write to the predicate data registers instead of trying to execute a register load command. v3: Simple rebase v4: Changes suggested by Kenneth Graunke: Split the load_64bit_register function out to a separate patch so it can be a shared public function. Avoid calling _mesa_check_conditional_render if we've already determined that there's no query object. Some styling fixes. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen6: setup limits for ARB_viewport_arrayChris Forbes2015-05-061-2/+2
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix missing type in local variable declaration.Kenneth Graunke2015-05-051-1/+1
| | | | | | | | | Trivial. Fixes the following compiler warning (from GCC 5.1.0): brw_context.c:629:10: warning: type defaults to ‘int’ in declaration of ‘simd_size’ [-Wimplicit-int] Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Implement DispatchCompute() back-endPaul Berry2015-05-021-0/+1
| | | | | | | brw_emit_gpgpu_walker will be implemented in a subsequent patch. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Set invocation counts based on max_cs_threadsJordan Justen2015-05-021-0/+24
| | | | | | | | | | | | | | For ES, we set the max counts based on SIMD8, which is currently accurate. For desktop GL, we set the max counts based on SIMD16, which can fail in some cases where a SIMD16 program is not currently supported. Therefore, this value is not currently accurate, but will work fine in many cases, and lets us run more test cases. Eventually we want to always be able to generate a SIMD16 program. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cs: Add max_cs_threadsJordan Justen2015-05-021-0/+1
| | | | | | | | Add values for gen7 & gen8. These are the number threads in a subslice. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Support compute programs in fs_visitorJordan Justen2015-05-021-0/+2
| | | | | | | | | | v2: * Clean out some unneeded code copied from run_fs (krh) * Always use NIR * Split shader time out into a separate commit Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* Fix a few typosZoë Blade2015-04-271-2/+2
| | | | Reviewed-by: Francisco Jerez <[email protected]>
* i965/device_info: Add a supports_simd16_3src flagJason Ekstrand2015-04-221-24/+0
| | | | | | | | | | This also involves moving revision checking to screen creation time and passing that into brw_get_device_info so that we can get the right device_info for early versions of SKL. Since the only place we used revision was to check for SIMD16 3-src instruction support, it's safe to remove the revision field from brw_context. Reviewed-by: Matt Turner <[email protected]>
* i965: replace __FUNCTION__ with __func__Marius Predut2015-04-141-2/+2
| | | | | | | | Consistently just use C99's __func__ everywhere. No functional changes. Acked-by: Matt Turner <[email protected]> Signed-off-by: Marius Predut <[email protected]>
* i965: Remove useless null check.Matt Turner2015-04-111-4/+0
| | | | If it were null, we'd have just derefernced it two lines above.
* nir: split out lower_sub from lower_negateRob Clark2015-04-111-0/+1
| | | | | | | | | | Originally you had to have one or the other. But actually I don't want either. (Or rather I want whatever is the minimum # of instructions.) TODO: not sure where the best place to insert a check that driver hasn't set *both* lower_negate and lower_sub? Signed-off-by: Rob Clark <[email protected]>
* i965: Use NIR by default for fragment shadersJason Ekstrand2015-04-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GLSL IR vs. NIR shader-db results on i965: total instructions in shared programs: 2889747 -> 2890782 (0.04%) instructions in affected programs: 2425446 -> 2426481 (0.04%) helped: 3698 HURT: 5341 GLSL IR vs. NIR shader-db results on g4x: total instructions in shared programs: 2547252 -> 2550440 (0.13%) instructions in affected programs: 1984482 -> 1987670 (0.16%) helped: 2844 HURT: 4776 GLSL IR vs. NIR shader-db results on Iron Lake: total instructions in shared programs: 4053381 -> 4063828 (0.26%) instructions in affected programs: 3026601 -> 3037048 (0.35%) helped: 4110 HURT: 8331 GAINED: 1287 LOST: 9 GLSL IR vs. NIR shader-db results on Sandy Bridge: total instructions in shared programs: 5307041 -> 5236666 (-1.33%) instructions in affected programs: 3442908 -> 3372533 (-2.04%) helped: 11829 HURT: 5604 GAINED: 33 LOST: 18 GLSL IR vs. NIR shader-db results on Ivy Bridge: total instructions in shared programs: 4926333 -> 4857017 (-1.41%) instructions in affected programs: 3144042 -> 3074726 (-2.20%) helped: 11559 HURT: 4774 GAINED: 46 LOST: 25 GLSL IR vs. NIR shader-db results on Bay Trail: total instructions in shared programs: 4926333 -> 4857017 (-1.41%) instructions in affected programs: 3144042 -> 3074726 (-2.20%) helped: 11559 HURT: 4774 GAINED: 46 LOST: 25 GLSL IR vs. NIR shader-db results on Haswell: total instructions in shared programs: 4392487 -> 4293476 (-2.25%) instructions in affected programs: 2800180 -> 2701169 (-3.54%) helped: 13073 HURT: 3383 GAINED: 46 LOST: 23 GLSL IR vs. NIR shader-db results on Broadwell (FS only): total instructions in shared programs: 4378113 -> 4283025 (-2.17%) instructions in affected programs: 2743209 -> 2648121 (-3.47%) helped: 12470 HURT: 3609 GAINED: 64 LOST: 27 Signed-off-by: Jason Ekstrand <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: Don't set NirOptions for stages that will use the vec4 backend.cros-mesa-10.6-vanillachadv/cros-mesa-10.6-vanillachadv/cros-gerrit-262788-baseKenneth Graunke2015-04-101-9/+6
| | | | | | | | | | | | | | | | | | | We've started using NirOptions != NULL to mean "we're using NIR for this stage." However, when INTEL_USE_NIR=1, we set it for a bunch of stages that still use the vec4 backend, and thus definitely aren't using NIR. For example, if INTEL_USE_NIR=1 we disable the GLSL IR cubemap normalization pass, even for vertex shaders and geometry shaders. This is wrong, but breaks a very uncommon case. When I started deleting GLSL IR for stages where we claimed to be using NIR, this bug quickly became apparent. For now, only set it for fragment shaders, and vertex shaders if brw->scalar_vs is set. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Check the INTEL_USE_NIR environment variable once at context creationJason Ekstrand2015-04-031-1/+9
| | | | Reviewed-by: Jordan Justen <[email protected]>
* i965: Use the same nir options for all gensJason Ekstrand2015-04-011-10/+2
| | | | | | | If we tell NIR to split ffma's, then we don't need seperate options anymore. Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Run the ffma peephole after the rest of the optimizationsJason Ekstrand2015-04-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea here is that fusing multiply-add combinations too early can reduce our ability to perform CSE and value-numbering. Instead, we split ffma opcodes up-front, hope CSE cleans up, and then fuse after-the-fact. Unless an algebraic pass does something silly where it inserts something between the multiply and the add, splitting and re-fusing should never cause a problem. We run the late algebraic optimizations after this so that things like compare-with-zero don't hurt our ability to fuse things. shader-db results for fragment shaders on Haswell: total instructions in shared programs: 4390538 -> 4379236 (-0.26%) instructions in affected programs: 989359 -> 978057 (-1.14%) helped: 5308 HURT: 97 GAINED: 78 LOST: 5 This does, unfortunately, cause some substantial hurt to a shader in Kerbal Space Program. However, the damage is caused by changing a single instruction from a ffma to an add. This, in turn, *decreases* register pressure in one part of the program causing it to fail to register allocate and spill. Given the overwhelmingly positive results in other shaders and the fact that the NIR for the Kerbal shaders is actually better, this should be considered a positive. Reviewed-by: Matt Turner <[email protected]>
* i965/nir: Use NIR lowering for ffma for gen < 6Jason Ekstrand2015-03-231-2/+10
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: define I915_PARAM_REVISIONDave Airlie2015-03-231-0/+5
| | | | | | | we are broken against the libdrm 2.4.60 minimum specified, so fix it for now. Signed-off-by: Dave Airlie <[email protected]>
* i965: Store the GPU revision number in brw_contextNeil Roberts2015-03-201-0/+19
| | | | | | | | | | | brwContextInit now queries the GPU revision number via a new parameter for DRM_I915_GETPARAM. This new parameter requires a kernel patch and a patch to libdrm. If the kernel doesn't support it then it will continue but set the revision number to -1. The intention is to use this to implement workarounds that are only needed on certain steppings of the GPU. Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Defer the throttle until we submit new commandsChris Wilson2015-03-181-34/+0
| | | | | | | | | | | | | | | | | | | | | | Currently, we throttle before the user begins preparing commands for the next frame when we acquire the draw/read buffers. However, construction of the command buffer can itself take significant time relative to the frame time. If we move the throttle from the buffer acquire to the command submit phase we can allow the user to improve concurrency between the CPU and GPU (i.e. reduce the amount of time we waste inside the throttle). v2: Whitespace + delay throttling until after the next submission for greater parallelism Signed-off-by: Chris Wilson <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Kenneth Graunke <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Kristian Høgsberg <[email protected]> Cc: Chad Versace <[email protected]> Cc: Ian Romanick <[email protected]> Reviewed-by: Chad Versace <[email protected]> [v1]
* i965: Throttle to the previous frameChris Wilson2015-03-181-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | In order to facilitate the concurrency offered by triple buffering and to offset the latency induced by swapping via an external process, which may incur extra rendering itself, only throttle to the previous frame and not the last. The second issue that mostly affects swap benchmarks, but also can incur jitter in the throttling, is that the throttle bo is closer to the next SwapBuffers rather than immediately after the previous SwapBuffers. Throttling to the previous frame doubles the maximum possible latency at the benefit of improving throughput and reducing jitter. v2: Rename "first_post_swapbuffer" batches array to a plain throttle_batch[] as the pluralisation was contorting the name and not making it clear as to whether it was the first batch or first_post_swap batch. Not least of which was that not all throttle points are SwapBuffers. Signed-off-by: Chris Wilson <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Kenneth Graunke <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Kristian Høgsberg <[email protected]> Cc: Chad Versace <[email protected]> Cc: Ian Romanick <[email protected]> Reviewed-by: Chad Versace <[email protected]>