summaryrefslogtreecommitdiffstats
path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* mesa/sso: Add extension tracking for ARB_separate_shader_objectsIan Romanick2014-02-212-0/+2
| | | | | | | This adds the necessary bits for both the API and the GLSL compiler. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* mesa: Refactor per-stage link check to its own functionIan Romanick2014-02-211-68/+34
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Stop throwing away our double precision for time calculations.Eric Anholt2014-02-213-4/+4
| | | | | | Fixes negative times being reported in our perf debug. Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Add support for integer blits.Eric Anholt2014-02-212-7/+71
| | | | | | | | Compared to i965, the code generated doesn't use the AVG instruction. But I'm not sure that multisampled integer resolves are really that important to worry about. Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Add support for doing MSAA to MSAA blits.Eric Anholt2014-02-212-49/+104
| | | | | | | These are non-stretched, non-resolving blits, so it's just a matter of sampling once from our gl_SampleID and storing that to our color/depth. Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Save and restore a bunch of MSAA state.Eric Anholt2014-02-212-4/+38
| | | | | | | | | | We're disabling GL_MULTISAMPLE, so we didn't need to worry about a lot of that state. But to do MSAA to MSAA blits, we need to start handling more state. v2: Fix pasteo caught by Kenneth. Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Try to do blending of sRGB values in linear colorspace.Eric Anholt2014-02-211-5/+25
| | | | | | | Blending of values would occur when doing GL_LINEAR filtering with scaling, and in an upcoming commit when doing MSAA resolves. Reviewed-by: Kenneth Graunke <[email protected]>
* meta: Add support for doing multisample resolves.Eric Anholt2014-02-212-12/+197
| | | | | | | | | Note that this doesn't handle GL_EXT_multisample_scaled_blit yet. The i965 code for that extension bakes in knowledge of the sample positions (well, knowledge of the sample positions aligned to a lower-resolution grid), which we would have to do at runtime somehow for meta. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix miptree matching for multisampled, non-interleaved miptrees.Eric Anholt2014-02-212-1/+16
| | | | | | | | We haven't been executing this code before the meta-blit case, because we've been flagging the miptree as validated at texstorage time, and never having to revalidate. Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Remove unnecessary condition.Courtney Goeltzenleuchter2014-02-211-2/+1
| | | | | | | | | Identified by Valgrind memory check. Initialized block-opaque in a different patch. This test seems unnecessary. If opaque must be true, just set to true. Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Courtney Goeltzenleuchter <[email protected]>
* i965/fs: Implement FS_OPCODE_[UN]PACK_HALF_2x16_SPLIT[_XY] opcodes.Kenneth Graunke2014-02-202-2/+81
| | | | | | | | | | | | I'd neglected to port these to Broadwell. Most of this code is copy and pasted from Gen7, but instead of using F32TO16/F16TO32, we just use MOV with HF register types. Fixes fs-packHalf2x16 and fs-unpackHalf2x16 tests (both the ARB extension and ES 3.0 variants). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Drop bogus F32TO16/F16TO32 instructions on Broadwell - use MOV.Kenneth Graunke2014-02-203-6/+6
| | | | | | | | | | | | | | | Broadwell removed the F32TO16 and F16TO32 instructions. However, it has actual support for HF values, so they're actually just MOV. Fixes vs-packHalf2x16 and vs-unpackHalf2x16 tests (both the ARB extension and ES 3.0 variants). v2: Emulate F32TO16's align16 zeroing bug, since Chad's front end code relies on it happening. We can probably refactor this code to be better later. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Create a hardware context before initializing state module.Kenneth Graunke2014-02-201-6/+6
| | | | | | | | | | | | | | | | | | | | | brw_init_state() calls brw_upload_initial_gpu_state(). If hardware contexts are enabled (brw->hw_ctx != NULL), this will upload some initial invariant state for the GPU. Without hardware contexts, we rely on this state being uploaded via atoms that subscribe to the BRW_NEW_CONTEXT bit. Commit 46d3c2bf4ddd227193b98861f1e632498fe547d8 accidentally moved the call to brw_init_state() before creating a hardware context. This meant brw_upload_initial_gpu_state would always early return. Except on Gen6+, we stopped uploading the initial GPU state via state atoms, so it never happened. Fixes a regression since 46d3c2bf4ddd227193b98861f1e632498fe547d8. Cc: "10.0 10.1" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Implement scratch read/write support for Broadwell.Kenneth Graunke2014-02-201-9/+109
| | | | | | | | | | | | | | To make sure that both the Gen4 and Gen7 style messages work, I initially disabled the SHADER_OPCODE_GEN7_SCRATCH_READ optimization, ran Piglit, re-enabled it, and ran Piglit again. Both worked fine. Fixes 40 Piglit tests (most of the varying-packing category). v2: Move num_regs assertion from gen8_fs_generator to gen8_set_dp_scratch_message() (suggested by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Add Gen8 assembly support for DP Scratch messages.Kenneth Graunke2014-02-202-0/+47
| | | | | | | | | | The new accessors will make it easy to do Gen7-style scratch messages. v2: Move num_regs assertion from gen8_fs_generator into gen8_set_dp_scratch_message() (suggested by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Store absolute thread count in max_wm_threads on Broadwell.Kenneth Graunke2014-02-202-2/+5
| | | | | | | | | | | | | | | | | | In the past, 3DSTATE_PS took an absolute number of threads. Conversely, on Broadwell you always program 64, and it implicitly scales based on the GT-level with no special programming. So, I stored 64 in brw_device_info::max_wm_threads. However, I didn't realize that we also use max_wm_threads to compute the size of the scratch space buffer. In that case, we really need the absolute number of threads. This patch hardcodes 3DSTATE_PS to use the value it expects, and changes max_wm_threads back to a (completely fake) absolute thread count (once again copied from Haswell). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Use MOV, not OR for setting URB write channel enables on Gen8+.Kenneth Graunke2014-02-201-5/+2
| | | | | | | | | | | | On Broadwell, g0.5 contains the "Scratch Space Pointer"; using OR puts some bits of that into "ignored" sections of our message header. While this doesn't hurt, it's also not terribly /useful/. Using MOV is sufficient to set the only interesting bits in this part of the message header. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Implement a CS stall workaround on Broadwell.Kenneth Graunke2014-02-201-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the latest documentation, any PIPE_CONTROL with the "Command Streamer Stall" bit set must also have another bit set, with five different options: - Render Target Cache Flush - Depth Cache Flush - Stall at Pixel Scoreboard - Post-Sync Operation - Depth Stall I chose "Stall at Pixel Scoreboard" since we've used it effectively in the past, but the choice is fairly arbitrary. Implementing this in the PIPE_CONTROL emit helpers ensures that the workaround will always take effect when it ought to. Apparently, this workaround may be necessary on older hardware as well; for now I've only added it to Broadwell as it's absolutely necessary there. Subsequent patches could add it to older platforms, provided someone tests it there. v2: Only flag "Stall at Pixel Scoreboard" when none of the other bits are set (suggested by Ian Romanick). v3: Prefix the function with "gen8" (requested by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (v2) Reviewed-by: Eric Anholt <[email protected]>
* i965: support instanced GS on gen7Jordan Justen2014-02-205-2/+11
| | | | | | | | | | v3: * Properly prevent dual object mode execution when the invocation count > 1 Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: support gl_InvocationID for gen7Jordan Justen2014-02-205-3/+49
| | | | | | | | | | | | | v2: * Make gl_InvocationID a system value v3: * Properly shift from R0.1 into DST.4 by adding GS_OPCODE_GET_INSTANCE_ID Signed-off-by: Jordan Justen <[email protected]> Acked-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* glsl: add gl_InvocationID variable for ARB_gpu_shader5Jordan Justen2014-02-201-0/+1
| | | | | | | | | v2: * Make gl_InvocationID a system value Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* main/shaderapi: GL_GEOMETRY_SHADER_INVOCATIONS GetProgramiv supportJordan Justen2014-02-201-0/+6
| | | | | | | | | v3: * Add check for ARB_gpu_shader5 Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* mesa: initialize gl_geometry_program Invocations fieldJordan Justen2014-02-205-0/+5
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* glsl/linker: produce gl_shader_program Geom.InvocationsJordan Justen2014-02-201-0/+9
| | | | | | | | | | Grab the parsed invocation count, check for consistency during linking, and finally save the result in gl_shader_program Geom.Invocations. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Fix extra return value after winsys rb update refactor.Eric Anholt2014-02-201-1/+1
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75172 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Use samplers for UBOs in the VS like we do for non-UBO pulls.Eric Anholt2014-02-201-5/+18
| | | | | | | Improves performance of a dolphin emulator trace I had laying around by 3.60131% +/- 0.995887% (n=128). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add an optimization pass to remove redundant flags movs.Eric Anholt2014-02-202-0/+34
| | | | | | | | | | | | | | We generate steaming piles of these for the centroid workaround, and this quickly cleans them up. total instructions in shared programs: 1591228 -> 1590047 (-0.07%) instructions in affected programs: 26111 -> 24930 (-4.52%) GAINED: 0 LOST: 0 (Improved apps are l4d2, csgo, and dolphin) Reviewed-by: Matt Turner <[email protected]>
* i965: Actually claim to support MSAA on Broadwell.Kenneth Graunke2014-02-192-1/+10
| | | | | | | | | We need to advertise 8x, 4x, and 2x multisamples. Previously, we only claimed to support 0/1 samples. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Update physical width/height munging for 2x IMS MSAA.Kenneth Graunke2014-02-191-1/+6
| | | | | | | | | | | | | | | | | | I can't find any documentation to explain what ought to be done here, so I simply guessed based on the pattern I observed in the 4x/8x cases. It appears to work, but it could be totally wrong. I was able to find the Sandybridge PRM quote from the comments in the latest documentation: Shared Functions > 3D Sampler > Multisampled Surface Behavior. However, it only mentions 4x MSAA - not even 8x. After a substantial amount more digging, I was able to find a second page (incorrectly tagged) which confirmed the formulas in our code for 8x MSAA. However, that page didn't mention 2x MSAA at all. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Enable smooth points when multisampling without point sprites.Kenneth Graunke2014-02-191-1/+5
| | | | | | | | | | | | | | | | | According to the "Point Multisample Rasterization" of the OpenGL specification (3.0 or later), smooth points are supposed to be enabled implicitly when multisampling, regardless of the GL_POINT_SMOOTH flag. However, if GL_POINT_SPRITE is enabled, you get square points no matter what. Core contexts always enable point sprites, so this effectively makes smooth points go away, even in the case of multisampling. Fixes Piglit's EXT_framebuffer_multisample/point-smooth tests. (Yes, that's right folks, we actually have Piglit tests for this.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Thwack multisample enable bit in 3DSTATE_RASTER.Kenneth Graunke2014-02-192-0/+5
| | | | | | | | | | The meaning and effects of this bit are surprisingly complicated. See Rasterization > Windower > Multisampling > Multisample ModesState. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Only use the SIMD16 program for per-sample shading on Broadwell.Kenneth Graunke2014-02-191-9/+32
| | | | | | | | | | | | This restriction carries forward from earlier platforms. The code is ported straight from gen7_wm_state.c. v2: Actually do it right. v3: Add missing _NEW_MULTISAMPLE bit (caught by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Set "Position XY Offset Select" bits in 3DSTATE_PS on Broadwell.Kenneth Graunke2014-02-191-0/+18
| | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Add missing sample shading bits to Gen8's 3DSTATE_PS_EXTRA.Kenneth Graunke2014-02-191-1/+15
| | | | | | | | | | | | v2: Also set the "oMask Present to Render Target" bit, which is required for shaders that write oMask. Otherwise the hardware won't expect the extra data. v3: Add missing _NEW_MULTISAMPLE (caught by Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Implement FS_OPCODE_SET_OMASK on Broadwell.Kenneth Graunke2014-02-192-1/+38
| | | | | | | | | I made a few changes which I think simplify the code a bit compared to the Gen7 implementation, but which are largely pointless. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Implement FS_OPCODE_SET_SAMPLE_ID on Broadwell.Kenneth Graunke2014-02-192-1/+32
| | | | | | | | Largely cut and paste from Gen7; it works the same way. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Disable MCS on Broadwell for now.Kenneth Graunke2014-02-191-0/+8
| | | | | | | | v2: Add a perf_debug() message to remind us to come back to this. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Use gen7_surface_msaa_bits in Broadwell SURFACE_STATE code.Kenneth Graunke2014-02-191-14/+2
| | | | | | | | | | | We already set the number of samples, but were missing the MSAA layout mode. Reusing gen7_surface_msaa_bits makes it easy to set both. This also lets us drop the Gen8 surface_num_multisamples function. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Use ffs() for sample counting in gen7_surface_msaa_bits().Kenneth Graunke2014-02-191-6/+4
| | | | | | | | | | | The enumerations are just log2(num_samples) shifted by 3, which we can easily compute via ffs(). This also makes it reusable for Broadwell, which has 2x MSAA. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Simplify Broadwell's 3DSTATE_MULTISAMPLE sample count handling.Kenneth Graunke2014-02-191-23/+3
| | | | | | | | | | | These enumerations are simply log2 of the number of multisamples shifted by a bit, so we can calculate them using ffs() in a lot less code. Suggested by Eric Anholt. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Allocate the param_size array dynamically.Francisco Jerez2014-02-192-2/+2
| | | | | | | | Useful because the total number of uniform components might exceed MAX_UNIFORMS * 4 in some cases because of the image metadata we'll be passing as push constants. Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Use a separate variable to keep track of the last uniform index seen.Francisco Jerez2014-02-195-35/+35
| | | | | | | Like the VEC4 back-end does. It will make dynamic allocation of the param_size array easier in a future commit. Reviewed-by: Paul Berry <[email protected]>
* i965: Have brw_imm_vf4() take the vector components as integer values.Francisco Jerez2014-02-192-11/+31
| | | | Reviewed-by: Paul Berry <[email protected]>
* i965: Add helper function to find out the signedness of a register type.Francisco Jerez2014-02-191-0/+28
| | | | Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Use swizzle() in the ARB_vertex_program code.Francisco Jerez2014-02-192-24/+11
| | | | Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Use offset() in the ARB_fragment_program code.Francisco Jerez2014-02-191-69/+62
| | | | Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Remove fs_reg::retype.Francisco Jerez2014-02-193-20/+12
| | | | | | | | | There doesn't seem to be any reason for it to be a method, and it's surprising that the expression 'reg.retype(t)' doesn't retype its object but rather it creates a temporary with the new type. Use 'retype(reg, t)' instead. Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Trivial improvements to the with_writemask() function.Francisco Jerez2014-02-193-18/+15
| | | | | | | | | | | | | | Add assertion that the register is not in the HW_REG or IMM file, calculate the conjunction of the old and new mask instead of replacing the old [consistent with the behavior of brw_writemask(), causes no functional changes right now], make it static inline to let the compiler do a slightly better job at optimizing things, and shorten its name. v2: Assert that the new writemask is not zero to avoid undefined hardware behaviour. Reviewed-by: Paul Berry <[email protected]>
* i965: Make sure that backend_reg::type and brw_reg::type are consistent for ↵Francisco Jerez2014-02-195-0/+26
| | | | | | | | | | | | | | | fixed regs. And define non-mutating helper functions to retype fixed and normal regs with a common interface. At some point we may want to get rid of ::fixed_hw_reg completely and have fixed regs use the normal register data members (e.g. backend_reg::reg to select a fixed GRF number, src_reg::swizzle to store the swizzle, etc.), I have the feeling that this is not the last headache we're going to get because of the multiple ways to represent the same thing and the different register interface depending on the file a register is stored in... Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Add non-mutating helper functions to modify src_reg::swizzle and ↵Francisco Jerez2014-02-191-0/+24
| | | | | | ::negate. Reviewed-by: Paul Berry <[email protected]>