path: root/src/mesa
Commit message | Author | Age | Files | Lines
* glsl: add gl_InvocationID variable for ARB_gpu_shader5 | Jordan Justen | 2014-02-20 | 1 | -0/+1
  v2:
    * Make gl_InvocationID a system value
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* main/shaderapi: GL_GEOMETRY_SHADER_INVOCATIONS GetProgramiv support | Jordan Justen | 2014-02-20 | 1 | -0/+6
  v3:
    * Add check for ARB_gpu_shader5
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* mesa: initialize gl_geometry_program Invocations field | Jordan Justen | 2014-02-20 | 5 | -0/+5
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* glsl/linker: produce gl_shader_program Geom.Invocations | Jordan Justen | 2014-02-20 | 1 | -0/+9
  Grab the parsed invocation count, check for consistency during linking, and finally save the result in gl_shader_program Geom.Invocations.
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Paul Berry <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
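A minimal sketch of the link-time consistency check described above, assuming each compiled geometry shader unit reports its `layout(invocations = N)` value, or 0 when it declares none. The struct and function names are hypothetical and this is not Mesa's linker code; the default of a single invocation follows ARB_gpu_shader5.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result of merging the per-unit invocation declarations.
struct LinkResult {
    bool ok;          // false means a link error should be reported
    int invocations;  // value that would be saved as the program's count
};

// Each entry is one unit's declared count, or 0 if that unit declared none.
LinkResult link_invocations(const std::vector<int> &declared) {
    int count = 0;  // 0 means "nothing declared so far"
    for (int n : declared) {
        if (n == 0)
            continue;            // this unit did not declare a count
        if (count != 0 && n != count)
            return {false, 0};   // conflicting declarations
        count = n;
    }
    // With no declaration at all, the shader runs once per input primitive.
    return {true, count == 0 ? 1 : count};
}
```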
* i965: Fix extra return value after winsys rb update refactor. | Eric Anholt | 2014-02-20 | 1 | -1/+1
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75172
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Use samplers for UBOs in the VS like we do for non-UBO pulls. | Eric Anholt | 2014-02-20 | 1 | -5/+18
  Improves performance of a dolphin emulator trace I had lying around by 3.60131% +/- 0.995887% (n=128).
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add an optimization pass to remove redundant flags movs. | Eric Anholt | 2014-02-20 | 2 | -0/+34
  We generate steaming piles of these for the centroid workaround, and this quickly cleans them up.
  total instructions in shared programs: 1591228 -> 1590047 (-0.07%)
  instructions in affected programs: 26111 -> 24930 (-4.52%)
  GAINED: 0
  LOST: 0
  (Improved apps are l4d2, csgo, and dolphin)
  Reviewed-by: Matt Turner <[email protected]>
* i965: Actually claim to support MSAA on Broadwell. | Kenneth Graunke | 2014-02-19 | 2 | -1/+10
  We need to advertise 8x, 4x, and 2x multisamples. Previously, we only claimed to support 0/1 samples.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Update physical width/height munging for 2x IMS MSAA. | Kenneth Graunke | 2014-02-19 | 1 | -1/+6
  I can't find any documentation to explain what ought to be done here, so I simply guessed based on the pattern I observed in the 4x/8x cases. It appears to work, but it could be totally wrong.
  I was able to find the Sandybridge PRM quote from the comments in the latest documentation: Shared Functions > 3D Sampler > Multisampled Surface Behavior. However, it only mentions 4x MSAA, not even 8x.
  After substantially more digging, I was able to find a second page (incorrectly tagged) which confirmed the formulas in our code for 8x MSAA. However, that page didn't mention 2x MSAA at all.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Enable smooth points when multisampling without point sprites. | Kenneth Graunke | 2014-02-19 | 1 | -1/+5
  According to the "Point Multisample Rasterization" section of the OpenGL specification (3.0 or later), smooth points are supposed to be enabled implicitly when multisampling, regardless of the GL_POINT_SMOOTH flag. However, if GL_POINT_SPRITE is enabled, you get square points no matter what. Core contexts always enable point sprites, so this effectively makes smooth points go away, even in the case of multisampling.
  Fixes Piglit's EXT_framebuffer_multisample/point-smooth tests. (Yes, that's right, folks, we actually have Piglit tests for this.)
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
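The rule above reduces to a small boolean expression. This is a hypothetical helper written only to illustrate the spec behavior the commit describes, not the i965 state-upload code:

```cpp
#include <cassert>

// Hypothetical helper: should points be rasterized as smooth (round) points?
bool points_are_smooth(bool point_smooth_enabled,
                       bool multisample_enabled,
                       bool point_sprite_enabled) {
    // GL_POINT_SPRITE always yields square points.
    if (point_sprite_enabled)
        return false;
    // Otherwise multisampling implies smooth points regardless of
    // GL_POINT_SMOOTH; without multisampling, GL_POINT_SMOOTH decides.
    return point_smooth_enabled || multisample_enabled;
}
```

Since core contexts always enable point sprites, `point_sprite_enabled` is true there and the function always returns false, matching the last sentence of the commit message.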
* i965: Thwack multisample enable bit in 3DSTATE_RASTER. | Kenneth Graunke | 2014-02-19 | 2 | -0/+5
  The meaning and effects of this bit are surprisingly complicated. See Rasterization > Windower > Multisampling > Multisample ModesState.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Only use the SIMD16 program for per-sample shading on Broadwell. | Kenneth Graunke | 2014-02-19 | 1 | -9/+32
  This restriction carries forward from earlier platforms. The code is ported straight from gen7_wm_state.c.
  v2: Actually do it right.
  v3: Add missing _NEW_MULTISAMPLE bit (caught by Eric).
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Set "Position XY Offset Select" bits in 3DSTATE_PS on Broadwell. | Kenneth Graunke | 2014-02-19 | 1 | -0/+18
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Add missing sample shading bits to Gen8's 3DSTATE_PS_EXTRA. | Kenneth Graunke | 2014-02-19 | 1 | -1/+15
  v2: Also set the "oMask Present to Render Target" bit, which is required for shaders that write oMask. Otherwise the hardware won't expect the extra data.
  v3: Add missing _NEW_MULTISAMPLE (caught by Eric).
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Implement FS_OPCODE_SET_OMASK on Broadwell. | Kenneth Graunke | 2014-02-19 | 2 | -1/+38
  I made a few changes which I think simplify the code a bit compared to the Gen7 implementation, but which are largely pointless.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965/fs: Implement FS_OPCODE_SET_SAMPLE_ID on Broadwell. | Kenneth Graunke | 2014-02-19 | 2 | -1/+32
  Largely cut and paste from Gen7; it works the same way.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Disable MCS on Broadwell for now. | Kenneth Graunke | 2014-02-19 | 1 | -0/+8
  v2: Add a perf_debug() message to remind us to come back to this.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Use gen7_surface_msaa_bits in Broadwell SURFACE_STATE code. | Kenneth Graunke | 2014-02-19 | 1 | -14/+2
  We already set the number of samples, but were missing the MSAA layout mode. Reusing gen7_surface_msaa_bits makes it easy to set both.
  This also lets us drop the Gen8 surface_num_multisamples function.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* i965: Use ffs() for sample counting in gen7_surface_msaa_bits(). | Kenneth Graunke | 2014-02-19 | 1 | -6/+4
  The enumerations are just log2(num_samples) shifted by 3, which we can easily compute via ffs().
  This also makes it reusable for Broadwell, which has 2x MSAA.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
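A sketch of the ffs() trick, assuming the sample count is a power of two (1, 2, 4 or 8) and the field shift is 3 as stated above. The function name is hypothetical. Because ffs() returns the 1-based index of the lowest set bit, `ffs(n) - 1` equals log2(n) for a power of two:

```cpp
#include <cassert>
#include <strings.h>  // POSIX ffs()

// Hypothetical helper: encode a power-of-two sample count as
// log2(num_samples) << 3.
unsigned msaa_sample_count_bits(unsigned num_samples) {
    // Only nonzero powers of two are meaningful here.
    assert(num_samples != 0 && (num_samples & (num_samples - 1)) == 0);
    return (unsigned)(ffs((int)num_samples) - 1) << 3;
}
```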
* i965: Simplify Broadwell's 3DSTATE_MULTISAMPLE sample count handling. | Kenneth Graunke | 2014-02-19 | 1 | -23/+3
  These enumerations are simply log2 of the number of multisamples shifted by a bit, so we can calculate them using ffs() in a lot less code.
  Suggested by Eric Anholt.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
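A sketch contrasting a per-case switch with the ffs() form, assuming power-of-two sample counts. The commit does not state the field's bit position, so the shift is a parameter here; both function names are hypothetical and neither is the actual Gen8 code:

```cpp
#include <cassert>
#include <strings.h>  // POSIX ffs()

// "Before" style: one case per supported sample count.
unsigned num_samples_field_switch(unsigned num_samples, unsigned shift) {
    switch (num_samples) {
    case 1: return 0u << shift;
    case 2: return 1u << shift;
    case 4: return 2u << shift;
    case 8: return 3u << shift;
    default:
        assert(!"unsupported sample count");
        return 0;
    }
}

// "After" style: log2 via ffs(), valid for any power-of-two count.
unsigned num_samples_field_ffs(unsigned num_samples, unsigned shift) {
    return (unsigned)(ffs((int)num_samples) - 1) << shift;
}
```

Both produce the same encoding for the supported counts, and the ffs() form needs no new case if another power-of-two count is added.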
* i965/fs: Allocate the param_size array dynamically. | Francisco Jerez | 2014-02-19 | 2 | -2/+2
  Useful because the total number of uniform components might exceed MAX_UNIFORMS * 4 in some cases because of the image metadata we'll be passing as push constants.
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Use a separate variable to keep track of the last uniform index seen. | Francisco Jerez | 2014-02-19 | 5 | -35/+35
  Like the VEC4 back-end does. It will make dynamic allocation of the param_size array easier in a future commit.
  Reviewed-by: Paul Berry <[email protected]>
* i965: Have brw_imm_vf4() take the vector components as integer values. | Francisco Jerez | 2014-02-19 | 2 | -11/+31
  Reviewed-by: Paul Berry <[email protected]>
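A sketch of what taking the components as integers enables: packing four 8-bit values into one 32-bit immediate. It assumes each argument is already an 8-bit "vector float" (VF) encoding and that component 0 occupies the low byte; `pack_vf4` is a hypothetical name, not brw_imm_vf4() itself:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack four 8-bit VF component encodings into a
// 32-bit immediate, component 0 in bits 7:0 and component 3 in bits 31:24.
uint32_t pack_vf4(uint8_t v0, uint8_t v1, uint8_t v2, uint8_t v3) {
    return (uint32_t)v0 | ((uint32_t)v1 << 8) |
           ((uint32_t)v2 << 16) | ((uint32_t)v3 << 24);
}
```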
* i965: Add helper function to find out the signedness of a register type. | Francisco Jerez | 2014-02-19 | 1 | -0/+28
  Reviewed-by: Paul Berry <[email protected]>
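A sketch of such a helper over a hypothetical, reduced set of register types. This is not Mesa's enumeration. Floating-point types are treated as signed because their values carry a sign bit:

```cpp
#include <cassert>

// Hypothetical register types: unsigned/signed dword, word, byte, and float.
enum class RegType { UD, D, UW, W, UB, B, F };

// Returns true for types whose values carry a sign.
bool reg_type_is_signed(RegType type) {
    switch (type) {
    case RegType::D:
    case RegType::W:
    case RegType::B:
    case RegType::F:
        return true;
    case RegType::UD:
    case RegType::UW:
    case RegType::UB:
        return false;
    }
    return false;  // not reached for valid enumerators
}
```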
* i965/vec4: Use swizzle() in the ARB_vertex_program code. | Francisco Jerez | 2014-02-19 | 2 | -24/+11
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Use offset() in the ARB_fragment_program code. | Francisco Jerez | 2014-02-19 | 1 | -69/+62
  Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Remove fs_reg::retype. | Francisco Jerez | 2014-02-19 | 3 | -20/+12
  There doesn't seem to be any reason for it to be a method, and it's surprising that the expression 'reg.retype(t)' doesn't retype its object but rather it creates a temporary with the new type. Use 'retype(reg, t)' instead.
  Reviewed-by: Paul Berry <[email protected]>
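A hypothetical C++ stand-in for fs_reg that shows the surprise described above: the method form reads as if it mutates `reg` but only returns a modified copy, while the free-function form makes that explicit at the call site. `Reg` and its fields are illustrative only:

```cpp
#include <cassert>

// Hypothetical register type standing in for fs_reg.
struct Reg {
    int nr;
    int type;

    // Method style (removed by the commit): leaves *this unchanged and
    // returns a retyped copy, which is easy to misread.
    Reg retype(int new_type) const {
        Reg r = *this;
        r.type = new_type;
        return r;
    }
};

// Free-function style (kept): retype(reg, t) clearly yields a new value.
Reg retype(Reg reg, int new_type) {
    reg.type = new_type;
    return reg;
}
```

A call like `a.retype(5);` on its own line silently does nothing useful, whereas `Reg b = retype(a, 5);` reads as producing a new value.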
* i965/vec4: Trivial improvements to the with_writemask() function. | Francisco Jerez | 2014-02-19 | 3 | -18/+15
  Add assertion that the register is not in the HW_REG or IMM file, calculate the conjunction of the old and new mask instead of replacing the old [consistent with the behavior of brw_writemask(), causes no functional changes right now], make it static inline to let the compiler do a slightly better job at optimizing things, and shorten its name.
  v2: Assert that the new writemask is not zero to avoid undefined hardware behaviour.
  Reviewed-by: Paul Berry <[email protected]>
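A sketch of the revised semantics, using a hypothetical `DstReg` with X/Y/Z/W mask bits 1/2/4/8 and a hypothetical function name, since the commit does not give the shortened name. The register-file assertion is omitted. Applying the zero check to the combined mask is our reading of "the new writemask":

```cpp
#include <cassert>

// Hypothetical destination register: writemask bits X=1, Y=2, Z=4, W=8.
struct DstReg {
    unsigned writemask;
};

// Intersect (AND) the existing and requested masks instead of replacing,
// and refuse to produce an empty writemask.
static inline DstReg apply_writemask(DstReg reg, unsigned mask) {
    assert((reg.writemask & mask) != 0);
    reg.writemask &= mask;
    return reg;
}
```

Because the masks are intersected, narrowing twice keeps only the channels allowed by both calls, and the input register is left untouched.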
* i965: Make sure that backend_reg::type and brw_reg::type are consistent for fixed regs. | Francisco Jerez | 2014-02-19 | 5 | -0/+26
  And define non-mutating helper functions to retype fixed and normal regs with a common interface.
  At some point we may want to get rid of ::fixed_hw_reg completely and have fixed regs use the normal register data members (e.g. backend_reg::reg to select a fixed GRF number, src_reg::swizzle to store the swizzle, etc.). I have the feeling that this is not the last headache we're going to get because of the multiple ways to represent the same thing and the different register interface depending on the file a register is stored in...
  Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Add non-mutating helper functions to modify src_reg::swizzle and ::negate. | Francisco Jerez | 2014-02-19 | 1 | -0/+24
  Reviewed-by: Paul Berry <[email protected]>
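A sketch of non-mutating swizzle and negate helpers, assuming the common 2-bits-per-channel swizzle packing (X=0, Y=1, Z=2, W=3). All names are hypothetical. Swizzling an already-swizzled register composes the two selections, so output channel i reads the component that the old swizzle selected for channel `new[i]`:

```cpp
#include <cassert>

// Pack four 2-bit component selectors; channel i lives in bits 2i+1:2i.
constexpr unsigned make_swizzle(unsigned a, unsigned b, unsigned c, unsigned d) {
    return a | (b << 2) | (c << 4) | (d << 6);
}

// Component selected for channel `chan`.
constexpr unsigned get_swz(unsigned swz, unsigned chan) {
    return (swz >> (2 * chan)) & 3;
}

// Hypothetical source register with only the fields of interest.
struct SrcReg {
    unsigned swizzle;
    bool negate;
};

// Returns a copy with `swz` applied on top of the existing swizzle.
SrcReg swizzled(SrcReg reg, unsigned swz) {
    reg.swizzle = make_swizzle(get_swz(reg.swizzle, get_swz(swz, 0)),
                               get_swz(reg.swizzle, get_swz(swz, 1)),
                               get_swz(reg.swizzle, get_swz(swz, 2)),
                               get_swz(reg.swizzle, get_swz(swz, 3)));
    return reg;
}

// Returns a copy with the negate flag toggled.
SrcReg negated(SrcReg reg) {
    reg.negate = !reg.negate;
    return reg;
}
```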
* i965: Add non-mutating helper functions to modify the register offset. | Francisco Jerez | 2014-02-19 | 2 | -0/+24
  Yes, we could avoid having four copies of essentially the same code by using templates here.
  Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Fix off-by-one register class overallocation. | Francisco Jerez | 2014-02-19 | 1 | -1/+1
  Reviewed-by: Paul Berry <[email protected]>
* i965: Unify fs_generator:: and vec4_generator::mark_surface_used as a free function. | Francisco Jerez | 2014-02-19 | 7 | -38/+32
  This way it can be used anywhere. I need it from the visitor.
  Reviewed-by: Paul Berry <[email protected]>
* i965: Move up duplicated fields from stage-specific prog_data to brw_stage_prog_data. | Francisco Jerez | 2014-02-19 | 24 | -188/+162
  There doesn't seem to be any reason for nr_params, nr_pull_params, param, and pull_param to be duplicated in the stage-specific subclasses of brw_stage_prog_data. Moving their definition to the common base class will allow some code sharing in a future commit, the removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and the simplification of the stage-specific brw_*_prog_data_compare.
  Reviewed-by: Paul Berry <[email protected]>
* i965/vec4: Add constructor of src_reg from a fixed hardware reg. | Francisco Jerez | 2014-02-19 | 2 | -0/+9
  Reviewed-by: Paul Berry <[email protected]>
* i965: Enable fast depth clears. | Kenneth Graunke | 2014-02-19 | 1 | -1/+1
  They work fine now, too.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Enable HiZ on Broadwell. | Kenneth Graunke | 2014-02-19 | 1 | -1/+1
  It appears to work fine.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Implement HiZ resolves on Broadwell. | Kenneth Graunke | 2014-02-19 | 3 | -2/+113
  Broadwell's 3DSTATE_WM_HZ_OP packet makes this much easier. Instead of programming the whole pipeline, we simply have to emit the depth/stencil packets, a state override, and a pipe control. Then arrange for the state to be put back. This is easily done from a single function.
  v2: Use minify(mt->logical_{width,height}0, level) in 3DSTATE_WM_HZ_OP instead of intel_mipmap_level's width/height fields. Those were based on the physical width/height, and thus wrong for MSAA buffers. Eric also deleted those fields.
  v3: Use 0xFFFF as the sample mask regardless of what the user set (as this operation is unrelated); set the drawing rectangle to the miplevel being operated on, rather than the whole surface; remove unnecessary MAX2(..., 1) around mt->logical_depth0 (all suggested by Eric Anholt).
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Refactor Gen8 depth packet emission. | Kenneth Graunke | 2014-02-19 | 1 | -72/+99
  The existing code followed the vtable function signature, which is not a great fit: many of the parameters are unused, and the function still inspects global state, making it less reusable.
  This patch refactors the depth buffer packet emission code into a new function which takes exactly the parameters it needs, and which uses no global state. It then makes the existing vtable function call the new one.
  Ideally, we would remove the vtable function, and clean up that interface. But that can happen once HiZ is working.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Add #defines for the 3DSTATE_WM_HZ_OP packet's contents. | Kenneth Graunke | 2014-02-19 | 1 | -0/+25
  We're going to need these to implement HiZ.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Bump generation check in code to disable HiZ at LODs > 0. | Kenneth Graunke | 2014-02-19 | 1 | -1/+1
  Broadwell's "HiZ Resolve" operation still has the restriction that the rectangle primitive must be 8x4 aligned. So I believe we still need this.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
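A sketch of the kind of check the title refers to. It assumes the 8x4 rule applies to a miplevel's width and height, and it assumes LOD 0 can instead be accommodated by growing the rectangle, so only LOD > 0 is tested. Both functions are hypothetical and are not the driver's code:

```cpp
#include <cassert>

// Hypothetical: is a width x height rectangle 8x4 aligned?
bool hiz_rect_is_aligned(unsigned width, unsigned height) {
    return width % 8 == 0 && height % 4 == 0;
}

// Hypothetical: may HiZ be enabled for this miplevel? Under our
// assumption, level 0 is always allowed; smaller levels must already
// satisfy the alignment.
bool level_can_use_hiz(unsigned level, unsigned width, unsigned height) {
    return level == 0 || hiz_rect_is_aligned(width, height);
}
```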
* i965: Program 3DSTATE_HIER_DEPTH_BUFFER properly on Broadwell. | Kenneth Graunke | 2014-02-19 | 1 | -8/+17
  HiZ buffers still don't exist, but when they do, we'll set them up.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Pull format conversion logic out of brw_depthbuffer_format. | Kenneth Graunke | 2014-02-19 | 3 | -32/+43
  brw_depthbuffer_format is not very reusable at the moment, since it uses global state (ctx->DrawBuffer) to access a particular depth buffer. For HiZ on Broadwell, I need a function which simply converts the formats.
  However, at least one existing user of brw_depthbuffer_format really wants the existing interface. So, I've created a new function.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Bump MaxTexMbytes from 1GB to 1.5GB. | Kenneth Graunke | 2014-02-18 | 1 | -0/+1
  Even with the other limits raised, TestProxyTexImage would still reject textures > 1GB in size. This is an artificial limit; nothing prevents us from having a larger texture. I stayed shy of 2GB to avoid the larger-than-aperture situation.
  For 3D textures, this raises the effective limit:
  - RGBA8: 645 -> 738
  - RGBA16: 512 -> 586
  - RGBA32F: 406 -> 465
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
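The listed limits can be reproduced under an assumption of ours: a single-level n x n x n texture with tightly packed texels (4, 8 and 16 bytes per texel for RGBA8, RGBA16 and RGBA32F) must fit within the byte limit. This sketch ignores mipmaps, tiling and alignment padding; the function name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Largest edge length n such that n^3 * bytes_per_texel <= limit_bytes.
// 64-bit arithmetic avoids overflow for limits in the gigabyte range.
unsigned max_3d_edge(uint64_t limit_bytes, uint64_t bytes_per_texel) {
    unsigned n = 0;
    while ((uint64_t)(n + 1) * (n + 1) * (n + 1) * bytes_per_texel <= limit_bytes)
        ++n;
    return n;
}
```

With a 1 GiB limit this yields 645, 512 and 406; with 1.5 GiB (1536 MiB) it yields 738, 586 and 465, matching the commit message.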
* i965: Bump GL_MAX_CUBE_MAP_TEXTURE_SIZE to 8192. | Kenneth Graunke | 2014-02-18 | 1 | -1/+1
  Gen4+ supports 8192x8192 cube maps. Ivybridge and later can actually support 16384, but that would place GL_MAX_CUBE_MAP_TEXTURE_SIZE above GL_MAX_TEXTURE_SIZE, which seems like a bad idea.
  (Unfortunately, we can't bump GL_MAX_TEXTURE_SIZE to 16384 without causing regressions due to awful W-tiled stencil buffer interactions.)
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965: Bump MAX_3D_TEXTURE_SIZE to 2048. | Kenneth Graunke | 2014-02-18 | 1 | -1/+1
  It's highly unlikely that there will be enough memory in the system to allocate enough space for this, but we should still expose the hardware limit. It's what the Intel Windows driver does, and it seems most other vendors do likewise.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* mesa: Add GL_TEXTURE_CUBE_MAP_ARRAY to legal_get_tex_level_parameter_target() | Anuj Phogat | 2014-02-18 | 1 | -0/+3
  Fixes failing Khronos CTS test packed_depth_stencil_init.test
  Cc: <[email protected]>
  Signed-off-by: Anuj Phogat <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Use conditional sends to do FB writes on HSW+. | Eric Anholt | 2014-02-18 | 4 | -18/+46
  This drops the MOVs for header setup, which are totally mis-scheduled.
  total instructions in shared programs: 1590047 -> 1589331 (-0.05%)
  instructions in affected programs: 43729 -> 43013 (-1.64%)
  GAINED: 0
  LOST: 0
  glb27-trex: x before, + after
  (ministat distribution plot not reproduced)
         N     Min     Max   Median       Avg      Stddev
    x   49   62.33   65.41    63.49  63.53449  0.62757822
    +   50   62.28   65.4     63.7   63.6982   0.656564
  No difference proven at 95.0% confidence
  Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Drop dead comment about the old proj_attrib_mask optimization. | Eric Anholt | 2014-02-18 | 1 | -6/+0
  The code was removed early last year.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop mt->levels[].width/height. | Eric Anholt | 2014-02-18 | 7 | -42/+23
  It often confused people because it was unclear whether it was the physical or the logical size, and people needed the other one as well. We can recompute it trivially using the minify() macro, clarifying which value is being used and making getting the other value obvious.
  v2: Fix a pasteo in intel_blit.c's dst flip.
  Reviewed-by: Chris Forbes <[email protected]> (v1)
  Reviewed-by: Kenneth Graunke <[email protected]>
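A sketch of a minify()-style computation: halve the base size once per level and clamp at 1. `minify_sketch` is our own name, and we assume Mesa's minify() behaves equivalently. With such a helper, a level's size is derived on demand from an explicit base, for example `mt->logical_width0`, so the code states which base it uses:

```cpp
#include <cassert>

// Size of miplevel `level` for a base dimension `base`, never below 1.
// `level` must be smaller than the bit width of unsigned.
static inline unsigned minify_sketch(unsigned base, unsigned level) {
    unsigned v = base >> level;
    return v > 1 ? v : 1;
}
```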