summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* i965: Don't do the temporary-and-blit-copy for INVALIDATE_RANGE maps.Eric Anholt2014-01-091-1/+2
| | | | | | | | | | | | | | | | We definitely want to fall through to the unsynchronized map case, instead of wasting bandwidth on a copy. Prevents a -43.2407% +/- 1.06113% (n=49) performance regression on aa10perf when teaching glamor to provide the GL_INVALIDATE_RANGE_BIT information. This is a performance fix, which I usually wouldn't cherry-pick to stable. But this was really was just a bug in the code, its presence would discourage developers from giving us the best information they can, and I think we've got fairly high confidence in the unsynchronized map path already. Cc: 10.0 9.2 <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix handling of MESA_pack_invert in blit (PBO) readpixels.Eric Anholt2014-01-091-1/+3
| | | | | | | | | | Fixes piglit GL_MESA_pack_invert/readpixels and GPU hangs with glamor and cairo-gl. Cc: 10.0 9.2 <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Fix incorrect bounds tracking for blit readpixels's GPU access.Eric Anholt2014-01-091-2/+1
| | | | | | | | | | | While incorrect, it probably wouldn't affect anyone ever: You'd have to do an appropriately-formatted readpixels into a PBO, then overwrite the tail end of the updated area of the PBO with glBufferSubData(), and you wouldn't get appropriate synchronization. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Use SET_FIELD to safety check our x/y offsets in blits.Eric Anholt2014-01-092-7/+14
| | | | | | | | | | | | | | The earlier assert made sure that our math didn't exceed our bounds, but this makes sure that we don't overflow from the high bits X into the low bits of Y. We've already put checks in intel_miptree_blit(), but I've wanted to expand the type in our protoype from short to uint32_t, and we could get in trouble with intel_emit_linear_blit() if we did. v2: Add Ken's comment about the funny language extension used. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (v1) Reviewed-by: Anuj Phogat <[email protected]> (v1)
* i965: Add an assert for when SET_FIELD's value exceeds the field size.Eric Anholt2014-01-091-1/+7
| | | | | | | | | This was one of the things we always wanted to do to this, to make it more useful than just (value << FIELD_MASK). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Add a safety check for emitting blits.Eric Anholt2014-01-091-0/+4
| | | | | | | | | | | | With all of the flipping and pitch twiddling and miptree layout involved in our blits, there are lots of ways for us to scribble outside of a buffer. Put in a check that we're not about to do so. This catches a bug that glamor was running into. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Don't call the blitter on addresses it can't handle.Eric Anholt2014-01-092-3/+40
| | | | | | | | | Noticed by tex3d-maxsize on my next commit to check that our addresses don't overflow. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* mesa: Namespace qualify fma to override ambiguity with fma from math.hThomas Sondergaard2014-01-081-1/+1
| | | | | | | MSVC 2013 version of math.h includes an fma() function. Cc: "10.0" <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: Work around internal compiler errorThomas Sondergaard2014-01-081-2/+2
| | | | | | | | | This small rearrangement avoids MSVC 2013 ICE. Also, this should be a better memory access order. Cc: "10.0" <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: Fix compile error with MSVC 2013Thomas Sondergaard2014-01-081-1/+1
| | | | | | | | | This fixes the following compile error: src\glsl\ir_constant_expression.cpp(1405) : error C2666: 'copysign' : 3 overloads have similar conversions Cc: "10.0" <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* freedreno: add basic query supportRob Clark2014-01-088-1/+275
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add for now some simple/basic query support (ie. things not actually requiring the GPU). Might change around a bit when I actually add GPU queries, but for now this enables some useful performance info in the GALLIUM_HUD. For example: GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls The driver specific specific queries are: + draw-calls + batches - number of batches per second, sum of batches-sysmem plus batches-gmem + batches-gmem - render a set of tiles in GMEM, for each tile (optionally) system mem -> gmem (restore), plus N draws, plus gmem -> system mem (resolve) per second + batches-sysmem - N draws to system memory (GMEM bypass) per second + restores - number of GMEM batches that required restore per second Ideally for GMEM rendering, you want batches-gmem to equal fps. If the app is doing something that triggers multiple passes (ie. requires extra round trip gmem <-> system memory) then the # of batches per second will go up relative to fps. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: use cs patch instead of RFI+RMWRob Clark2014-01-088-52/+46
| | | | | | | | Since we now have the cmdstream patch mechanism needed for hw binning, might as well also use it for RB_RENDER_CONTROL updates. This avoids the need to use RMW (and associated WFI) to update RB_RENDER_CONTROL. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: support for hw binning passRob Clark2014-01-0815-158/+706
| | | | | | | | | | | | | | | | | | | | | | | The binning pass sorts vertices into which bins/tiles they apply to. The visibility information generated during the binning pass can be used to speed up the rendering pass by filtering out vertices which do not apply to the current tile. See: https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach This brings a significant fps boost. A rough assortment of tests (supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems to yield a ~35-45% fps improvement. For now, to be conservative, the binning pass is not enabled yet by default. To enable it use: FD_MESA_DEBUG=binning So far I haven't found anything that breaks with binning enabled, but I'd like a bit more testing before I enable it as default. Signed-off-by: Rob Clark <[email protected]>
* freedreno: be more clever about gmem usageRob Clark2014-01-082-9/+18
| | | | | | Only need to leave room for depth/stencil if it is actually used, etc. Signed-off-by: Rob Clark <[email protected]>
* freedreno: resync generated headersRob Clark2014-01-085-24/+214
| | | | Signed-off-by: Rob Clark <[email protected]>
* i965: fold offset into coord for textureOffset(gsampler2DRect)Chris Forbes2014-01-091-1/+1
| | | | | | | | | | | | | | | | | | | | | The hardware is broken with nonzero texel offsets and unnormalized coordinates; instead of doing correct offsetting, we get garbage. This just extends the existing workaround for ir_txf and ir_tg4+gsampler2DRect to also consider ir_tex+gsampler2DRect. Fixes broken rendering in 'tesseract' when 'mesa_texrectoffset_bug' is not enabled; also fixes the new piglit test 'tests/spec/glsl-1.30/execution/fs-textureOffset-Rect'. Has been broken ~forever; suggesting including this in only 10.0 because the lowering pass doesn't exist in 9.2 or earlier so would require quite a different patch. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: Lee Salzman <[email protected]> Cc: "10.0" <[email protected]>
* mesa: Remove _mesa_progshader_enum_to_string(), which is no longer used.Paul Berry2014-01-084-34/+2
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* glsl: Make more use of gl_shader_stage enum in ir_set_program_inouts.cpp.Paul Berry2014-01-085-18/+19
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* glsl: Make more use of gl_shader_stage enum in lower_clip_distance.cpp.Paul Berry2014-01-081-8/+8
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* glsl: Make more use of gl_shader_stage enum in link_varyings.cpp.Paul Berry2014-01-081-24/+24
| | | | | | | | Reviewed-by: Kenneth Graunke <[email protected]> v2: Also rename "shaderType" param of is_varying_var() to "stage". Reviewed-by: Brian Paul <[email protected]>
* glsl: Change _mesa_glsl_parse_state ctor to use gl_shader_stage enum.Paul Berry2014-01-086-12/+10
| | | | | | | | Reviewed-by: Kenneth Graunke <[email protected]> v2: Also rename "target" param to "stage". Reviewed-by: Brian Paul <[email protected]>
* mesa: Use gl_shader::Stage instead of gl_shader::Type where possible.Paul Berry2014-01-0811-46/+51
| | | | | | | | | | | | | | | | | | | | | This reduces confusion since gl_shader::Type is sometimes GL_SHADER_PROGRAM_MESA but is more frequently GL_SHADER_{VERTEX,GEOMETRY,FRAGMENT}. It also has the advantage that when switching on gl_shader::Stage, the compiler will alert if one of the possible enum types is unhandled. Finally, many functions in src/glsl (especially those dealing with linking) already use gl_shader_stage to represent pipeline stages; using gl_shader::Stage in those functions avoids the need for a conversion. Note: in the process I changed _mesa_write_shader_to_file() so that if it encounters an unexpected shader stage, it will use a file suffix of "????" rather than "geom". Reviewed-by: Brian Paul <[email protected]> v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects." Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Store gl_shader_stage enum in gl_shader objects.Paul Berry2014-01-088-0/+8
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Move declaration of gl_shader_stage earlier in mtypes.h.Paul Berry2014-01-081-17/+17
| | | | | | | | | | | Also move the related #define MESA_SHADER_STAGES. This will allow gl_shader_stage to be used in struct gl_shader. Reviewed-by: Brian Paul <[email protected]> v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects." Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: make _mesa_shader_stage_to_string() available to non-C++ code.Paul Berry2014-01-081-8/+7
| | | | | | | | Reviewed-by: Brian Paul <[email protected]> v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects." Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Clean up nomenclature for pipeline stages.Paul Berry2014-01-0832-203/+203
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we had an enum called gl_shader_type which represented pipeline stages in the order they occur in the pipeline (i.e. MESA_SHADER_VERTEX=0, MESA_SHADER_GEOMETRY=1, etc), and several inconsistently named functions for converting between it and other representations: - _mesa_shader_type_to_string: gl_shader_type -> string - _mesa_shader_type_to_index: GLenum (GL_*_SHADER) -> gl_shader_type - _mesa_program_target_to_index: GLenum (GL_*_PROGRAM) -> gl_shader_type - _mesa_shader_enum_to_string: GLenum (GL_*_{SHADER,PROGRAM}) -> string This patch tries to clean things up so that we use more consistent terminology: the enum is now called gl_shader_stage (to emphasize that it is in the order of pipeline stages), and the conversion functions are: - _mesa_shader_stage_to_string: gl_shader_stage -> string - _mesa_shader_enum_to_shader_stage: GLenum (GL_*_SHADER) -> gl_shader_stage - _mesa_program_enum_to_shader_stage: GLenum (GL_*_PROGRAM) -> gl_shader_stage - _mesa_progshader_enum_to_string: GLenum (GL_*_{SHADER,PROGRAM}) -> string In addition, MESA_SHADER_TYPES has been renamed to MESA_SHADER_STAGES, for consistency with the new name for the enum. Reviewed-by: Kenneth Graunke <[email protected]> v2: Also rename the "target" field of _mesa_glsl_parse_state and the "target" parameter of _mesa_shader_stage_to_string to "stage". Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: Fix the bottom_edge_rule adjustment for points.José Fonseca2014-01-081-4/+4
| | | | | | | | | The adjustment needs to be applied to the y coordinates and not the x coordinates, just like the equivalent code for lines and triangles in lp_setup_line.c and lp_setup_tri.c. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* llvmpipe: Respect bottom_edge_rule when computing the rasterization bounding ↵José Fonseca2014-01-083-3/+3
| | | | | | | | | | | | | | boxes. This was inadvertently forgotten when replacing gl_rasterization_rules with lower_left_origin and half_pixel_center (commit 2737abb44efebfa10ac84b183c20fc5818d1514e). This makes a difference when lower_left_origin != half_pixel_center, e.g, D3D10. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Zack Rusin <[email protected]>
* ilo: enable HiZChia-I Wu2014-01-084-7/+45
| | | | | | The support is still early. Fast depth buffer clear is not enabled yet. HiZ can be forced off with ILO_DEBUG=nohiz.
* ilo: resolve Z/HiZ correctlyChia-I Wu2014-01-085-1/+234
| | | | | | When the depth buffer is to be read, perform a Depth Buffer Resolve if it has been rendered. When the depth buffer is to be rendered, perform a HiZ Buffer Resolve when the depth buffer is modified externally.
* ilo: add flags to texture slicesChia-I Wu2014-01-081-0/+29
| | | | | The flags are used to mark who (CPU, BLT, or RENDER) has accessed the resource and how (READ or WRITE).
* ilo: rename and add an accessor for texture slicesChia-I Wu2014-01-084-19/+41
| | | | | Rename ilo_texture::slice_offsets to ilo_texture::slices and add an accessor, ilo_texture_get_slice().
* ilo: add HiZ op support to the pipelinesChia-I Wu2014-01-0811-4/+1070
| | | | | | Add blitter functions to perform Depth Buffer Clear, Depth Buffer Resolve, and Hierarchical Depth Buffer Resolve. Those functions set ilo_blitter up and pass it to the pipelines to emit the commands.
* ilo: add support for HiZ allocationChia-I Wu2014-01-082-1/+82
| | | | Add tex_create_hiz() to create HiZ bo. It is not really called yet.
* ilo: refactor separate stencil allocationChia-I Wu2014-01-081-20/+27
| | | | | Move separate stencil allocation code to tex_create_separate_stencil to keep tex_create sane.
* ilo: assorted GPE fixes for HiZChia-I Wu2014-01-085-69/+67
| | | | | | | Allow HiZ op to be specified in 3DSTATE_WM. Pass depth format directly in gen7_emit_3DSTATE_SF. Use tex->hiz.bo to determine if HiZ exists. Fix 3DSTATE_SF for the case when there is no ilo_rasterizer_state. Fix 3DSTATE_PS for the case when there is no ilo_shader_state.
* ilo: no layer offsetting on GEN7+Chia-I Wu2014-01-081-1/+5
| | | | | Even though the Ivy Bridge PRM lists some restrictions that require layer offsetting as the Sandy Bridge PRM does, it seems they are actually lifted.
* ilo: offset to layers only when necessaryChia-I Wu2014-01-084-20/+137
| | | | | | | GEN6 has several requirements regarding the LOD/Depth/Width/Height of the render targets and the depth buffer. We used to offset to the layers in question unconditionally to meet the requirements. With this commit, offseting is done only when the requirements are not met.
* ilo: allow ilo_zs_surface to skip layer offsettingChia-I Wu2014-01-083-19/+18
| | | | Make offset to layer optional in ilo_gpe_init_zs_surface.
* ilo: allow ilo_view_surface to skip layer offsettingChia-I Wu2014-01-084-88/+72
| | | | | Make offset to layer optional in ilo_gpe_init_view_surface_for_texture. render_cache_rw is always the same as is_rt and is replaced.
* i965/fs: do SEL optimization only when src type for MOV matchesTapani Pälli2014-01-081-0/+6
| | | | | | | | | Fixes a bug where then branch operates with ivec4 while else uses vec4. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72379 Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Optimize pow(2, x) --> exp2(x).Kenneth Graunke2014-01-071-0/+11
| | | | | | | | | | | | | | | | | On Haswell, POW takes 24 cycles, while EXP2 only takes 14. Plus, using POW requires putting 2.0 in a register, while EXP2 doesn't. I believe that EXP2 will be faster than POW on basically all GPUs, so it makes sense to optimize it. Looking at the savage2 subset of shader-db: total instructions in shared programs: 113225 -> 113179 (-0.04%) instructions in affected programs: 2139 -> 2093 (-2.15%) instances of 'math pow': 795 -> 749 (-6.14%) instances of 'math exp': 389 -> 435 (11.8%) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Refactor is_zero/one/negative_one into an is_value() method.Kenneth Graunke2014-01-072-68/+23
| | | | | | | | | | | | | | | This patch creates a new generic is_value() method, which checks if an ir_constant has a particular value. (For vectors, it must have the single value repeated across all components.) It then rewrites the is_zero/is_one/is_negative_one methods to use this generic helper. All three were basically identical except for the value they checked for. The other difference is that is_negative_one rejects boolean types. The new is_value function maintains this behavior, only allowing boolean types when checking for 0 or 1. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: Optimize pow(1.0, X) --> 1.0.Kenneth Graunke2014-01-071-0/+6
| | | | | | | Surprisingly, this helps one vertex shader in 3DMMES. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: Use get_local_param_pointer in glProgramLocalParameters4fvEXT().Kenneth Graunke2014-01-071-19/+11
| | | | | | | | | | | | | | | | | | | Using the get_local_param_pointer helper ensures that the LocalParams arrays have actually been allocated before attempting to use them. glProgramLocalParameters4fvEXT needs to do a bit of extra checking, but it can be simplified since the helper has already validated the target. Fixes crashes in programs that use Cg (for example, Awesomenauts, Rocketbirds: Hardboiled Chicken, and Tiny and Big: Grandpa's Leftovers) since commit e5885c119de1e508099cc1111e1c9f8ff00fab88 (mesa: Dynamically allocate the storage for program local parameters.) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73136 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Laurent Carlier <[email protected]>
* llvmpipe: Basic implementation of pipe_context::set_sample_mask.José Fonseca2014-01-075-7/+20
| | | | | | | | | | | | | | | | | We don't support MSAA (ie, number of samples is always one) therefore sample_mask boils down to a synonym of the rasterizer_discard flag. Also, this change makes setup actually use the value received in lp_setup_set_rasterizer_discard instead of reaching out to llvmpipe upper layers to re-fetch it. Based on Si Chen's draft. With this patch `wgf11multisample Coverage passes 100%` on the UMD D3D10 state tracker. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Si Chen <[email protected]>
* cso_context: Fix cso_context::sample_mask initial value.José Fonseca2014-01-071-1/+1
| | | | | | | | | | | | The initial value of cso_context::sample_mask_saved is irrelevant as it will be overwritten with cso_context::sample_mask in cso_save_sample_mask. Therefore it is cso_context::sample_mask that needs to be properly initialized. This fixes regressions in blits and mipmap generation after adding support for sample_mask to llvmpipe. Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Implement alpha_to_coverage for non-MSAA framebuffers.Si Chen2014-01-073-1/+59
| | | | | | | | Implement Alpha to Coverage by discarding a fragment alpha component is less than 0.5. This is a joint work of Jose and Si. Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* swrast: fix delayed texel buffer allocation regression for OpenMPAndreas Fänger2014-01-071-0/+12
| | | | | | | | | | | Commit 9119269ca14ed42b51c7d8e2e662500311b29fa3 moved the texel buffer allocation to _swrast_texture_span(), however, when compiled with OpenMP support this code already runs multi-threaded so a critical section is required to prevent multiple allocations and rendering errors. Cc: "10.0" <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium/draw: remove double semicolonDave Airlie2014-01-071-1/+1
| | | | | | code cleanup. Signed-off-by: Dave Airlie <[email protected]>