summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* r600g: fix layered clearMarek Olšák2014-09-011-1/+2
| | | | | Cc: [email protected] Acked-by: Michel Dänzer <[email protected]>
* r600g: some DB bug workarounds for R6xx DB flushingMarek Olšák2014-09-011-0/+7
| | | | Acked-by: Michel Dänzer <[email protected]>
* r600g: enable fast depth clear for array textures and cubemapsMarek Olšák2014-09-011-1/+2
| | | | | | I have a piglit test that hits this. Acked-by: Michel Dänzer <[email protected]>
* r600g: use HTILE allocator from SIMarek Olšák2014-09-013-47/+23
| | | | | | | | | | | | It's almost the same. This enables tiling for HTILE. It also enables Hyper-Z for other texture targets (1D, 1D_ARRAY, 2D_ARRAY, CUBE, CUBE_ARRAY, 3D, RECT). 2D array depth textures are tested by Unigine Sanctuary and my new piglit test. Acked-by: Michel Dänzer <[email protected]>
* r600g: set DB_DEPTH_SIZE.HEIGHT_TILE_MAX for EG/CM, inline other fieldsMarek Olšák2014-09-011-9/+12
| | | | | | | | This fixes rendering to non-zero layer/face/slice with HTILE. v2: added the assertion Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: set DB_DEPTH_SIZE.HEIGHT_TILE_MAX, inline other fieldsMarek Olšák2014-09-011-9/+8
| | | | | | | | | | This fixes rendering to a non-zero layer/face/slice with HTILE. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72685 v2: added the assertion Reviewed-by: Michel Dänzer <[email protected]>
* r600g: Implement sm5 geometry shader instancingGlenn Kennard2014-09-013-2/+14
| | | | | | Requires Evergreen or later hardware. Signed-off-by: Glenn Kennard <[email protected]>
* glsl_to_tgsi: allocate and enlarge arrays for temporaries on demandMarek Olšák2014-09-011-18/+33
| | | | | | | | | | | This fixes crashes if the number of temporaries is greater than 4096. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66184 v2: added fail paths for realloc failures Cc: 10.2 10.3 [email protected] Reviewed-by: Ilia Mirkin <[email protected]>
* gallium/pb_bufmgr_cache: limit the size of cacheMarek Olšák2014-09-014-8/+30
| | | | | | This should make a machine which is running piglit more responsive at times. e.g. streaming-texture-leak can easily eat 600 MB because of how fast it creates new textures.
* pipe-loader: use the correct screen indexMarek Olšák2014-09-011-2/+18
|
* egl/dri2: use the correct screen indexMarek Olšák2014-09-012-10/+30
| | | | Required for multi-GPU configuration where each GPU has its own X screen.
* i965/fs: don't use ir->shadow_comparitor in emit_texture_*Connor Abbott2014-09-012-7/+5
| | | | | Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: don't pass ir_variable * to emit_samplepos_setup()Connor Abbott2014-09-013-5/+4
| | | | | | | | We were only using it to get at its type, which we already know because it's a builtin variable. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: don't pass ir_variable * to emit_frontfacing_interpolation()Connor Abbott2014-09-014-6/+6
| | | | | | | | | | We were only using it to get at its type, which we already know because it's a builtin variable. v2 (Ken): Rebase on Matt's optimized gl_FrontFacing calculations. Signed-off-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix GPU hangs when INTEL_DEBUG=no16 is set.Kenneth Graunke2014-08-311-1/+2
| | | | | | | | The replicated data clear shader needs to be SIMD16, or else the GPU will hang. So, compile it even if INTEL_DEBUG=no16 is set. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: fix make tarballsEmil Velikov2014-09-011-1/+2
| | | | | | | | | | | | | Current method of generating distribution tar-balls involves manually invoking make + target name in the appropriate places. This temporary solution is used until we get 'make dist' working. Currently it does not work, as in order to have the target (which is also a filename) available in the final Makefile we need to add a PHONY target + use the correct target name. Cc: "10.2 10.3" <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* i965/vec4: Remove try_emit_saturateAbdiel Janulgue2014-08-312-22/+0
| | | | | | | | | Now that saturate is implemented natively as an instruction, we can cut down on unneeded functionality. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/fs: Refactor try_emit_saturateAbdiel Janulgue2014-08-311-15/+8
| | | | | | | | | | | | v3: Since the fs backend can emit saturate as a separate instruction, there is no need to detect for min/max instructions and to rewrite the instruction tree accordingly. On the other hand, we don't need to emit a separate saturated mov either when the expression generating src can do saturate directly. v4: Add can_do_saturate() check before enabling saturate modifer (Ken) Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturateAbdiel Janulgue2014-08-312-99/+0
| | | | | | | | | Now that saturate is implemented natively as instruction, we can cut down on unneeded functionality. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/vec4: Allow propagation of instructions with saturate flag to selAbdiel Janulgue2014-08-311-27/+58
| | | | | | | | | | | | | | | | | When sel conditon is bounded within 0 and 1.0. This allows code as: mov.sat a b sel.ge dst a 0.25F To be propagated as: sel.ge.sat dst b 0.25F v3: - Syntax clarifications in inst->saturate assignment - Remove extra parenthesis when assigning src_reg value from copy_entry (Matt Turner) v4: - Take channels into consideration when propagating saturated instructions. Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/fs: Allow propagation of instructions with saturate flag to selAbdiel Janulgue2014-08-311-1/+17
| | | | | | | | | | | | | | When sel conditon is bounded within 0 and 1.0. This allows code as: mov.sat a b sel.ge dst a 0.25F To be propagated as: sel.ge.sat dst b 0.25F v3: Syntax clarifications in inst->saturate assignment (Matt Turner) Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b)Abdiel Janulgue2014-08-311-0/+23
| | | | | | | | | | | | v2: - Output max(saturate(x),b) instead of saturate(max(x,b)) - Make sure we do component-wise comparison for vectors (Ian Romanick) v3: - Add missing condition where the outer constant value is > 0.0 and inner constant is 1.0. - Fix comments to show that the optimization is a commutative operation (Matt Turner) Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b)Abdiel Janulgue2014-08-311-0/+39
| | | | | | | | | | | v2: - Output min(saturate(x),b) instead of saturate(min(x,b)) suggested by Ilia Mirkin - Make sure we do component-wise comparison for vectors (Ian Romanick) v3: - Add missing condition where the outer constant value is zero and inner constant is < 1 - Fix comments to reflect we are doing a commutative operation (Matt Turner) Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* glsl: Optimize clamp(x, 0, 1) as saturate(x)Abdiel Janulgue2014-08-311-0/+36
| | | | | | | | | | | v2: - Check that the base type is float (Ian Romanick) v3: - Make sure comments reflect that we are doing a commutative operation - Add missing condition where the inner constant is 1.0 and outer constant is 0.0 - Make indexing of operands easier to read (Matt Turner) Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* glsl: Implement saturate as ir_unop_saturateAbdiel Janulgue2014-08-311-5/+1
| | | | | | | | | Now that we have the ir_unop_saturate implemented as a single instruction, generate the correct simplified expression. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* yi965/vec4: Add support for ir_unop_saturateAbdiel Janulgue2014-08-311-0/+4
| | | | | Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/fs: Add support for ir_unop_saturateAbdiel Janulgue2014-08-312-0/+5
| | | | | Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturateAbdiel Janulgue2014-08-312-0/+12
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturateAbdiel Janulgue2014-08-312-2/+9
| | | | | | | | Needed when vertex programs doesn't allow saturate Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)Abdiel Janulgue2014-08-312-0/+30
| | | | | | Signed-off-by: Abdiel Janulgue <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: Add constant evaluation of ir_unop_saturateAbdiel Janulgue2014-08-311-0/+6
| | | | | | | | v2: Use CLAMP macro (Ian Romanick) Signed-off-by: Abdiel Janulgue <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: Add ir_unop_saturateAbdiel Janulgue2014-08-313-0/+4
| | | | | | Signed-off-by: Abdiel Janulgue <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4/fs: Count loops in shader debugAbdiel Janulgue2014-08-312-4/+8
| | | | | Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/vec4: inline generate_vec4_instruction() within generate_code()Abdiel Janulgue2014-08-312-316/+296
| | | | | | | | | | Suggested by Matt. This patch combines and moves back the code-generation functions from generate_vec4_instruction() into generate_code(). Makes generate_code() a bit larger, but helps us to count loops in a straightforward manner. Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965: Add 2x MSAA support to Broadwell fast clear code.Kenneth Graunke2014-08-311-0/+1
| | | | | | | | | | | | According to the cited documentation section (but in the newer docs), x_scaledown is the same for 2x and 4x MSAA. +47 piglits. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83081 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.3" <[email protected]>
* i965/vec4: Update register coalescing test.Matt Turner2014-08-301-4/+1
| | | | | | | In commit 04895f5c I added support for reswizzling writemasks. This test was checking that we didn't support this. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82881
* i965: Use unreachable() to silence warning.Matt Turner2014-08-301-2/+1
| | | | | | | | brw_meta_fast_clear.c:211:17: warning: 'x_scaledown' may be used uninitialized in this function [-Wmaybe-uninitialized] unsigned int x_scaledown, y_scaledown; Reviewed-by: Kenneth Graunke <[email protected]>
* ilo: set INTEL_RELOC_GGTT only on GEN6Chia-I Wu2014-08-311-7/+17
| | | | We asked MI commands to use GGTT only on GEN6.
* ilo: fix bound check for 3DSTATE_URB_VSChia-I Wu2014-08-311-3/+3
| | | | Fix max/min entries on GEN7.5 GT2/GT3.
* ilo: replace cmd by dw0 in GPEChia-I Wu2014-08-312-167/+236
| | | | | With e3c251071b0c9396c3ec76d1cf943c60ae297281, the magic values are gone. We no longer need "cmd" to hide them. Replace it by dw0.
* st/hgl: Move st_visual create/destroy into hgl state_trackerAlexander von Gluck IV2014-08-304-100/+111
|
* st/hgl: Move st_manager create/destroy into hgl state_trackerAlexander von Gluck IV2014-08-303-29/+58
|
* freedreno/ir3: fix potential null ptr derefRob Clark2014-08-301-1/+2
| | | | | | Fix potential segfault in debug code. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add TXBRob Clark2014-08-301-0/+5
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: detect scheduler failRob Clark2014-08-303-4/+21
| | | | | | | | | | | | | | | | | | | | There are some cases where the scheduler can get itself into impossible situations, by scheduling the wrong write to pred or addr register first. (Ie. it could end up being unable to schedule any instruction if some instruction which depends on the current addr/reg value also depends on another addr/reg value.) To solve this we'd need to be able to insert extra mov instructions (which would also help when register assignment gets into impossible situations). To do that, we'd need to move the nop padding from sched into legalize. But to start with, just detect when we get into an impossible situation and bail, rather than sitting forever in an infinite loop. This way it will at least fall back to the old compiler, which might even work if you are lucky. Signed-off-by: Rob Clark <[email protected]>
* glsl: Use bit-flags image attributes and uint16_t for the image formatIan Romanick2014-08-296-43/+42
| | | | | | | | | | | | | | | | | | | | | | All of the GL image enums fit in 16-bits. Also move the fields from the anonymous "image" structucture to the next higher structure. This will enable packing the bits with the other bitfield. Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,465 0 After (32-bit): 70 40,577,421,777 68,487,584 62,973,695 5,513,889 0 Before (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,528 0 After (64-bit): 74 37,124,603,758 95,891,808 88,466,712 7,425,096 0 A real savings of 346KiB on 32-bit and 262KiB on 64-bit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Use a single bit for the dual-source blend indexIan Romanick2014-08-291-5/+9
| | | | | | | | | | | | | | | | | | | The only values allowed are 0 and 1, and the value is checked before assigning. Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,217 0 After (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,465 0 Before (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,320 0 After (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,528 0 A real savings of 173KiB on 32-bit and no change on 64-bit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Eliminate ir_variable::data.atomic.buffer_indexIan Romanick2014-08-295-6/+7
| | | | | | | | | | | | | | | | | | | Just use ir_variable::data.binding... because that's the where the binding is stored for everything else that can use layout(binding=). Valgrind massif results for a trimmed apitrace of dota2: n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) Before (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0 After (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,217 0 Before (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0 After (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,320 0 A real savings of 173KiB on 32-bit and 368KiB on 64-bit. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Delete ctx->GeometryProgram.Cache.Kenneth Graunke2014-08-292-5/+0
| | | | | | | | | | The VertexProgram and FragmentProgram have a Cache member for dealing with fixed function programs. There are no fixed function geometry programs, so this should never have existed, and was just copy and pasted. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* gallivm: fix somewhat broken NaN behavior for exp2Roland Scheidegger2014-08-302-13/+25
| | | | | | | | | | | I actually screwed that up in 754319490f6946a9ad5ee619822d5fe4254e6759, mistakenly thinking the code actually wanted the non-nan result before. So, introduce that missing nan behavior case and use that instead. For sse, there's no actual change in the resulting code at all, the fallback code wouldn't have done the right thing though. Of course, the actual issue I saw with pow() was completely unrelated... Reviewed-by: Jose Fonseca <[email protected]>