summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* i965: Fix dump_prog_cache to handle compacted instructions.Kenneth Graunke2014-05-181-13/+5
| | | | | | | | | | | dump_prog_cache has interpreted compacted instructions as full size instructions, decoding garbage and complaining about invalid values. We can just use brw_dump_compile to handle this correctly in less code. The output format changes slightly, but it's still perfectly acceptable. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Use brw_dump_compile for clip, SF, and old GS programs.Kenneth Graunke2014-05-183-13/+3
| | | | | | | | | | | | Looping over the instructions and calling brw_disasm doesn't handle compacted instructions. In most cases, this hasn't been a problem since we don't compact prior to Sandybridge. However, Sandybridge's transform feedback GS program should already be compacted, and so this ought to fix decoding of that. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nv50/ir: fix integer mul lowering for u32 x u32 -> high u32Ilia Mirkin2014-05-181-3/+4
| | | | | | | | | | | | UNION appears to expect that all of its sources are conditionally defined. Otherwise it inserts an unpredicated mov instruction which overwrites the desired result. This fixes tests that use UMUL_HI, and much less directly, unsigned integer division by a constant, which uses this functionality in a peephole pass. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.1 10.2" <[email protected]> Reviewed-by: Ben Skeggs <[email protected]>
* nv50/ir: make sure that texprep/texquerylod's args get coalescedIlia Mirkin2014-05-181-0/+2
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "10.2" <[email protected]> Reviewed-by: Ben Skeggs <[email protected]>
* freedreno/a3xx: use util_format_compose_swizzles()Rob Clark2014-05-181-9/+9
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: 1D texturesRob Clark2014-05-181-4/+25
| | | | | | | | Gallium already gives us height==1 for these, so the texture state is already setup correctly to emulate 1D textures as a Nx1 2D texture. We just need to supply the .y coord. Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix capsRob Clark2014-05-181-2/+2
| | | | | | In particular, we want mesa to emulate primitive restart for us. Signed-off-by: Rob Clark <[email protected]>
* freedreno: fix index buffer offsetRob Clark2014-05-181-1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add sRBG texture supportRob Clark2014-05-162-0/+15
| | | | | | | That was easy. Turns out it is just a matter of setting one bit. Enable sampling from sRGB texture, and therefore enable GL 2.1 :-) Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2014-05-164-20/+21
| | | | Signed-off-by: Rob Clark <[email protected]>
* gallivm: (trivial) fix compilation with llvm 3.1, 3.2Roland Scheidegger2014-05-171-0/+4
| | | | | | I actually checked the getModuleIdentifier() function exists with 3.1 but missed that the file moved... This fixes https://bugs.freedesktop.org/show_bug.cgi?id=78803
* gallivm: print out how long it takes to optimize shader IR.Roland Scheidegger2014-05-163-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enabled with GALLIVM_DEBUG=perf (which up to now was only used to print warnings for unoptimized code). While some unexpectedly long shader compile times for some shaders were fixed with 8a9f5ecdb116d0449d63f7b94efbfa8b205d826f this should help recognize such problems in the future. For now though only available in debug builds (which are not always suitable for such analysis). And since this uses system time, it might not be all that accurate (even llvmpipe's own rasterization threads might be running at the same time, or just other tasks). (llvmpipe also has LP_DEBUG=counters but this only gives an average per shader and the the total time for all shaders.) This prints information like this: optimizing module fs17_variant0 took 1 msec optimizing module setup_variant_0 took 0 msec optimizing module draw_llvm_vs_variant0 took 9 msec optimizing module draw_llvm_vs_variant0 took 12 msec optimizing module fs17_variant1 took 2 msec v2: rebase for recent gallivm compilation changes, and print time for whole modules instead of functions (otherwise it would be very spammy since it would include all trivial inline sse2 functions), using the shiny new module names, prying them off LLVM using new helper (not available through C bindings). Per function timings, while possibly giving more information (if there'd be a problem only in for instance the partial not the whole function), don't seem all that useful for now. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: give more verbose names to modulesRoland Scheidegger2014-05-1610-26/+38
| | | | | | | | | When we had just one module "gallivm" was an appropriate name. But now we have modules containing all functions for a particular variant, so give it a corresponding name (this is really just for helping debugging). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* mesa: fix double-freeing of dispatch tables inside glBegin/End.Brian Paul2014-05-161-2/+2
| | | | | | | | | | | We allocate dispatch tables for BeginEnd and OutsideBeginEnd. But when we destroy the context we were freeing the BeginEnd and Exec tables. If Exec==BeginEnd we did a double-free. This would happen if the context was destroyed while inside a glBegin/End pair. Now free the BeginEnd and OutsideBeginEnd pointers. Cc: "10.1", "10.2" <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* i965: Use binary literals counter select.Matt Turner2014-05-151-2/+2
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* glsl_to_tgsi: Make sure the 'shader' member is always initializedMichel Dänzer2014-05-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | Fixes the valgrind report below and random crashes with piglit on radeonsi. ==30005== Conditional jump or move depends on uninitialised value(s) ==30005== at 0xB13584E: st_translate_program (st_glsl_to_tgsi.cpp:5100) ==30005== by 0xB14698B: st_translate_fragment_program (st_program.c:747) ==30005== by 0xB14777D: st_get_fp_variant (st_program.c:824) ==30005== by 0xB11219C: get_color_fp_variant (st_cb_drawpixels.c:1042) ==30005== by 0xB1131AE: st_DrawPixels (st_cb_drawpixels.c:1154) ==30005== by 0xAFF8806: _mesa_DrawPixels (drawpix.c:162) ==30005== by 0x4EB86DB: stub_glDrawPixels (generated_dispatch.c:6640) ==30005== by 0x4F1DF08: piglit_visualize_image (piglit-util-gl.c:1574) ==30005== by 0x40691D: draw_image_to_window_system_fb(int, bool) (draw-buffers-common.cpp:733) ==30005== by 0x406C8B: draw_reference_image(bool, bool) (draw-buffers-common.cpp:854) ==30005== by 0x40722A: piglit_display (alpha-to-coverage-dual-src-blend.cpp:117) ==30005== by 0x4EA7168: run_test (piglit_fbo_framework.c:52) Cc: "10.1 10.2" <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallivm: remove optimization workaround when not having sse 4.1Roland Scheidegger2014-05-161-8/+1
| | | | | | | | | | | This workaround doesn't list any llvm version, but it was introduced 2010-06-10 (e277d5c1f6b2c5a6d202561e67d2b6821a69ecc4). It is unlikely this bug is still present in llvm versions we support (3.1+). There's no specific test listed, but I ran lp_test_arit (which uses the mentioned functions) on llvm 3.1 and 3.3 with sse41 disabled and this pass enabled without issues. Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: remove workaround for reversing optimization pass order.Roland Scheidegger2014-05-161-13/+2
| | | | | | | | | | | | | | | | | 32bit code generation and llvm >= 2.7 used a different optimization pass order - this code was initially introduced (2010-07-23) by 815e79e72c1f4aa849c0ee6103621685b678bc9d, apparently due to buggy code being generated with then brand new llvm versions (which was llvm 2.7 plus pre 2.8 devel). It seems very highly likely that whatever this bug was it has been fixed in newer llvm versions, though there's no easy way to test this - the mentioned piglit test has been removed years ago, and even if you'd build it I'm sceptical the glsl compiler would still produce the required code to trigger it. I have no idea what a good order of passes is, but just remove the workaround and use the same order everywhere. Reviewed-by: Jose Fonseca <[email protected]>
* i965/gen8: Make disassembly function match brw's signature.Matt Turner2014-05-154-9/+12
| | | | | | | | gen8_dump_compile will be called indirectly by code common used by generations before and after the gen8 instruction format change. Acked-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Pass brw_context and assembly separately to brw_dump_compile.Matt Turner2014-05-156-14/+12
| | | | | | | | brw_dump_compile will be called indirectly by code common used by generations before and after the gen8 instruction format change. Acked-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Pull brw_compact_instructions() out of brw_get_program().Matt Turner2014-05-157-9/+10
| | | | | Acked-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Align send instruction meta-information with dst.Matt Turner2014-05-151-0/+1
| | | | | | | Has been misaligned since we added instruction offset prefixes. Acked-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Disassemble the compaction control bit.Matt Turner2014-05-159-10/+18
| | | | | | | | | brw_disasm doesn't disassemble compacted instructions, so we uncompact before disassembling them which would unset the compaction control bit. Instead pass it as a separate argument. Acked-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cfg: Embed exec_node in bblock_link.Matt Turner2014-05-155-23/+25
| | | | | | | In order to remove bblock_link's inheritance of exec_node. Also makes linked list walk code much nicer. Acked-by: Eric Anholt <[email protected]>
* i965/cfg: Make brw_cfg.h closer to C-includable.Matt Turner2014-05-151-13/+23
| | | | | | Only bblock_link's inheritance left. Acked-by: Eric Anholt <[email protected]>
* i965/cfg: Protect brw_cfg.h from multiple inclusion.Matt Turner2014-05-151-0/+6
| | | | Acked-by: Eric Anholt <[email protected]>
* glsl: Add C-callable fprint_ir function.Matt Turner2014-05-152-0/+10
| | | | | Acked-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fb: Use meta path for stencil up/downsamplingTopi Pohjolainen2014-05-151-1/+8
| | | | | | Cc: "10.2" <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* i965/meta: Stencil blit for miptree updownsamplingTopi Pohjolainen2014-05-152-0/+38
| | | | | | Cc: "10.2" <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fb: Use meta path for stencil blitsTopi Pohjolainen2014-05-151-0/+9
| | | | | | | | | This is effective only on gen8 for now as previous generations still go through blorp. Cc: "10.2" <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/meta: Stencil blitsTopi Pohjolainen2014-05-153-0/+497
| | | | | | | | | | | | | v2: Create the intel renderbuffer with level hardcoded to zero instead of overriding it in the surface state configuration. Also moved the dimension adjustments for tiling, mip level, msaa into the render buffer creation. Finally prepares for another blit path needed for miptree updownsampling. v3 (Ken): Dropped unnecessary memory context for "ralloc_asprintf()" Cc: "10.2" <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* i965: Extend brw_get_rb_for_first_slice() for specified level/layerTopi Pohjolainen2014-05-152-7/+29
| | | | | | | | | | v2: Configure stencil directly for final dimensions instead of adjusting bit by bit for tiling, mip level and msaa. v3 (Ken): Used non-static constant for horizontal alignment Cc: "10.2" <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen8: Surface state overriding for stencilTopi Pohjolainen2014-05-151-13/+21
| | | | | | | | | | | | | | v2: Allow hardware to offset accesses to individual layers. Also leave the mip-level overriding for the creator of the intel renderbuffer to handle. Merged with "i965/gen8: Allow stencil buffers to be configured as single sampled" Ken: I left the "_mesa_problem()" still in place. I think it is clearer to remove it in a separate patch. Cc: "10.2" <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/wm: Surface state overrides for configuring w-tiled as y-tiledTopi Pohjolainen2014-05-152-0/+30
| | | | | | | | | | v2: Use intel_mipmap_tree::total_width in order to get correct alignment automatically. Also use "mt->total_height / mt->physical_depth0" as surface height allowing hardware to offset to correct slice. Cc: "10.2" <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965 meta up/downsample: Fix renderbuffer _BaseFormatJordan Justen2014-05-151-1/+2
| | | | | | | | | | | | mt->format is of type mesa_format, and therefore can't be used with _mesa_base_fbo_format which requires a GLenum input. On gen8, this fixes various piglit fbo-depthstencil tests with samples > 1. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "10.2" <[email protected]>
* i965: Delete current_insn() function.Matt Turner2014-05-152-7/+2
|
* i965: Remove blorp unit tests.Matt Turner2014-05-153-1099/+1
| | | | | | | | | They've served their purpose (in transitioning blorp to using fs_generator) and now they just necessitate large amounts of manual labor to regenerate if the disassembler changes. Reviewed-by: Topi Pohjolainen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* egl-static: include libradeonwinsys.la only onceEmil Velikov2014-05-151-8/+5
| | | | | | | | | | | With this and the previous patch, we no longer have multiple definitions in the final egl_gallium.so. v2: Drop duplicate libloader link. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Chia-I Wu <[email protected]> (v1) Reviewed-by: Tom Stellard <[email protected]> (v1)
* gallium/radeon: link in libradeon.la at target levelEmil Velikov2014-05-1513-20/+24
| | | | | | | | | | | | It makes more sense to link the core and common parts of the driver as the target is build. Additionally this will help us drop duplicating symbols for targets that static link mulitple pipe-drivers. Only egl-static needs that currently with more to come. To simplify things a bit add HAVE_GALLIUM_RADEON_COMMON variable. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* gallium/radeon: build only a single common library libradeonEmil Velikov2014-05-153-12/+5
| | | | | | | Just fold libllvmradeon in libradeon. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* freedreno/a3xx: fix write to bogus registerRob Clark2014-05-141-2/+2
| | | | | | | | | | | The loops for updating the multiple packed fields in SP_VS_OUT[] and SP_VS_VPC_DST[] will zero out one register beyond the last that on required. Which is normally not a problem (and is kinda convenient when looking at cmdstream dumps) unless we have maximum (16) varyings. Fix loop termination condition so that this does not happen. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: account for special inputs/outputsRob Clark2014-05-141-2/+2
| | | | | | | | | We need to size input/output tables big enough for special inputs/ outputs (gl_Position, gl_FrontFacing, etc) which, while they don't count towards the hw limit of 16 attributes or 16 varyings, we do still need to track them all the same. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix MAX_INPUTS shader capRob Clark2014-05-143-1/+9
| | | | | | | | | | Hardware only supports 16. Which fd3_shader_variant properly reflected, but the pipe cap did not, leading to array overflow (and shaders that could not possibly work). Also a bunch of asserts to make problems like this easier to see. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add debug flag to expose glsl130Rob Clark2014-05-142-3/+8
| | | | | | | | | | We are starting to add integer support to the compiler, which does not get exercised with glsl feature level 120 and without advertising integer support. But doing so breaks too many things right now. So for now use a debug flag to conditionally expose the functionality while it is in development. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: add KILL_IFRyan Houdek2014-05-141-1/+35
| | | | | | | | | The KILL_IF opcode could potentially be merged in to the regular KILL opcode function. It was a pain to do so, so I've left is separated for cleanliness. Signed-off-by: Ryan Houdek <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: start adding integer supportRyan Houdek2014-05-141-0/+169
| | | | | | | | | | | | | Adds a large sum of TGSI opcodes to the a3xx compiler. For integer opcodes we have 28 opcodes added. Adds 4 floating point compare opcodes If GLSL 1.30 is enabled, this allows the GLSL 1.30 piglits to have a completion amount of 432/641. Signed-off-by: Ryan Houdek <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* draw: better llvm names for shaders for debugging.Roland Scheidegger2014-05-151-6/+12
| | | | | | | | All shaders had the same name. We could probably use some identifier per shader too, but for now only use the variant number. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: improve setup shader names (for debugging)Roland Scheidegger2014-05-151-38/+40
| | | | | | | | | | | | | The setup shaders were composed of both a fs shader number and a variant number. But since they aren't tied to a particular fragment shader, the former was a fixed zero while the latter was also always zero because it was never assigned. So, similar to what the fs code does, use a ever increasing number to give it a more catchy name (unlike fragment shaders though where this number is for each explicitly created shader, we just use it for the implicitly created variants). And while here, fix whitespace a bit. Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: kill off llvmpipe_variant_countRoland Scheidegger2014-05-154-20/+4
| | | | | | | Unused except it was increased for both fs and setup shader variants created. Probably some leftover from ages ago. Reviewed-by: Jose Fonseca <[email protected]>
* mesa/st: fix number of ubos being declared in a shaderRoland Scheidegger2014-05-151-3/+5
| | | | | | | | | | | | Previously the code used the total number of ubos being declared in the linked program (so the ubos of all shaders combined), use the number from the particular shader instead. This fixes an assertion failure with piglit arb_uniform_buffer_object-maxblocks seen in llvmpipe since 8a9f5ecdb116d0449d63f7b94efbfa8b205d826f as it now emits code for each declared buffer, not just the ones actually used. CC: "10.1 10.2" <[email protected]> Reviewed-by: Brian Paul <[email protected]>