path: root/src
Commit message | Author | Age | Files | Lines
* i965/fs: Handle printing of registers better. | Jason Ekstrand | 2014-09-30 | 1 file | -2/+6
    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965: Explicitly set widths on gen5 math instruction destinations. | Jason Ekstrand | 2014-09-30 | 1 file | -1/+1
    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Make half() divide the register width by 2 and use it more | Jason Ekstrand | 2014-09-30 | 2 files | -5/+13
    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add a concept of a width to fs_reg | Jason Ekstrand | 2014-09-30 | 2 files | -4/+78
    Every register in i965 assembly implicitly has a concept of a "width". Usually, this is derived from the execution size of the instruction. However, when writing a compiler it turns out that it is frequently useful to have the width explicitly in the register and derive the execution size of the instruction from the widths of the registers used in it.

    This commit adds a width field to fs_reg along with an effective_width() helper function. The effective_width() function tells you how wide the register effectively is when used in an instruction. For example, uniform values have width 1 since the data is not actually repeated, but when used in an instruction they take on the width of the instruction. However, for some instructions (LOAD_PAYLOAD being the notable exception), the width is not the same.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
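The width idea in this commit message can be sketched roughly as follows. This is a hypothetical, heavily simplified illustration, not the Mesa code: `reg_sketch`, `file_kind`, and the free function `effective_width_sketch()` are invented names, and the LOAD_PAYLOAD special case mentioned in the message is left out.

```cpp
#include <cstdint>

// Hypothetical register files; only the two the example needs.
enum class file_kind { vgrf, uniform };

// Hypothetical, simplified stand-in for fs_reg.
struct reg_sketch {
    file_kind file;
    uint8_t width;   // number of channels the register itself holds
};

// How wide the register effectively is when read by an instruction
// that executes exec_size channels.
inline uint8_t effective_width_sketch(const reg_sketch &reg, uint8_t exec_size)
{
    // A uniform stores one value (width 1) but behaves as if it were
    // replicated across every channel of the instruction.
    if (reg.file == file_kind::uniform)
        return exec_size;
    return reg.width;
}
```

Under these assumptions, a uniform used in a 16-wide instruction reports an effective width of 16, while an 8-wide virtual register keeps its own width of 8.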
* i965/fs: A little harmless refactoring of register_coalesce | Jason Ekstrand | 2014-09-30 | 1 file | -7/+7
    Just pass the visitor into is_copy_payload() and is_coalesce_candidate() instead of a register size and the virtual_grf_sizes array. Among other things, this makes the code more obvious because you don't have to figure out where src_size came from.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/brw_reg: Add a firsthalf function and use it in the generator | Jason Ekstrand | 2014-09-30 | 2 files | -29/+44
    Right now, this function is a no-op but it indicates that we intend to only use the first half of the 16-wide register.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Copy propagate partial reads. | Jason Ekstrand | 2014-09-30 | 2 files | -20/+64
    This commit reworks copy propagation a bit to support propagating the copying of partial registers. This comes up every time we have pull constants because we do a pull constant read immediately followed by a move to splat the one component of the out to 8 or 16-wide. This allows us to eliminate the copy and simply use the one component of the register.

    Shader DB results:

    total instructions in shared programs: 5044937 -> 5044428 (-0.01%)
    instructions in affected programs: 66112 -> 65603 (-0.77%)
    GAINED: 0
    LOST: 0

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Refactor fs_inst::is_send_from_grf() | Jason Ekstrand | 2014-09-30 | 1 file | -9/+16
    A switch statement is much easier to read/edit than a big giant or statement.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
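The readability argument can be illustrated with a small sketch. The opcode names below are hypothetical placeholders rather than the real i965 opcode list, and the real is_send_from_grf() may check more than the opcode; only the shape of the refactoring is shown.

```cpp
// Hypothetical opcodes, invented for this illustration.
enum class opcode_sketch { mov, add, fb_write, urb_write, pull_constant_load };

// Before: one long chain of || that grows with every new opcode.
inline bool is_send_or_chain(opcode_sketch op)
{
    return op == opcode_sketch::fb_write ||
           op == opcode_sketch::urb_write ||
           op == opcode_sketch::pull_constant_load;
}

// After: one case label per opcode, so adding or removing an opcode
// is a one-line change.
inline bool is_send_switch(opcode_sketch op)
{
    switch (op) {
    case opcode_sketch::fb_write:
    case opcode_sketch::urb_write:
    case opcode_sketch::pull_constant_load:
        return true;
    default:
        return false;
    }
}
```

Both functions return the same result for every opcode; the switch form simply keeps each opcode on its own line.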
* i965/fs: Clean up emit_fb_writes | Jason Ekstrand | 2014-09-30 | 2 files | -112/+85
    This splits emit_fb_writes into two functions: emit_fb_writes and emit_single_fb_write. This reduces the amount of duplicated code in emit_fb_writes and makes the register number fiddling less arcane.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Print BAD_FILE registers in dump_instruction | Jason Ekstrand | 2014-09-30 | 1 file | -1/+1
    Sometimes these show up in LOAD_PAYLOAD instructions and it's nice to be able to see them.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Make compact_virtual_grfs an optimization pass | Jason Ekstrand | 2014-09-30 | 2 files | -8/+13
    Previously we disabled compact_virtual_grfs when dumping optimizations. The idea here was to make it easier to diff the dumped shader because you didn't have a sudden renaming. However, sometimes a bug is affected by compact_virtual_grfs and, when this happens, you want to keep dumping instructions with compact_virtual_grfs enabled. By turning it into an optimization pass and dumping it along with the others, we retain the ability to diff because you can just diff against the compact_virtual_grf output.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i964/fs: Make immediate fs_reg constructors explicit | Jason Ekstrand | 2014-09-30 | 4 files | -10/+11
    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Make null_reg_* const members of fs_visitor instead of globals | Jason Ekstrand | 2014-09-30 | 3 files | -3/+12
    We also set the register width equal to the dispatch width. Right now, this is effectively a no-op since we don't do anything with it. However, it will be important once we add an actual width field to fs_reg.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the var_from_vgrf helper function instead of doing it manually | Jason Ekstrand | 2014-09-30 | 1 file | -4/+4
    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Fix a bug with dead_code_eliminate on large writes | Jason Ekstrand | 2014-09-30 | 1 file | -1/+1
    Previously, if an instruction wrote to more than one register, we implicitly assumed that it filled the entire register. We never hit this before because the only time we did multi-register writes was things like texturing which always wrote to all of the registers. However, with the upcoming ability to do 16-wide instructions in SIMD8 and things of that nature, we can have multi-register writes at offsets and we'll hit this.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the UW type for the destination of VARYING_PULL_CONSTANT_LOAD instructions | Jason Ekstrand | 2014-09-30 | 1 file | -2/+2
    Using a floating-point type doesn't usually cause hangs on my HSW, but the simulator complains about it quite a bit.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use offset a lot more places | Jason Ekstrand | 2014-09-30 | 4 files | -93/+78
    We have this wonderful offset() function for advancing registers, but we're not using it. Using offset() allows us to do some sanity checking and avoid manually touching fs_reg::reg_offset. In a few commits, we will make offset do even more nifty things for us.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: fix a comment in compact_virtual_grfs | Jason Ekstrand | 2014-09-30 | 1 file | -1/+1
    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Rewrite fs_visitor::split_virtual_grfs | Jason Ekstrand | 2014-09-30 | 1 file | -47/+86
    The original vgrf splitting code was written with the assumption that vgrfs came in two types: those that can be split into single registers and those that can't be split at all. It was very conservative and bailed as soon as more than one element of a register was read or written. This won't work once we start allowing a regular MOV or ADD operation to operate on multiple registers. This rewrite allows for the case where a vgrf of size 5 may appropriately be split into one register of size 1 and two registers of size 2.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Acked-by: Matt Turner <[email protected]>
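The size-5 example can be worked through with a small sketch. This is not the Mesa implementation; it is one plausible way to compute split sizes, assuming each access is described by a hypothetical (first register, register count) pair and that a vgrf may only be cut at boundaries no access straddles. The names `access` and `split_sizes_sketch()` are invented.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Each access is (first register offset, number of registers touched).
using access = std::pair<std::size_t, std::size_t>;

// Returns the sizes of the pieces a vgrf of vgrf_size registers can be
// split into, cutting only at boundaries that no access crosses.
inline std::vector<std::size_t>
split_sizes_sketch(std::size_t vgrf_size, const std::vector<access> &accesses)
{
    // can_split[i] stays true while no access spans the boundary
    // between register i - 1 and register i.
    std::vector<bool> can_split(vgrf_size + 1, true);
    for (const access &a : accesses) {
        for (std::size_t i = a.first + 1;
             i < a.first + a.second && i < vgrf_size; ++i)
            can_split[i] = false;
    }

    std::vector<std::size_t> sizes;
    std::size_t start = 0;
    for (std::size_t i = 1; i <= vgrf_size; ++i) {
        // Close the current piece at every allowed boundary and at the end.
        if (i == vgrf_size || can_split[i]) {
            sizes.push_back(i - start);
            start = i;
        }
    }
    return sizes;
}
```

With accesses to register 0 alone, registers 1-2 together, and registers 3-4 together, a vgrf of size 5 splits into pieces of size 1, 2, and 2, matching the case described in the commit message. A single access covering all five registers leaves the vgrf unsplit.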
* i965/fs_live_variables: Use var_from_vgrf insead of repeating the calculation | Jason Ekstrand | 2014-09-30 | 1 file | -2/+2
    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Manually generate the meta fast-clear shader | Jason Ekstrand | 2014-09-30 | 2 files | -90/+35
    Previously, we were generating the fast-clear shader from GLSL. The problem is that fast clears require that we use a replicated write rather than a regular write instruction. In order to get this we had a complicated and somewhat fragile optimization pass that looked for places where we can use a replicated write and used it. Since replicated writes have a lot of restrictions, we only ever use them for fast-clear operations.

    This commit replaces the optimization pass with a function that just generates the shader we want. This is a) less code, b) less fragile than the optimization pass, and c) generates a more efficient shader.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Kristian Høgsberg <[email protected]>
    Acked-by: Kenneth Graunke <[email protected]>
* radeonsi: Pass the slice size to si_dma_copy_buffer | Michel Dänzer | 2014-09-30 | 1 file | -4/+4
    Otherwise some parts of tiled slices can be missed.

    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Catch more cases that can't be handled by si_dma_copy_buffer/tile | Michel Dänzer | 2014-09-30 | 1 file | -3/+11
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Fix si_dma_copy(_tile) for compressed formats | Michel Dänzer | 2014-09-30 | 1 file | -2/+2
    Fixes GPUVM faults when running the piglit test "getteximage-formats init-by-rendering" with R600_DEBUG=forcedma on SI.

    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Fix tiling mode index for stencil resources | Michel Dänzer | 2014-09-30 | 1 file | -2/+3
    We are currently only dealing with depth-only or stencil-only resources here, not with resources having both depth and stencil[0]. In both cases, the tiling mode index is in the tile_mode field, not in the stencil_tile_mode field.

    [0] Add an assertion for that.

    Reviewed-by: Marek Olšák <[email protected]>
* ilo: fix format of edge flag pointer | Chia-I Wu | 2014-09-30 | 1 file | -3/+5
    The VE format of edge flag pointers was changed in 780ce576bb1781f027797039693b98253ee4813e.

    Signed-off-by: Chia-I Wu <[email protected]>
* ilo: add a pass to finalize ilo_ve_state | Chia-I Wu | 2014-09-30 | 8 files | -158/+190
    Add finalize_vertex_elements() to finalize ilo_ve_state. This fixes a potential issue with URB entry allocation for VS and moves the complexity of gen6_3DSTATE_VERTEX_ELEMENTS() to the new function.

    Signed-off-by: Chia-I Wu <[email protected]>
* ilo: precalculate aligned depth buffer size | Chia-I Wu | 2014-09-30 | 7 files | -48/+43
    To replace the hacky zs_align_surface().

    Signed-off-by: Chia-I Wu <[email protected]>
* ilo: use dynamic bo for rectlist vertices | Chia-I Wu | 2014-09-30 | 10 files | -73/+92
    The size is always 24 bytes. We can upload them to the dynamic buffer.

    Signed-off-by: Chia-I Wu <[email protected]>
* st/xa: Fix regression in xa_yuv_planar_blit() | Thomas Hellstrom | 2014-09-30 | 2 files | -0/+12
    Commit "st/xa: scissor to help tilers" broke xa_yuv_planar_blit() and vmwgfx textured video. Fix this by implementing scissors also in the yuv draw path.

    Signed-off-by: Thomas Hellstrom <[email protected]>
    Reviewed-by: Sinclair Yeh <[email protected]>
    Cc: Rob Clark <[email protected]>
    Cc: "10.2 10.3" <[email protected]>
* i965: Delete intel_chipset.h. | Kenneth Graunke | 2014-09-29 | 5 files | -251/+0
    Unused; it was replaced by include/pci_ids/i965_pci_ids.h long ago.

    Acked-by: Matt Turner <[email protected]>
* driconf: Correct and update Catalan translation | Alex Henrie | 2014-09-29 | 1 file | -8/+8
    Reviewed-by: Alex Deucher <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* driconf: Update Spanish translation | Alex Henrie | 2014-09-29 | 1 file | -5/+5
    Reviewed-by: Alex Deucher <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* driconf: Synchronize po files | Alex Henrie | 2014-09-29 | 6 files | -319/+390
    Reviewed-by: Alex Deucher <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* vc4: Don't try to do stores to buffers that aren't bound. | Eric Anholt | 2014-09-29 | 1 file | -5/+8
    The code was kind of mixed up what buffers were getting stored in the case that a resolve bit was unset (which are set based on the GL state at draw time) and the buffer wasn't actually bound. In particular, depth-only rendering would store the color buffer contents, which happen to be pointing at the depth buffer.

    Thanks to clearing out the resolve bits for things we really can't resolve, now I can drop the safety checks for buffer presence around the actual stores.

    Fixes 42 piglit tests.
* vc4: Shove some depth comparison bits down to where they're used. | Eric Anholt | 2014-09-29 | 1 file | -5/+5
* i965: Use BRW_MATH_DATA_SCALAR when source regioning is scalar. | Matt Turner | 2014-09-29 | 6 files | -11/+9
    Notice the mistaken (but harmless) argument swapping in brw_math_invert().

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/compaction: Move variable declarations to their uses. | Matt Turner | 2014-09-29 | 1 file | -5/+4
    Tested-by: Mark Janes <[email protected]>
* i965/compaction: Simplify jump target code. | Matt Turner | 2014-09-29 | 1 file | -26/+18
    My attempts to clarify the code with _compacted/_uncompacted prefixed variables apparently failed. Hopefully this is clearer.

    In any case, the previous code wasn't clear enough to gcc to let it optimize division by a power of two into a shift. No problems now.

    Also, the previous code (in the ADD case) didn't work on 32-bit x86, due to a complicated set of interactions best summed up as unsigned division and compiler optimizations.

    Tested-by: Mark Janes <[email protected]>
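The commit message does not show the failing expression, so the following is only a sketch of the general pitfall it alludes to, not a reconstruction of the original code. It assumes a hypothetical 16-byte instruction size and a signed byte delta for a jump; `scale_signed()` and `scale_unsigned()` are invented names.

```cpp
#include <cstdint>

// Hypothetical instruction size in bytes (a power of two).
constexpr int32_t insn_size_sketch = 16;

// Signed division: a backward jump of -32 bytes becomes -2 instructions.
inline int32_t scale_signed(int32_t byte_delta)
{
    return byte_delta / insn_size_sketch;
}

// Mixed signedness: the delta is converted to a 32-bit unsigned value
// before the division, so a negative delta becomes a huge positive
// number and the quotient is no longer a small negative jump.
inline int32_t scale_unsigned(int32_t byte_delta)
{
    const uint32_t size = 16u;
    return static_cast<int32_t>(static_cast<uint32_t>(byte_delta) / size);
}
```

For a delta of -32, the signed version yields -2, while the unsigned version yields 268435454. Keeping the arithmetic signed throughout avoids the problem, and dividing by a constant power of two also lets the compiler lower the division to shift instructions.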
* freedreno/a3xx: re-emit shaders on variant change | Rob Clark | 2014-09-29 | 2 files | -1/+50
    We need to keep track if a state change other than frag/vert shader state will trigger us to need a different shader variant, and if necessary mark the appropriate shader state as dirty. Otherwise we will forget to re-emit the shader state.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add some cmdline args | Rob Clark | 2014-09-29 | 1 file | -8/+87
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add support to emulate GL_CLAMP | Rob Clark | 2014-09-29 | 8 files | -16/+129
    Signed-off-by: Rob Clark <[email protected]>
* freedreno: add texcoord clamp support to lowering | Rob Clark | 2014-09-29 | 2 files | -5/+173
    This is for hw that needs to emulate some texture wrap modes (like CLAMP) with some help from the shader.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: move bind_sampler_states to per-generation | Rob Clark | 2014-09-29 | 4 files | -23/+48
    Keep the existing function as a common helper. But this lets us move an a2xx specific hack out of common code. And the PIPE_TEX_WRAP_CLAMP emulation will require an a3xx specific hack. So rather than piling on hacks, split this out.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix border color order | Rob Clark | 2014-09-29 | 1 file | -5/+4
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add 32bit integer vtx formats | Rob Clark | 2014-09-29 | 2 files | -17/+37
    Signed-off-by: Rob Clark <[email protected]>
* vc4: Add support for GL 1.1's stupid CLAMP mode. | Eric Anholt | 2014-09-29 | 1 file | -4/+19
    We just clamp the incoming texture coordinates. This breaks the lambda calculation, but it gets the piglit tests to pass. This is the same behavior as in i965.
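Clamping the incoming coordinate can be sketched as below. This assumes normalized texture coordinates, so the clamp range is taken to be [0, 1]; the commit message does not state the range, and `clamp_texcoord_sketch()` is an invented name rather than a vc4 function.

```cpp
#include <algorithm>

// Clamp a normalized texture coordinate to [0, 1] before sampling.
// Assumption: the coordinate is normalized; the real driver emits the
// equivalent operations in the shader rather than calling a C++ helper.
inline float clamp_texcoord_sketch(float coord)
{
    return std::min(std::max(coord, 0.0f), 1.0f);
}
```

Coordinates already inside the range pass through unchanged, while values below 0 or above 1 are pinned to the nearest edge.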
* vc4: Add support for texture border color. | Eric Anholt | 2014-09-29 | 2 files | -1/+84
    One spot in the docs says that it's stored at a miplevel just beyond the last miplevel, which was scary. But really, you just load it as the R coordinate (which conflicts with cubemaps, but you don't do border clamping on cubes).
* vc4: Add the necessary stubs for occlusion queries. | Eric Anholt | 2014-09-29 | 4 files | -1/+87
    We have to expose them for GL 2.0, but we just always return a value of 0. We should be advertising 0 query bits instead of 64, but gallium doesn't have plumbing for that yet. At least this stops the segfaults.
* vc4: Optimize out silly SUBs of 0. | Eric Anholt | 2014-09-29 | 1 file | -0/+11
    Drops instructions on vs-temp-array-mat4-index-col-row-wr.shader_test, which I was looking at because it's failing to register allocate.
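A peephole of this kind might look like the sketch below. It assumes the optimization targets `x - 0` (the commit subject does not say which operand is zero) and uses an invented miniature IR: `qop_sketch`, `operand_sketch`, `inst_sketch`, and `simplify_sub_zero()` are hypothetical and do not mirror vc4's actual data structures.

```cpp
#include <cstdint>

// Hypothetical miniature IR, invented for this illustration.
enum class qop_sketch { mov, sub };

struct operand_sketch {
    bool is_const;        // true when the operand is an immediate
    uint32_t const_bits;  // raw bits of the immediate, if any
    int temp;             // temporary index otherwise
};

struct inst_sketch {
    qop_sketch op;
    operand_sketch src0;
    operand_sketch src1;
};

// Rewrites "dst = src0 - 0" into "dst = src0". Returns true when the
// instruction was changed so the caller knows progress was made.
inline bool simplify_sub_zero(inst_sketch &inst)
{
    // Only an all-zero bit pattern is matched, which is 0 for integers
    // and +0.0 for floats; subtracting either leaves src0 unchanged.
    if (inst.op == qop_sketch::sub &&
        inst.src1.is_const && inst.src1.const_bits == 0u) {
        inst.op = qop_sketch::mov;
        return true;
    }
    return false;
}
```

Once the SUB has become a plain MOV, later passes such as copy propagation and dead code elimination have a chance to remove it entirely, which is how a rewrite like this can reduce the instruction count.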