summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* llvmpipe: Make a llvmpipe OpenGL context thread safe.Mathias Fröhlich2014-09-304-18/+40
| | | | | | | | | | | | | | | | | This fixes the remaining problem with the recently introduced global jit memory manager. This change again uses a memory manager that is local to gallivm_state. This implementation still frees the majority of the memory immediately after compilation. Only the generated code is deferred until this code is no longer used. This change and the previous one using private LLVMContext instances I can now safely run several independent OpenGL contexts driven by llvmpipe from different threads. v3: Rebase on llvm-3.6 compile fixes. Reviewed-by: Jose Fonseca <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* llvmpipe: Use two LLVMContexts per OpenGL context instead of a global one.Mathias Fröhlich2014-09-3013-31/+40
| | | | | | | | | | | | This is one step to make llvmpipe thread safe as mandated by the OpenGL standard. Using the global LLVMContext is obviously a problem for that kind of use pattern. The patch introduces two LLVMContext instances that are private to an OpenGL context and used for all compiles. One is put into struct draw_llvm and the other one into struct llvmpipe_context. Reviewed-by: Jose Fonseca <[email protected]> Signed-off-by: Mathias Froehlich <[email protected]>
* i965/brw_reg: Make the accumulator register take an explicit width.Jason Ekstrand2014-09-303-10/+15
| | | | | | | The big pile of patches I just pushed regresses about 25 piglit tests on SNB. This fixes the regressions. Signed-off-by: Jason Ekstrand <[email protected]>
* llvmpipe: move lp_jit_screen_init() call after allocation of screen objectBrian Paul2014-09-301-3/+5
| | | | | | | | | The screen argument isn't actually used by lp_jit_screen_init() at this time, but let's move the call so that we pass a valid pointer. v2: don't leak screen if lp_jit_screen_init() fails. Reviewed-by: Roland Scheidegger <[email protected]>
* tgsi: fix Semantic.Name assignment in tgsi_transform_input_decl()Brian Paul2014-09-301-1/+1
| | | | | | | Assign the sem_name parameter, not TGSI_SEMANTIC_GENERIC. Fixes polygon stipple regression. Reviewed-by: Charmaine Lee <[email protected]>
* util: simplify PIPE_TEXTURE_CUBE case in util_max_layer()Brian Paul2014-09-301-2/+3
| | | | | | | | | For cube resources, the array_size value should be 6. So handle that case as we do for array texture resources. But assert that array_size==6 just to be safe. Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* softpipe: don't special case PIPE_TEXTURE_CUBE in softpipe_resource_layout()Brian Paul2014-09-301-2/+3
| | | | | | As with the previous patch for llvmpipe. Reviewed-by: Ilia Mirkin <[email protected]>
* llvmpipe: remove special case for PIPE_TEXTURE_CUBE in llvmpipe_texture_layout()Brian Paul2014-09-301-3/+6
| | | | | | | layers (aka array_size) should be 6 for cube textures so we don't need to special-case it. But add an assertion just to be safe. Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: add doc note about cube textures and can_create_resource()Brian Paul2014-09-301-0/+2
| | | | | | Just to be clear, and echo the description for resource_create(). Reviewed-by: Ilia Mirkin <[email protected]>
* st/mesa: remove unneded PIPE_TEXTURE_CUBE check in st_texture_create()Brian Paul2014-09-301-1/+1
| | | | | | | Earlier in the function we assert layers==6 for PIPE_TEXTURE_CUBE so there's no reason to special-case the pt.array_size = layers assignment. Reviewed-by: Ilia Mirkin <[email protected]>
* mesa: Drop the always-software-primitive-restart paths.Eric Anholt2014-09-305-58/+8
| | | | | | | The core sw primitive restart code is still around, because i965 uses it in some cases, but there are no drivers that want it on all the time. Reviewed-by: Rob Clark <[email protected]>
* gallium: Drop software-only primitive restart support.Eric Anholt2014-09-301-3/+2
| | | | | | | | | | | | | The drivers not flagging primitive restart support are r300 swtcl, svga, nv30, and vc4. The point of primitive restart is to slightly reduce draw call overhead for apps by batching multiple draws. If we do an extra pass to read the index buffer and split back into multiple draws, we've entirely missed the point. This is particularly bad for drivers that otherwise have hardware IB reads, where the readback is probably uncached. Reviewed-by: Rob Clark <[email protected]>
* i965/fs: Properly calculate the number of instructions in ↵Jason Ekstrand2014-09-301-1/+3
| | | | | | | calculate_register_pressure Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the GRF for FB writes on gen >= 7Jason Ekstrand2014-09-306-71/+142
| | | | | | | | | | | | | | | On gen 7, the MRF was removed and we gained the ability to do send instructions directly from the GRF. This commit enables that functinoality for FB writes. v2: Make handling of components more sane. i965/fs: Force a high register for the final FB write v2: Renamed the array for the range mappings and added a comment Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Handle COMPR4 in LOAD_PAYLOADJason Ekstrand2014-09-302-1/+36
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Constant propagate into LOAD_PAYLOADJason Ekstrand2014-09-301-0/+1
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add split_virtual_grfs and compute_to_mrf after lower_load_payloadJason Ekstrand2014-09-301-0/+2
| | | | | | | | If we are going to use LOAD_PAYLOAD operations to fill MRF registers, then we will need this. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add a an optional source to the FS_OPCODE_FB_WRITE instructionJason Ekstrand2014-09-304-29/+28
| | | | | | | | | Previously, we were use the base_mrf parameter of fs_inst to store the MRF location. In preparation for doing FB writes from the GRF, we now also allow you to set inst->base_mrf to -1 and provide a source register. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructionsJason Ekstrand2014-09-304-16/+24
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the GRF for UNTYPED_ATOMIC instructionsJason Ekstrand2014-09-306-25/+36
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add a function for getting a component of a 8 or 16-wide registerJason Ekstrand2014-09-301-0/+10
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the instruction execution size directly for texture generationJason Ekstrand2014-09-301-3/+10
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use exec_size instead of force_uncompressed in dump_instructionJason Ekstrand2014-09-301-6/+7
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use instruction execution sizes instead of heuristicsJason Ekstrand2014-09-303-23/+10
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use instruction execution sizes to set compression stateJason Ekstrand2014-09-301-6/+19
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Remove unneeded uses of force_uncompressedJason Ekstrand2014-09-303-25/+9
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Derive force_uncompressed from instruction exec_sizeJason Ekstrand2014-09-301-0/+3
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor*Jason Ekstrand2014-09-303-37/+43
| | | | | | | | | | | | | Now that we have execution sizes, we can use that instead of the dispatch width. This way it also works for 8-wide instructions in SIMD16. i965/fs: Make effective_width a variable instead of a function i965/fs: Preserve effective width in constant propagation Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Better guess the width of LOAD_PAYLOADJason Ekstrand2014-09-301-2/+9
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add an exec_size field to fs_instJason Ekstrand2014-09-305-32/+126
| | | | | | | | | | | | | | | This will, eventually, allow us to manage execution sizes of instructions in a much more natural way from the fs_visitor level. i965/fs: Explicitly set instruction execute size a couple of places i965/blorp: Explicitly set instruction execute sizes Since blorp is all 16-wide and nothing isn't, in general, very careful about register width, we'll just set it all explicitly. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Determine partial writes based on the destination widthJason Ekstrand2014-09-302-5/+3
| | | | | | | | | Now that we track both halves of a 16-wide vgrf, we no longer need to worry about force_sechalf or force_uncompressed. The only real issue is if the destination is too small. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Fix a bug in register coalesceJason Ekstrand2014-09-301-0/+17
| | | | | | | | | | This commit fixes a bug in register coalesce that happens when one register is moved to another the proper number of times but the channels are re-arranged. When this happens, the previous code would happily coalesce the registers regardless of the fact that the channel mappins were wrong. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Rework GEN5 texturing code to use fs_reg and offset()Jason Ekstrand2014-09-301-39/+38
| | | | | | | | | | | | Now that offset() can properly handle MRF registers, we can use an MRF fs_reg and let offset() handle incrementing it correctly for different dispatch widths. While this doesn't have any noticeable effect currently, it does ensure that the destination register is 16-wide which will be necessary later when we start detecting execution sizes based on source and destination registers. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs_reg: Allocate double the number of vgrfs in SIMD16 modeJason Ekstrand2014-09-309-157/+371
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is actually the squash of a bunch of different changes. Individual commit titles follow: i965/fs: Always 2-align registers SIMD16 for gen <= 5 i965/fs: Use the register width when applying offsets This reworks both byte_offset() and offset() to be more intelligent. The byte_offset() function now supports offsets bigger than 32. The offset() function uses the byte_offset() function together with the register width and the type size to offset the register by the correct amount. i965/fs: Change regs_read to be in hardware registers i965/fs: Change regs_written to be actual hardware registers i965/fs: Properly handle register widths in LOAD_PAYLOAD The LOAD_PAYLOAD instruction is a bit special because it collects a bunch of registers (with possibly different widths) into a single payload block. Once the payload is constructed, it's treated as a single block of data and most of the information such as register widths doesn't matter anymore. In particular, the offset of any particular source register is the accumulation of the sizes of the previous source registers. i965/fs: Properly set writemasks in LOAD_PAYLOAD i965/fs: Handle register widths in demote_pull_constants i965/fs: Get rid of implicit register doubling in the allocator i965/fs: Reserve enough registers for PLN instructions i965/fs: Make sources and destinations interfere in 16-wide i965/fs: Properly handle register widths in CSE i965/fs: Properly handle register widths in register_coalesce i965/fs: Properly handle widths in copy propagation i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD i965/fs: Properly handle register widths and odd register sizes in spilling i965/fs: Don't waste a register on texture lookups for gen >= 7 Previously, we were waisting a register in SIMD16 mode because we could only allocate registers in pairs. Now that we can allocate and address odd-sized registers, let's get rid of this special-case. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Handle printing of registers better.Jason Ekstrand2014-09-301-2/+6
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Explicitly set widths on gen5 math instruction destinations.Jason Ekstrand2014-09-301-1/+1
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Make half() divide the register width by 2 and use it moreJason Ekstrand2014-09-302-5/+13
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add a concept of a width to fs_regJason Ekstrand2014-09-302-4/+78
| | | | | | | | | | | | | | | | | | | | Every register in i965 assembly implicitly has a concept of a "width". Usually, this is derived from the execution size of the instruction. However, when writing a compiler it turns out that it is frequently a useful to have the width explicitly in the register and derive the execution size of the instruction from the widths of the registers used in it. This commit adds a width field to fs_reg along with an effective_width() helper function. The effective_width() function tells you how wide the register effectively is when used in an instruction. For example, uniform values have width 1 since the data is not actually repeated, but when used in an instruction they take on the width of the instruction. However, for some instructions (LOAD_PAYLOAD being the notable exception), the width is not the same. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: A little harmless refactoring of register_coalesceJason Ekstrand2014-09-301-7/+7
| | | | | | | | | | Just pass the visitor into is_copy_payload() and is_coalesce_candidate() instead of a register size and the virtual_grf_sizes array. Among other things, this makes the code more obvious because you don't have to figure out where src_size came from. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/brw_reg: Add a firsthalf function and use it in the generatorJason Ekstrand2014-09-302-29/+44
| | | | | | | | Right now, this function is a no-op but it indicates that we intend to only use the first half of the 16-wide register. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Copy propagate partial reads.Jason Ekstrand2014-09-302-20/+64
| | | | | | | | | | | | | | | | | | This commit reworks copy propagation a bit to support propagating the copying of partial registers. This comes up every time we have pull constants because we do a pull constant read immediately followed by a move to splat the one component of the out to 8 or 16-wide. This allows us to eliminate the copy and simply use the one component of the register. Shader DB results: total instructions in shared programs: 5044937 -> 5044428 (-0.01%) instructions in affected programs: 66112 -> 65603 (-0.77%) GAINED: 0 LOST: 0 Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Refactor fs_inst::is_send_from_grf()Jason Ekstrand2014-09-301-9/+16
| | | | | | | | A switch statement is much easier to read/edit than a big giant or statement. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Clean up emit_fb_writesJason Ekstrand2014-09-302-112/+85
| | | | | | | | | This splits emit_fb_writes into two functions: emit_fb_writes and emit_single_fb_write. This reduces the amount of duplicated code in emit_fb_writes and makes the register number fiddling less arcane. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Print BAD_FILE registers in dump_instructionJason Ekstrand2014-09-301-1/+1
| | | | | | | | Sometimes these show up in LOAD_PAYLOAD instructions and it's nice to be able to see them. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Make compact_virtual_grfs an optimization passJason Ekstrand2014-09-302-8/+13
| | | | | | | | | | | | | | Previously we disabled compact_virtual_grfs when dumping optimizations. The idea here was to make it easier to diff the dumped shader because you didn't have a sudden renaming. However, sometimes a bug is affected by compact_virtual_grfs and, when this happens, you want to keep dumping instructions with compact_virtual_grfs enabled. By turning it into an optimization pass and dumping it along with the others, we retain the ability to diff because you can just diff against the compact_virtual_grf output. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i964/fs: Make immediate fs_reg constructors explicitJason Ekstrand2014-09-304-10/+11
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Make null_reg_* const members of fs_visitor instead of globalsJason Ekstrand2014-09-303-3/+12
| | | | | | | | | We also set the register width equal to the dispatch width. Right now, this is effectively a no-op since we don't do anything with it. However, it will be important once we add an actual width field to fs_reg. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the var_from_vgrf helper function instead of doing it manuallyJason Ekstrand2014-09-301-4/+4
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Fix a bug with dead_code_eliminate on large writesJason Ekstrand2014-09-301-1/+1
| | | | | | | | | | | | Previously, if an instruction wrote to more than one register, we implicitly assumed that it filled the entire register. We never hit this before because the only time we did multi-register writes was things like texturing which always wrote to all of the registers. However, with the upcoming ability to do 16-wide instructions in SIMD8 and things of that nature, we can have multi-register writes at offsets and we'll hit this. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the UW type for the destination of VARYING_PULL_CONSTANT_LOAD ↵Jason Ekstrand2014-09-301-2/+2
| | | | | | | | | | instructions Using a floating-point type doesn't usually cause hangs on my HSW, but the simulator complains about it quite a bit. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>