summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* nir: Allow outputs reads and add the relevant intrinsics.Kenneth Graunke2015-11-134-8/+21
| | | | | | | | | | | | | | | | | | | | | | | | | Normally, we rely on nir_lower_outputs_to_temporaries to create shadow variables for outputs, buffering the results and writing them all out at the end of the program. However, this is infeasible for tessellation control shader outputs. Tessellation control shaders can generate multiple output vertices, and write per-vertex outputs. These are arrays indexed by the vertex number; each thread only writes one element, but can read any other element - including those being concurrently written by other threads. The barrier() intrinsic synchronizes between threads. Even if we tried to shadow every output element (which is of dubious value), we'd have to read updated values in at barrier() time, which means we need to allow output reads. Most stages should continue using nir_lower_outputs_to_temporaries(), but in theory drivers could choose not to if they really wanted. v2: Rebase to accomodate Jason's review feedback. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lower_io: Introduce nir_store_per_vertex_output intrinsics.Kenneth Graunke2015-11-133-5/+26
| | | | | | | | | | | Similar to nir_load_per_vertex_input, but for outputs. This is not useful in geometry shaders, but will be useful in tessellation shaders. v2: Change stage_uses_per_vertex_outputs() to is_per_vertex_output(), taking a nir_variable (requested by Jason Ekstrand). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lower_io: Use load_per_vertex_input intrinsics for TCS and TES.Kenneth Graunke2015-11-131-4/+8
| | | | | | | | | | | | | | | | Tessellation control shader inputs are an array indexed by the vertex number, like geometry shader inputs. There aren't per-patch TCS inputs. Tessellation evaluation shaders have both per-vertex and per-patch inputs. Per-vertex inputs get the new intrinsics; per-patch inputs continue to use the ordinary load_input intrinsics, as they already work like we want them to. v2: Change stage_uses_per_vertex_inputs into is_per_vertex_input(), which takes a variable (requested by Jason Ekstrand). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Silence unused parameter warnings in get_buffer_rectIan Romanick2015-11-131-4/+3
| | | | | | | | | | | | | brw_meta_fast_clear.c: In function 'get_buffer_rect': brw_meta_fast_clear.c:318:37: warning: unused parameter 'brw' [-Wunused-parameter] get_buffer_rect(struct brw_context *brw, struct gl_framebuffer *fb, ^ brw_meta_fast_clear.c:319:44: warning: unused parameter 'irb' [-Wunused-parameter] struct intel_renderbuffer *irb, struct rect *rect) ^ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* meta/generate_mipmap: Don't leak the sampler objectIan Romanick2015-11-131-0/+2
| | | | | | Signed-off-by: Ian Romanick <[email protected]> Cc: "10.6 11.0" <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Remove unneeded #includes.Matt Turner2015-11-131-4/+0
| | | | | Some of these are no longer needed since all the backends switched to NIR.
* i965: Silence warning.Matt Turner2015-11-131-2/+2
| | | | | | | | | | | | | | intel_asm_annotation.c: In function ‘annotation_insert_error’: intel_asm_annotation.c:214:18: warning: ‘ann’ may be used uninitialized in this function [-Wmaybe-uninitialized] ann->error = ralloc_strdup(annotation->mem_ctx, error); ^ I initially tried changing the type of ann_count to unsigned (is currently int), since that in addition to the check that it's non-zero at the beginning of the function seems sufficient to prove that it must be greater than zero. Unfortunately that wasn't sufficient.
* i965: Don't write beyond allocated memory.Juha-Pekka Heikkila2015-11-131-1/+1
| | | | | | Reviewed-by: Eduardo Lima Mitev <[email protected]> Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Juha-Pekka Heikkila <[email protected]>
* i965: Use BRW_MRF_COMPR4 macro in more places.Matt Turner2015-11-134-6/+6
| | | | | Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Combine register file field.Matt Turner2015-11-137-34/+27
| | | | | | | | The first four values (2-bits) are hardware values, and VGRF, ATTR, and UNIFORM remain values used in the IR. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Replace HW_REG with ARF/FIXED_GRF.Matt Turner2015-11-1313-211/+157
| | | | | | | | | | | | | | | | HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to look at different fields for type, abs, negate, writemask, swizzle, and a second file. They also caused annoying problems like immediate sources being considered scheduling barriers (commit 6148e94e2) and other such nonsense. Instead use ARF/FIXED_GRF/MRF for fixed registers in those files. After a sufficient amount of time has passed since "GRF" was used, we can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Set stride correctly for immediates in fs_reg(brw_reg).Matt Turner2015-11-131-0/+6
| | | | | | | | | | | | | | | | | | The fs_reg() constructors for immediates set stride to 0, except for vector-immediates, which set stride to 1. This patch makes the fs_reg constructor that takes a brw_reg do likewise, so that stride is set correctly for cases such as fs_reg(brw_imm_v(...)). The generator asserts that this is true (and presumably it's useful in some optimization passes?) and the VF fs_reg constructors did this (by virtue of the fact that it doesn't override what init() does). In the next commit, calling this constructor with brw_imm_* will generate an IMM file register rather than a HW_REG, making this change necessary to avoid breakage with existing uses of brw_imm_v(). Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Handle type-V immediates in brw_reg_from_fs_reg().Matt Turner2015-11-131-0/+3
| | | | | | | | | We use brw_imm_v() to produce type-V immediates, which generates a brw_reg with fs_reg's .file set to HW_REG. The next commit will rid us of HW_REGs, so we need to handle BRW_REGISTER_TYPE_V in the IMM case. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename GRF to VGRF.Matt Turner2015-11-1330-194/+194
| | | | | | | | | | The 2-bit hardware register file field is ARF, GRF, MRF, IMM. Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to mean an assigned general purpose register. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move BAD_FILE from the beginning of enum register_file.Matt Turner2015-11-131-1/+1
| | | | | | | | | I'm going to begin using brw_reg's file field in backend_reg and its derivatives, and in order to keep the hardware value for ARF as 0, we have to do something different. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Initialize registers.Matt Turner2015-11-133-2/+18
| | | | | | | | | | | | The test (file == BAD_FILE) works on registers for which the constructor has not run because BAD_FILE is zero. The next commit will move BAD_FILE in the enum so that it's no longer zero. In the case of this->outputs, the constructor was being run implicitly, and we were unnecessarily memsetting is to zero. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use brw_reg's nr field to store register number.Matt Turner2015-11-1325-290/+276
| | | | | | | | | | | | In addition to combining another field, we get replace silliness like "reg.reg" with something that actually makes sense, "reg.nr"; and no one will ever wonder again why dst.reg isn't a dst_reg. Moving the now 16-bit nr field to a 16-bit boundary decreases code size by about 3k. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Unwrap some lines.Matt Turner2015-11-134-12/+4
| | | | | Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Remove swizzle/writemask fields from src/dst_reg.Matt Turner2015-11-132-8/+1
| | | | | | | | Also allows us to handle HW_REGs in the swizzle() and writemask() functions. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove fixed_hw_reg field from backend_reg.Matt Turner2015-11-1311-162/+139
| | | | | | | | Since backend_reg now inherits brw_reg, we can use it in place of the fixed_hw_reg field. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use immediate storage in inherited brw_reg.Matt Turner2015-11-1310-95/+96
| | | | | Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add and use enum brw_reg_file.Matt Turner2015-11-134-19/+23
| | | | | Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Reorganize brw_reg fields.Matt Turner2015-11-131-8/+8
| | | | | | | | | Put fields that are meaningless with an immediate in the same storage with the immediate. This leaves fields type, file, nr, subnr in the first dword where there's now extra room for expansion. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.Matt Turner2015-11-1314-178/+185
| | | | | | | | | | | | | | | | | | | | | | Generated by sed -i -e 's/\.bits\././g' *.c *.h *.cpp sed -i -e 's/dw1\.//g' *.c *.h *.cpp and then reverting changes to comments in gen7_blorp.cpp and brw_fs_generator.cpp. There wasn't any utility offered by forcing the programmer to list these to access their fields. Removing them will reduce churn in future commits. This is C11 (and gcc has apparently supported it for sometime "compatibility with other compilers") See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Delete type field from backend_reg.Matt Turner2015-11-131-1/+0
| | | | | | | | | Switching from an implicitly-sized type field to field with an explicit bit width is safe because we have fewer than 2^4 types, and gcc will warn if you attempt to set a value that will not fit. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Delete abs/negate fields from backend_reg.Matt Turner2015-11-133-5/+2
| | | | | | | | Instead use the ones provided by brw_reg. Also allows us to handle HW_REGs in the negate() functions. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make backend_reg inherit from brw_reg.Matt Turner2015-11-131-3/+3
| | | | | | | | Some fields (file, type, abs, negate) in brw_reg are shadowed by backend_reg. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Replace nested ternary with if ladder.Matt Turner2015-11-131-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | Since the types of the expression were bool ? src_reg : (bool ? brw_reg : brw_reg) the result of the second (nested) ternary would be implicitly converted to a src_reg by the src_reg(struct brw_reg) constructor. I.e., bool ? src_reg : src_reg(bool ? brw_reg : brw_reg) In the next patch, I make backend_reg (the parent of src_reg) inherit from brw_reg, which changes this expression to return brw_reg, which throws away any fields that exist in the classes derived from brw_reg. I.e., src_reg(bool ? brw_reg(src_reg) : bool ? brw_reg : brw_reg) Generally this code was gross, and wasn't actually shorter or easier to read than an if ladder. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* radeonsi: remove dead code after ES-GS linkage changeMarek Olšák2015-11-133-57/+0
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: link ES-GS just like LS-HSMarek Olšák2015-11-133-39/+19
| | | | | | | | | | | | This reduces the shader key for ES. Use a fixed attrib location based on (semantic name, index). The ESGS item size is determined by the physical index of the highest ES output, so it's almost always larger than before, but I think that shouldn't matter as long as the ESGS ring buffer is large enough. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: calculate optimal GS ring sizes to fix GS hangs on TongaMarek Olšák2015-11-135-47/+113
| | | | | | | | | | | | | | I discovered that increasing the ESGS ring size fixes GS hangs on Tonga, so let's do it properly. There is now a separate init_config_gs_rings state that is not immutable, because GS rings are resized when needed. This also saves some memory. Most apps won't need more than 1MB per ring per shader engine. Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename si_update_gs_ringsMarek Olšák2015-11-131-2/+2
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: calculate ESGS_RING_ITEMSIZE in create_shaderMarek Olšák2015-11-132-1/+3
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move maximum gs stream calculation into create_shaderMarek Olšák2015-11-132-16/+7
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up small duplication in si_shader_gsMarek Olšák2015-11-132-6/+8
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: shorten render_cond variable namesMarek Olšák2015-11-135-13/+13
| | | | | | and ..._cond -> ..._invert Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove predicate_drawing flagMarek Olšák2015-11-134-4/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: atomize render condition (SET_PREDICATION)Marek Olšák2015-11-1310-45/+45
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: simplify restoring render condition after flushMarek Olšák2015-11-133-26/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: don't use PREDICATION_OP_CLEARMarek Olšák2015-11-131-36/+24
| | | | | | Not setting the predication bit is sufficient. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: simplify disabling render condition for u_blitterMarek Olšák2015-11-135-23/+22
| | | | | | just disable it by not setting the predication bit Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: don't set predication on non-draw packetsMarek Olšák2015-11-131-8/+8
| | | | | | This has no effect. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: inline the r600_rings structureMarek Olšák2015-11-1324-266/+262
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: prevent recursion in si_context_gfx_flushMarek Olšák2015-11-132-0/+8
| | | | | | The recursion can only occur if you modify need_cs_space to always flush. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: remove the IB flushing flagMarek Olšák2015-11-134-14/+2
| | | | | | | Not needed anymore. A similar flag will be introduced in the next commit, which will be private in radeonsi. Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: move GFX/DMA flushing from add_to_buffer_list to need_cs_spaceMarek Olšák2015-11-134-15/+14
| | | | | | | | need_cs_space isn't invoked so often and is called before all commands too. This is a lot cleaner. The code in radeon_add_to_buffer_list always seemed dodgy to me. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename cache flushing flags once moreMarek Olšák2015-11-137-35/+30
| | | | | | | | | | | | | | | KCACHE, TC L1 and TC L2 are renamed to: - SMEM L1 - VMEM L1 - GLOBAL L2 You can easily tell what they are used for now. Shaders must deal with coherency issues between both L1s manually, e.g. by setting GLC=1 or by using s_dcache_*. BOTH_ICACHE_KCACHE was an unused definition. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set the DISABLE_WR_CONFIRM flag on CI-VI as wellMarek Olšák2015-11-131-2/+2
| | | | | | | | I missed this in commit c3e527f93d4281ad6e2ca165eaf6ff588e4faefa radeonsi: only enable write confirmation on the last CP DMA packet Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: initialize SX_PS_DOWNCONVERT to 0 on StoneyMarek Olšák2015-11-131-0/+3
| | | | | | | otherwise the SX or CB blocks can go bananas Reviewed-by: Nicolai Hähnle <[email protected]> Cc: [email protected]
* radeonsi: add glClearBufferSubData accelerationMarek Olšák2015-11-131-0/+60
| | | | | | 8-bit and 16-bit clears which are not aligned to dwords are done in software. Reviewed-by: Nicolai Hähnle <[email protected]>