mesa.git

	Commit message	Author	Age	Files	Lines
*	nv50: enable txg where supported	Ilia Mirkin	2014-02-25	3	-2/+8
\| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]>
*	nv50: enable cube map array texture support	Ilia Mirkin	2014-02-25	3	-9/+7
\| \| \| \|	Signed-off-by: Ilia Mirkin <[email protected]>
*	r600g,radeonsi: consolidate create_surface and surface_destroy	Marek Olšák	2014-02-25	6	-85/+63
\| \| \| \|	Reviewed-by: Michel Dänzer <[email protected]>
*	radeonsi: inline util_blitter_copy_texture	Marek Olšák	2014-02-25	1	-3/+21
\| \| \| \| \| \| \| \|	This will be used for changing texture properties without modifying pipe_resource like r600g, but not in this series. For now, this change allows consolidation of pipe_surface functions. Reviewed-by: Michel Dänzer <[email protected]>
*	radeonsi: remove useless psbox variable from resource_copy_region	Marek Olšák	2014-02-25	1	-3/+2
\| \| \| \|	Reviewed-by: Michel Dänzer <[email protected]>
*	radeonsi: compute depth surface registers only once	Marek Olšák	2014-02-25	1	-44/+54
\| \| \| \|	Reviewed-by: Michel Dänzer <[email protected]>
*	radeonsi: compute color surface registers only once	Marek Olšák	2014-02-25	1	-44/+55
\| \| \| \| \| \|	Same as r600g. Reviewed-by: Michel Dänzer <[email protected]>
*	r600g: remove r600_resource.h	Marek Olšák	2014-02-25	5	-48/+15
\| \| \| \|	Reviewed-by: Michel Dänzer <[email protected]>
*	r600g: remove r600_surface::htile_enabled	Marek Olšák	2014-02-25	3	-10/+4
\| \| \| \| \| \|	v2: use one of the htile registers instead Reviewed-by: Michel Dänzer <[email protected]>
*	r600g: use r600_surface::db_z_info	Marek Olšák	2014-02-25	1	-10/+10
\| \| \| \| \| \| \| \| \| \|	db_z_info was unused. This just renames the variable to match the register name. Now, db_depth_info is unused on Evergreen. Both variables will be needed on SI though. Reviewed-by: Michel Dänzer <[email protected]>
*	r600g,radeonsi: share r600_surface	Marek Olšák	2014-02-25	5	-54/+50
\| \| \| \| \| \|	I'm gonna use this in radeonsi. Reviewed-by: Michel Dänzer <[email protected]>
*	radeonsi: move PA_SU_POLY_OFFSET_DB_FMT_CNTL to framebuffer state	Marek Olšák	2014-02-25	1	-8/+21
\| \| \| \| \| \|	It doesn't depend on anything else. Reviewed-by: Michel Dänzer <[email protected]>
*	gallium: the other drivers don't support ARB_buffer_storage	Marek Olšák	2014-02-25	9	-0/+9
\| \| \| \|	Reviewed-by: Fredrik Höglund <[email protected]>
*	r300g,r600g,radeonsi: add support for ARB_buffer_storage	Marek Olšák	2014-02-25	6	-0/+21
\| \| \| \| \| \|	All GTT memory mappings are coherent and therefore can be persistent. Reviewed-by: Fredrik Höglund <[email protected]>
*	gallium: add interface for persistent and coherent buffer mappings	Marek Olšák	2014-02-25	1	-0/+16
\| \| \| \|	Required for ARB_buffer_storage.
*	nv50: correctly calculate the number of vertical blocks during transfer map	Emil Velikov	2014-02-25	1	-1/+1
\| \| \| \| \| \|	Cc: "10.0 10.1" <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
*	gallium: add texture gather support to gallium (v3)	Dave Airlie	2014-02-25	12	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds support to gallium for a TG4 instruction, and two CAPs. The first CAP is required for GL_ARB_texture_gather. The second CAP is required to expose GL_ARB_gpu_shader5. However so far we haven't found any hardware that natively exposes the textureGatherOffsets feature from GL, so just lower it for now. If hardware appears for this we can add another CAP to allow TG4 to take 4 offsets. v2: add component selection src and a cap to say hw can do it. (st can use to help control GL_ARB_gpu_shader5/GLSL 4.00). Add docs. v3: rename to SM5, add docs. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
*	clover: Pass buffer offsets to the driver in set_global_binding() v3	Tom Stellard	2014-02-24	2	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The offsets will be stored in the handles parameter. This makes it possible to use sub-buffers. v2: - Style fixes - Add support for constant sub-buffers - Store handles in device byte order v3: - Use endian helpers Reviewed-by: Francisco Jerez <[email protected]>
*	radeonsi: Use SI_BIG_ENDIAN now that it exists	Tom Stellard	2014-02-24	1	-1/+1
\| \| \| \|	Reviewed-by: Michel Dänzer <[email protected]>
*	r600g: Use util_cpu_to_le32() instead of bswap32() on big-endian systems	Tom Stellard	2014-02-24	3	-3/+3
\| \| \| \|	Reviewed-by: Michel Dänzer <[email protected]>
*	radeonsi: Use util_cpu_to_le32() instead of bswap32() on big-endian systems	Tom Stellard	2014-02-24	2	-2/+2
\| \| \| \|	Reviewed-by: Michel Dänzer <[email protected]>
*	freedreno/a3xx/compiler: half-precision output	Rob Clark	2014-02-23	6	-10/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using generic shaders caused a measurable fps drop, which was isolated to use of full precision (vs half precision) output. This is an attempt to regain that lost performance by using half precision solid/blit shaders (when the output format is not float32). Note: for the built-in shaders, I would not expect them to be register starved. And in fact it is the solid frag shader that seems to have the biggest impact. So I suspect you get double the pixel pipe units (or half the cycles) when the output is half precision. So there may be some gain to using half precision output for application shaders as well, even though the rest of register usage is still full precision. But for half precision to work for more complex shaders, we need to deal with some constraints, like cat2 needing same precision for its two src registers. So for now it is not enabled by default except for the built-in shaders. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a3xx: add shader variants	Rob Clark	2014-02-23	10	-196/+283
\| \| \| \| \| \| \| \| \|	Start putting in place infrastructure to deal with multiple shader variants. Initially we'll use this for two sided color (frag) and binning pass (vert) shaders. Possibly needed for others later (such as YUV vs RGB eglImage?). Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a3xx/compiler: collapse nop's with repeat	Rob Clark	2014-02-23	2	-0/+15
\| \| \| \| \| \| \| \| \|	Easier than making more extensive use of rpt, and the more compact shaders seem to bring some bit of performance boost. (Perhaps repeat flag benefits are more than just instruction cache, possibly it saves on instruction decode as well?) Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a3xx: drop hand-coded blit/solid shaders	Rob Clark	2014-02-23	10	-287/+181
\| \| \| \| \| \| \| \| \|	Instead in the common code, construct these shaders from TGSI. For now we let a2xx keep its hand coded shaders, as its compiler isn't quite up to the job yet. All the same it is a net drop in code size and gets rid of special cases. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/lowering: cleanup api	Rob Clark	2014-02-23	5	-24/+138
\| \| \| \| \| \| \| \|	Make things configurable, and tweak the API a bit to avoid an extra tgsi_shader_scan(). Getting closer to something generic which can be moved out of freedreno and shared by other drivers. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a3xx: add float 16 and 32bit formats	Rob Clark	2014-02-23	1	-0/+22
\| \| \| \|	Signed-off-by: Rob Clark <[email protected]>
*	freedreno: resync generated headers	Rob Clark	2014-02-23	4	-4/+20
\| \| \| \|	Signed-off-by: Rob Clark <[email protected]>
*	nv50: make sure to clear _all_ layers of all attachments	Ilia Mirkin	2014-02-22	3	-3/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unfortunately there's only one RT_ARRAY_MODE setting for all attachments, so clears were previously truncated to the minimum number of layers any attachment had. Instead set the RT_ARRAY_MODE to 512 (the max number of layers) before doing the clear. This fixes gl-3.2-layered-rendering-clear-color-mismatched-layer-count. Also fix clears of individual layered rt/zeta, in case it ever happens. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christoph Bumiller <[email protected]> Cc: 10.1 <[email protected]>
*	ilo: fix and enable fast depth clear	Chia-I Wu	2014-02-22	2	-9/+38
\| \| \| \| \| \| \| \|	Use tex->bo_format instead of zs->format in ilo_blitter_rectlist_clear_zs() because the latter may be a combined depth/stencil format. hiz_can_clear_zs() is a no-op for GEN7+, but move the GEN check so that the assertions are tested. Finally, call the fast depth clear function from ilo_clear().
*	ilo: add slice clear value	Chia-I Wu	2014-02-22	5	-7/+78
\| \| \| \| \|	It is needed for 3DSTATE_CLEAR_PARAMS, and can also be used to track what value the slice has been cleared to.
*	ilo: better readability and doc for texture flags	Chia-I Wu	2014-02-22	3	-36/+58
\| \| \| \| \|	Improve comments for the flags, and explicitly separate their uses in slice flags and resolve flags.
*	ilo: fix for stencil only rectlist ops	Chia-I Wu	2014-02-22	2	-2/+8
\| \| \| \| \|	3DSTATE_STENCIL_BUFFER inherits some states from 3DSTATE_DEPTH_BUFFER. We need to emit both even if the surface is stencil only.
*	ilo: fix a false assertion failure on GEN6	Chia-I Wu	2014-02-22	1	-4/+12
\| \| \| \|	Layer offsetting is possible when it is level 0, layer 0.
*	ilo: pipe_texture::usage is not a bitfield	Chia-I Wu	2014-02-22	1	-1/+1
\| \| \| \|	It happens to work because PIPE_USAGE_STAGING is 0x100.
*	ilo: set ILO_TEXTURE_CPU_WRITE for imported textures	Chia-I Wu	2014-02-22	1	-3/+10
\| \| \| \| \|	Assume the bo has been written by another process, which will trigger a HiZ resolve.
*	nv50/ir/ra: fix SpillCodeInserter::offsetSlot usage	Christoph Bumiller	2014-02-22	1	-7/+7
\| \| \| \| \| \|	We were turning non-memory spill slots into NULL. Cc: 10.1 <[email protected]>
*	freedreno: tweak ringbuffer sizes/count	Rob Clark	2014-02-19	2	-2/+2
\| \| \| \| \| \| \| \|	Since we are now consuming two ringbuffers at a time, we probably want a pool larger than 4.. but we don't need each individual ringbuffer to be so large, so offset the pool size increase by reducing rb size. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a3xx/compiler: scheduling/legalize fixes	Rob Clark	2014-02-19	3	-2/+30
\| \| \| \| \| \| \| \| \| \|	It seems the write-after-read hazard that applies to texture fetch instructions, also applies to sfu instructions. Also, cat5/cat6 instructions do not have a (ss) bit, so in these cases we need to insert a dummy nop instruction with (ss) bit set. Signed-off-by: Rob Clark <[email protected]>
*	r600g,radeonsi: Consolidate logic for short-circuiting flushes	Michel Dänzer	2014-02-18	6	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes radeonsi emitting command streams to the kernel even when there have been no draw calls before a flush, potentially powering up the GPU needlessly. Incidentally, this also cuts the runtime of piglit gpu.py in about half on my Kaveri system, probably because an X11 client going away no longer always results in a command stream being submitted to the kernel via glamor. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65761 Cc: "10.1" [email protected] Reviewed-by: Marek Olšák <[email protected]>
*	freedreno/a3xx/compiler: use (ss) for WAR hazards	Rob Clark	2014-02-16	1	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Seems texture sample instructions don't immediately consume their src(s). In fact, some shaders from blob compiler seem to indicate that it does not even count the texture sample instructions when calculating number of delay slots to fill for non-sample instructions. (Although so far it seems inconclusive as to whether this is required.) In particular, when a src register of a previous texture sample instruction is clobbered, the (ss) bit is needed to synchronize with the tex pipeline to ensure it has picked up the previous values before they are overwritten. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a3xx/compiler: fix RA typo	Rob Clark	2014-02-16	2	-4/+4
\| \| \| \| \| \| \| \| \| \|	Was supposed to be a '+', otherwise we end up with a negative offset and choosing registers below the assigned range. This seems to fix the scheduling mystery "solved" by adding in extra delay slots. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a3xx/compiler: handle kill properly (new compiler)	Rob Clark	2014-02-16	4	-26/+105
\| \| \| \| \| \| \| \| \| \| \| \| \|	Since 'kill' does not produce a result, the new compiler was happily optimizing them out. We need to instead track 'kill's similar to outputs. But since there is no non-predicated kill instruction, (and for flattened if/else we do want them to be predicated), we need to track the topmost branch condition on the stack and use that as src arg to the kill. For a kill at the topmost level, we have to generate an immediate 1.0 to feed into the cmps.f for setting the predicate register. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/a3xx/compiler: trans_cmp() sanity	Rob Clark	2014-02-16	1	-51/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Thanks to figuring out 32bit float render target, and adding regdump test in fdre-a3xx, I can more easily play around with instructions to figure out range of inputs/outputs/etc. And from this I can conclude that cmps.f works more like expected and I can do something much more simple in trans_cmp() (compared to before which was more closely emulating the instruction sequence of the blob compiler). And using sel.b32 (binary 0/1) often makes more sense than sel.f32 (+/- float) or sel.u32 (+/- uint) as it can use the output directly from cmps.f without needing the 'add.s tmp0, tmp0, -1'. Signed-off-by: Rob Clark <[email protected]>
*	freedreno: fix problems if no color buf bound	Rob Clark	2014-02-16	2	-2/+7
\| \| \| \|	Signed-off-by: Rob Clark <[email protected]>
*	svga/winsys: Propagate surface shared information to the winsys	Thomas Hellstrom	2014-02-14	2	-2/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The linux winsys needs to know whether a surface is shared. For guest-backed surfaces we need this information to avoid allocating a mob out of the mob cache for shared surfaces, but instead allocate a shared mob, that is never put in the mob cache, from the kernel. Also previously, all surfaces were given the "shareable" attribute when allocated from the kernel. This is too permissive for client-local surfaces. Now that we have the needed info, only set the "shareable" attribute if the client indicates that it needs to share the surface. Signed-off-by: Thomas Hellstrom <[email protected]> Reviewed-by: Jakob Bornecrantz <[email protected]> Reviewed-by: Brian Paul <[email protected]> Cc: "10.1" <[email protected]>
*	svga: update texture code for GBS	Brian Paul	2014-02-14	2	-64/+326
\| \| \| \| \|	Reviewed-by: Thomas Hellstrom <[email protected]> Cc: "10.1" <[email protected]>
*	svga: update buffer code for GBS	Brian Paul	2014-02-14	2	-42/+224
\| \| \| \| \|	Reviewed-by: Thomas Hellstrom <[email protected]> Cc: "10.1" <[email protected]>
*	svga: add new helper functions for GBS buffers	Brian Paul	2014-02-14	1	-0/+76
\| \| \| \| \|	Reviewed-by: Thomas Hellstrom <[email protected]> Cc: "10.1" <[email protected]>
*	svga: remove a couple unneeded assertions	Brian Paul	2014-02-14	2	-2/+0
\| \| \| \| \|	Reviewed-by: Thomas Hellstrom <[email protected]> Cc: "10.1" <[email protected]>