summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* freedreno/a3xx/compiler: overflow in trans_endifRob Clark2014-03-021-13/+5
| | | | | | | | The logic to count number of block outputs was out of sync with the actual array construction. But to simplify / make things less fragile, we can just allocate the arrays for worst case size. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: fix for resolving PHI'sRob Clark2014-03-021-18/+33
| | | | | | | A value may be assigned on only one side of an if/else. In this case we can simply substitute a mov.f32f32. Signed-off-by: Rob Clark <[email protected]>
* freedreno/lowering: two-sided-colorRob Clark2014-03-025-57/+269
| | | | | | | | | Add option to generate fragment shader to emulate two sided color. Additional inputs are added to shader for BCOLOR's (on corresponding to each COLOR input). CMP instructions are used to select whether to use COLOR or BCOLOR. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: add SSGRob Clark2014-03-022-0/+2
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix gl_PointSizeRob Clark2014-03-025-13/+20
| | | | | | | If vertex writes pointsize, there are a few extra bits we need to turn on in the cmdstream here and there. Signed-off-by: Rob Clark <[email protected]>
* freedreno: resync generated headersRob Clark2014-03-024-5/+7
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: binning-pass vertex shader variantRob Clark2014-03-023-18/+57
| | | | | | | | | | | Now that we have the infrastructure for shader variants, add support to generate an optimized shader for hw binning pass (with varyings/outputs other than position/pointsize removed). This exposes the possibility that the shader uses fewer constants than what is bound, so we have to take care to not emit consts beyond what the shader uses, lest we provoke the wrath of the HLSQ lockup! Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add support for frag coord/faceRob Clark2014-03-027-123/+408
| | | | | | | Fixes anything that tries to use gl_FrontFacing/gl_FragCoord. Also, face support is needed to emulate two sided color. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix for unused inputsRob Clark2014-03-022-5/+11
| | | | | | | An unused input might not have a register assigned. We don't want bogus regid to result in impossibly high max_reg.. Signed-off-by: Rob Clark <[email protected]>
* r300g/tests: Added missing fclose for FILE resource.Siavash Eliasi2014-02-281-0/+3
| | | | | Reviewed-by: Tom Stellard <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* r600g/compute: PIPE_CAP_COMPUTE should be false for pre-evergreen GPUsTom Stellard2014-02-281-1/+3
| | | | | | | | | This prevents clover from using unsupported devices. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Alex Deucher <[email protected]> CC: "10.0 10.1" <[email protected]>
* nouveau: add a nouveau_compiler binary to compile TGSI into shader ISAIlia Mirkin2014-02-263-0/+235
| | | | | | | This makes it easy to compare output between different cards, especially for ones that you don't have (and/or not in the current machine). Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: remove nv30_context use from nvfx_*progIlia Mirkin2014-02-2613-107/+110
| | | | | | | This should pave the way to being able to use the compiler without a context. Also leads to cleaner code. Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: remove unused sprite flipping parameterIlia Mirkin2014-02-263-5/+3
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: remove unused render_mode and hw_pointsprite_controlIlia Mirkin2014-02-262-8/+3
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv30: remove use_nv4x, it is identical to is_nv4xIlia Mirkin2014-02-264-16/+14
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* radeonsi: Prevent geometry shader from emitting too many verticesMichel Daenzer2014-02-271-0/+16
|
* ilo: create u_upload_mgr lastChia-I Wu2014-02-261-8/+11
| | | | | Similar to u_blitter, u_upload_mgr is now a client of the pipe context. Its creation needs to be delayed until the context has been (almost) initialized.
* nv50: enable txg where supportedIlia Mirkin2014-02-253-2/+8
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: enable cube map array texture supportIlia Mirkin2014-02-253-9/+7
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* r600g,radeonsi: consolidate create_surface and surface_destroyMarek Olšák2014-02-256-85/+63
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: inline util_blitter_copy_textureMarek Olšák2014-02-251-3/+21
| | | | | | | | This will be used for changing texture properties without modifying pipe_resource like r600g, but not in this series. For now, this change allows consolidation of pipe_surface functions. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: remove useless psbox variable from resource_copy_regionMarek Olšák2014-02-251-3/+2
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: compute depth surface registers only onceMarek Olšák2014-02-251-44/+54
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: compute color surface registers only onceMarek Olšák2014-02-251-44/+55
| | | | | | Same as r600g. Reviewed-by: Michel Dänzer <[email protected]>
* r600g: remove r600_resource.hMarek Olšák2014-02-255-48/+15
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* r600g: remove r600_surface::htile_enabledMarek Olšák2014-02-253-10/+4
| | | | | | v2: use one of the htile registers instead Reviewed-by: Michel Dänzer <[email protected]>
* r600g: use r600_surface::db_z_infoMarek Olšák2014-02-251-10/+10
| | | | | | | | | | db_z_info was unused. This just renames the variable to match the register name. Now, db_depth_info is unused on Evergreen. Both variables will be needed on SI though. Reviewed-by: Michel Dänzer <[email protected]>
* r600g,radeonsi: share r600_surfaceMarek Olšák2014-02-255-54/+50
| | | | | | I'm gonna use this in radeonsi. Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: move PA_SU_POLY_OFFSET_DB_FMT_CNTL to framebuffer stateMarek Olšák2014-02-251-8/+21
| | | | | | It doesn't depend on anything else. Reviewed-by: Michel Dänzer <[email protected]>
* gallium: the other drivers don't support ARB_buffer_storageMarek Olšák2014-02-259-0/+9
| | | | Reviewed-by: Fredrik Höglund <[email protected]>
* r300g,r600g,radeonsi: add support for ARB_buffer_storageMarek Olšák2014-02-256-0/+21
| | | | | | All GTT memory mappings are coherent and therefore can be persistent. Reviewed-by: Fredrik Höglund <[email protected]>
* gallium: add interface for persistent and coherent buffer mappingsMarek Olšák2014-02-251-0/+16
| | | | Required for ARB_buffer_storage.
* nv50: correctly calculate the number of vertical blocks during transfer mapEmil Velikov2014-02-251-1/+1
| | | | | | Cc: "10.0 10.1" <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* gallium: add texture gather support to gallium (v3)Dave Airlie2014-02-2512-0/+24
| | | | | | | | | | | | | | | | | | | | | This adds support to gallium for a TG4 instruction, and two CAPs. The first CAP is required for GL_ARB_texture_gather. The second CAP is required to expose GL_ARB_gpu_shader5. However so far we haven't found any hardware that natively exposes the textureGatherOffsets feature from GL, so just lower it for now. If hardware appears for this we can add another CAP to allow TG4 to take 4 offsets. v2: add component selection src and a cap to say hw can do it. (st can use to help control GL_ARB_gpu_shader5/GLSL 4.00). Add docs. v3: rename to SM5, add docs. Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* clover: Pass buffer offsets to the driver in set_global_binding() v3Tom Stellard2014-02-242-1/+10
| | | | | | | | | | | | | | | The offsets will be stored in the handles parameter. This makes it possible to use sub-buffers. v2: - Style fixes - Add support for constant sub-buffers - Store handles in device byte order v3: - Use endian helpers Reviewed-by: Francisco Jerez <[email protected]>
* radeonsi: Use SI_BIG_ENDIAN now that it existsTom Stellard2014-02-241-1/+1
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* r600g: Use util_cpu_to_le32() instead of bswap32() on big-endian systemsTom Stellard2014-02-243-3/+3
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: Use util_cpu_to_le32() instead of bswap32() on big-endian systemsTom Stellard2014-02-242-2/+2
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* freedreno/a3xx/compiler: half-precision outputRob Clark2014-02-236-10/+130
| | | | | | | | | | | | | | | | | | | | Using generic shaders caused a measurable fps drop, which was isolated to use of full precision (vs half precision) output. This is an attempt to regain that lost performance by using half precision solid/blit shaders (when the output format is not float32). Note: for the built-in shaders, I would not expect them to be register starved. And in fact it is the solid frag shader that seems to have the biggest impact. So I suspect you get double the pixel pipe units (or half the cycles) when the output is half precision. So there may be some gain to using half precision output for application shaders as well, even though the rest of register usage is still full precision. But for half precision to work for more complex shaders, we need to deal with some constraints, like cat2 needing same precision for it's two src registers. So for now it is not enabled by default except for the built-in shaders. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add shader variantsRob Clark2014-02-2310-196/+283
| | | | | | | | | Start putting in place infrastructure to deal with multiple shader variants. Initially we'll use this for two sided color (frag) and binning pass (vert) shaders. Possibly need for others later (such as YUV vs RGB eglImage?). Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: collapse nop's with repeatRob Clark2014-02-232-0/+15
| | | | | | | | | Easier than making more extensive use of rpt, and the more compact shaders seem to bring some bit of performance boost. (Perhaps repeat flag benefits are more than just instruction cache, possibly it saves on instruction decode as well?) Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: drop hand-coded blit/solid shadersRob Clark2014-02-2310-287/+181
| | | | | | | | | Instead in the common code, construct these shaders from TGSI. For now we let a2xx keep it's hand coded shaders, as it's compiler isn't quite up to the job yet. All the same it is a net drop in code size and gets rid of special cases. Signed-off-by: Rob Clark <[email protected]>
* freedreno/lowering: cleanup apiRob Clark2014-02-235-24/+138
| | | | | | | | Make things configurable, and tweak the API a bit to avoid an extra tgsi_shader_scan(). Getting closer to something generic which can be moved out of freedreno and shaderd by other drivers. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: add float 16 and 32bit formatsRob Clark2014-02-231-0/+22
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: resync generated headersRob Clark2014-02-234-4/+20
| | | | Signed-off-by: Rob Clark <[email protected]>
* nv50: make sure to clear _all_ layers of all attachmentsIlia Mirkin2014-02-223-3/+21
| | | | | | | | | | | | | | | Unfortunately there's only one RT_ARRAY_MODE setting for all attachments, so clears were previously truncated to the minimum number of layers any attachment had. Instead set the RT_ARRAY_MODE to 512 (the max number of layers) before doing the clear. This fixes gl-3.2-layered-rendering-clear-color-mismatched-layer-count. Also fix clears of individual layered rt/zeta, in case it ever happens. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christoph Bumiller <[email protected]> Cc: 10.1 <[email protected]>
* ilo: fix and enable fast depth clearChia-I Wu2014-02-222-9/+38
| | | | | | | | Use tex->bo_format instead of zs->format in ilo_blitter_rectlist_clear_zs() because the latter may be combined depth/stencil format. hiz_can_clear_zs() is no-op for GEN7+, but move the GEN check so that the assertions are tested. Finally, call the fast depth clear function from ilo_clear().
* ilo: add slice clear valueChia-I Wu2014-02-225-7/+78
| | | | | It is needed for 3DSTATE_CLEAR_PARAMS, and can also be used to track what value the slice has been cleared to.
* ilo: better readability and doc for texture flagsChia-I Wu2014-02-223-36/+58
| | | | | Improve comments for the flags, and explicitly separate their uses in slice flags and resolve flags.