path: root/src/mesa/drivers/dri/i965/brw_context.h
Commit message | Author | Age | Files | Lines
* i965: Pack the tracked state atoms into separate arrays for prepare/emit. | Chris Wilson | 2011-03-09 | 1 | -0/+3
| | | | | | Improves performance of a hacked-up scissor-many (to reuse a small set of scissors instead of blowing out the cache, and then to run 100x more iterations so it actually took some time) by 3.6% +/- 1.2% (n=10)
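A minimal sketch of the packing idea, assuming each atom carries optional prepare and emit callbacks. The names here (ex_tracked_state, ex_packed_atoms, ex_pack_atoms, EX_MAX_ATOMS) are hypothetical and are not taken from brw_context.h; the point is only that the per-draw loops can walk dense arrays instead of testing every atom for a NULL callback.

```c
#include <assert.h>
#include <stddef.h>

#define EX_MAX_ATOMS 64 /* hypothetical upper bound on atoms */

/* Hypothetical atom: either callback may be NULL. */
struct ex_tracked_state {
    void (*prepare)(void *ctx);
    void (*emit)(void *ctx);
};

/* Dense per-phase arrays, built once at initialisation. */
struct ex_packed_atoms {
    const struct ex_tracked_state *prepare[EX_MAX_ATOMS];
    const struct ex_tracked_state *emit[EX_MAX_ATOMS];
    size_t num_prepare;
    size_t num_emit;
};

/* Placeholder callback so callers can build example atoms. */
static void ex_noop(void *ctx) { (void)ctx; }

/* Copy each atom pointer into the array(s) whose callback it provides. */
static void ex_pack_atoms(struct ex_packed_atoms *out,
                          const struct ex_tracked_state *atoms, size_t count)
{
    assert(count <= EX_MAX_ATOMS);
    out->num_prepare = 0;
    out->num_emit = 0;
    for (size_t i = 0; i < count; i++) {
        if (atoms[i].prepare)
            out->prepare[out->num_prepare++] = &atoms[i];
        if (atoms[i].emit)
            out->emit[out->num_emit++] = &atoms[i];
    }
}
```

After packing, a draw call would iterate out->prepare[0..num_prepare) and out->emit[0..num_emit) unconditionally.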
* i965: Align index to type size and flush if the type changes | Chris Wilson | 2011-03-04 | 1 | -2/+2
| | | | Signed-off-by: Chris Wilson <[email protected]>
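The alignment step named in the subject line can be sketched as follows, assuming index types of 1, 2 or 4 bytes. ex_align_index_offset is a hypothetical helper, not a function from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Round offset up to the next multiple of type_size.
 * type_size must be a power of two (1, 2 or 4 for index data). */
static uint32_t ex_align_index_offset(uint32_t offset, uint32_t type_size)
{
    assert(type_size != 0 && (type_size & (type_size - 1)) == 0);
    return (offset + type_size - 1) & ~(type_size - 1);
}
```

With an aligned start, dividing the byte offset by the type size gives an exact start index into the shared buffer.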
* i965: Use negative relocation deltas to minimise vertex uploads | Chris Wilson | 2011-03-01 | 1 | -1/+1
| | | | | | | | | | | | With relaxed relocation checking in the kernel, we can specify a negative delta (i.e. pointing outside of the target bo) in order to fake a range in a large buffer. We only then need to upload the elements used and adjust the buffer offset such that they correspond with the indices used in the DrawArrays. (Depends on libdrm 0209428b3918c4336018da9293cdcbf7f8fedfb6) Signed-off-by: Chris Wilson <[email protected]>
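The arithmetic behind a negative delta can be sketched like this, assuming only vertices from min_index upwards were uploaded at byte offset upload_offset. The helper names are hypothetical, and the real relocation is emitted through libdrm rather than computed by functions like these.

```c
#include <assert.h>
#include <stdint.h>

/* Delta that makes vertex 0 appear to sit before the uploaded data,
 * so that vertex min_index lands exactly on upload_offset.
 * The result is negative whenever min_index * stride > upload_offset. */
static int64_t ex_reloc_delta(uint32_t upload_offset, uint32_t min_index,
                              uint32_t stride)
{
    return (int64_t)upload_offset - (int64_t)min_index * (int64_t)stride;
}

/* Byte offset the hardware would fetch for a given index. */
static int64_t ex_vertex_offset(uint32_t upload_offset, uint32_t min_index,
                                uint32_t index, uint32_t stride)
{
    assert(index >= min_index); /* only the uploaded range is valid */
    return ex_reloc_delta(upload_offset, min_index, stride) +
           (int64_t)index * (int64_t)stride;
}
```

For example, with upload_offset 256, min_index 100 and stride 16 the delta is -1344, yet index 100 still resolves to byte 256 inside the buffer.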
* i965: Upload all vertices used | Chris Wilson | 2011-03-01 | 1 | -2/+0
| | | | | | | | | | ... and take advantage of start_vertex_bias to trim to [min_index, max_index] where possible (i.e. when we need to upload all arrays). Fixes half_float_vertex(misc.fillmode.wireframe) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34595 Signed-off-by: Chris Wilson <[email protected]>
* i965: Remove unused 'next_free_page' member | Chris Wilson | 2011-02-21 | 1 | -5/+0
| | | | Signed-off-by: Chris Wilson <[email protected]>
* intel: extend current vertex buffers | Chris Wilson | 2011-02-21 | 1 | -2/+12
| | | | | | | | | If the next vertex arrays are a (discontiguous) continuation of the current arrays, such that the new vertices are simply offset from the start of the current vertex buffer definitions, we can reuse those definitions and avoid the overhead of relocations and invalidations. Signed-off-by: Chris Wilson <[email protected]>
* i965: emit one vb packet per vbo | Chris Wilson | 2011-02-21 | 1 | -4/+11
| | | | | | | Track reuse of the vertex buffer objects and so minimise the number of vertex buffers used by the hardware (and their relocations). Signed-off-by: Chris Wilson <[email protected]>
* i965: upload transient indices into the same discontiguous buffer | Chris Wilson | 2011-02-21 | 1 | -1/+0
| | | | | | | | As we now pack the indices into a common upload buffer, we can reuse a single CMD_INDEX_BUFFER packet and translate each invocation with a start vertex offset. Signed-off-by: Chris Wilson <[email protected]>
* i965: drop state_bo references to batch_bo | Chris Wilson | 2011-02-21 | 1 | -5/+0
| | | | | | | As we use state relocations and we know that all the state belongs to the same bo, we can drop the multiple references to the same bo. Signed-off-by: Chris Wilson <[email protected]>
* i965: Combine vb upload buffer with the general upload buffer | Chris Wilson | 2011-02-21 | 1 | -8/+0
| | | | | | | Reuse the new common upload buffer for uploading temporary indices and rebuilt vertex arrays. Signed-off-by: Chris Wilson <[email protected]>
* i965: Separate the BRW_NEW_(VS|WM)_CONSTBUF dirty bits. | Kenneth Graunke | 2011-02-08 | 1 | -1/+1
| | | | | These were incorrectly defined to the same value - likely due to a cut and paste error. Found by inspection.
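One way to make this kind of cut-and-paste mistake harder is to derive each flag from an enum of bit positions and check distinctness at compile time. The sketch below uses hypothetical EX_ names and values; it does not reproduce the actual BRW_NEW_* definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions: the enum guarantees each one is unique. */
enum ex_state_id {
    EX_STATE_VS_CONSTBUF,
    EX_STATE_WM_CONSTBUF,
    EX_STATE_COUNT
};

#define EX_NEW_VS_CONSTBUF (1u << EX_STATE_VS_CONSTBUF)
#define EX_NEW_WM_CONSTBUF (1u << EX_STATE_WM_CONSTBUF)

/* Compile-time guard: the two dirty bits must not overlap. */
static_assert((EX_NEW_VS_CONSTBUF & EX_NEW_WM_CONSTBUF) == 0,
              "dirty bits must be distinct");

/* Returns nonzero if the given flag is set in the dirty mask. */
static int ex_is_dirty(uint32_t dirty, uint32_t flag)
{
    return (dirty & flag) != 0;
}
```

If two flags shared a value, flagging the VS constant buffer would also mark the WM constant buffer dirty, which is the symptom the commit describes.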
* i965: Drop the dead tracking of color_regions[]. | Eric Anholt | 2011-02-04 | 1 | -2/+0
| | | | We pull the draw regions right out of the renderbuffers these days.
* i965: Nuke brw_wm_glsl.c. | Eric Anholt | 2010-12-06 | 1 | -1/+1
| | | | | | | | | | It was only used for gen6 fragment programs (not GLSL shaders) at this point, and it was clearly unsuited to the task -- missing opcodes, corrupted texturing, and assertion failures hit various applications of all sorts. It was easier to patch up the non-glsl for remaining gen6 changes than to make brw_wm_glsl.c complete. Bug #30530
* i965: Make FS uniforms be the actual type of the uniform at upload time. | Eric Anholt | 2010-10-27 | 1 | -2/+40
| | | | | | | | This fixes some insanity that would otherwise be required for GLSL 1.30 bit ops or gen6 integer uniform operations in general, at the cost of upload-time pain. Given that we only have that pain because mesa's mangling our integer uniforms to be floats, this is something that should be fixed outside of the shader codegen.
* i965: Add support for pull constants to the new FS backend. | Eric Anholt | 2010-10-22 | 1 | -3/+3
| | | | Fixes glsl-fs-uniform-array-5, but not 6 which fails in ir_to_mesa.
* Drop GLcontext typedef and use struct gl_context instead | Kristian Høgsberg | 2010-10-13 | 1 | -3/+3
|
* Rename GLvisual and __GLcontextModes to struct gl_config | Kristian Høgsberg | 2010-10-13 | 1 | -1/+1
|
* i965: Normalize cubemap coordinates like is done in the Mesa IR path. | Eric Anholt | 2010-10-07 | 1 | -0/+2
| | | | Fixes glsl-fs-texturecube-2-*
* i965: When encountering an unknown opcode in new FS backend, print its name. | Eric Anholt | 2010-08-27 | 1 | -1/+7
|
* i965: Start building 965 FS backend. | Eric Anholt | 2010-08-26 | 1 | -0/+10
|
* i965: Fix up WM push constant setup on gen6. | Eric Anholt | 2010-08-22 | 1 | -1/+7
| | | | Fixes glsl-algebraic-add-add-1.
* i965: Stream out CC unit state. | Eric Anholt | 2010-06-12 | 1 | -0/+1
| before:
| [ # ] backend test              min(s) median(s) stddev. count
| [ 0]  gl      firefox-talos-gfx 31.791 32.287    1.11%   6/6
| after:
| [ 0]  gl      firefox-talos-gfx 31.198 31.675    0.96%   6/6
* i965: Remove caching of surface state objects. | Eric Anholt | 2010-06-11 | 1 | -4/+3
| It turns out that computing a 56 byte key to look up a 20-byte object out of a hash table was some sort of a bad idea. Whoops.
|
| before:
| [ # ] backend test              min(s) median(s) stddev. count
| [ 0]  gl      firefox-talos-gfx 37.799 38.203    0.39%   6/6
| after:
| [ 0]  gl      firefox-talos-gfx 34.761 34.784    0.17%   5/6
* i965: Convert the binding table to streamed indirect state. | Eric Anholt | 2010-06-11 | 1 | -3/+4
| This slightly reduces cairo-gl firefox-talos-gfx runtime on my Ironlake:
|
| before:
| [ # ] backend test              min(s) median(s) stddev. count
| [ 0]  gl      firefox-talos-gfx 38.236 38.383    0.43%   5/6
| after:
| [ 0]  gl      firefox-talos-gfx 37.799 38.203    0.39%   6/6
|
| It turns out the cost of caching these objects and looking them up in the cache again is greater than the cost of just computing the object again, particularly when the overhead of having a separate BO to pin is removed.
|
| (Those that are paying close attention will note that this is a reversal of the path I was moving the driver in a couple of years ago. The major thing that has changed is that back then all state was recomputed when we wrapped the streaming state buffer, including recompiling our precious programs. Now, we're uncaching just the objects that are cheap to compute, and retaining caching of expensive objects)
* i965: Split constant buffer setup from its surface state/binding state. | Eric Anholt | 2010-06-11 | 1 | -2/+4
| | | | This was bothering me when redoing the binding tables.
* i965: Set the CC VP state immediately on state change. | Eric Anholt | 2010-06-11 | 1 | -0/+7
| | | | | | | | The cache lookup of these two little floats was .12% of total CPU time on firefox-talos-gfx because we did it any time commonly-changed state changed. On the other hand, updating the CC VP bo immediately whenever CC VP state changes is a .07% overhead due to putting a driver hook in glEnable().
* i965: Avoid calloc/free in the CURBE upload process. | Eric Anholt | 2010-06-09 | 1 | -0/+11
| | | | | | | In exchange we end up with an extra memcpy, but that seems better than calloc/free. Each buffer is 4k maximum, and on the i965-streaming branch this allocation was showing up as the top entry in brw_validate_state profiling for cairo-gl.
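The trade described above (a preallocated buffer plus one extra memcpy instead of a calloc/free pair per upload) might look roughly like this. The struct and function names are hypothetical; only the 4k maximum comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_CURBE_MAX_BYTES 4096 /* "Each buffer is 4k maximum" */

/* Hypothetical context member: scratch space that lives as long as the
 * context, so no allocation happens on the upload path. */
struct ex_curbe {
    float scratch[EX_CURBE_MAX_BYTES / sizeof(float)];
};

/* Assemble constants in the scratch buffer, then copy them to dst.
 * Returns the number of bytes written to dst. */
static size_t ex_upload_constants(struct ex_curbe *curbe,
                                  const float *values, size_t count,
                                  void *dst)
{
    size_t bytes = count * sizeof(float);
    assert(bytes <= EX_CURBE_MAX_BYTES);
    for (size_t i = 0; i < count; i++)
        curbe->scratch[i] = values[i];  /* build step, no calloc */
    memcpy(dst, curbe->scratch, bytes); /* the extra memcpy */
    return bytes;
}
```

The scratch array costs a fixed 4 KiB per context in this sketch, which is the price paid for keeping the allocator out of state validation.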
* intel: Change dri_bo_* to drm_intel_bo* to consistently use new API. | Eric Anholt | 2010-06-08 | 1 | -39/+40
| | | | | The slightly less mechanical change of converting the emit_reloc calls will follow.
* i965: Add support for all 8 possible ARB_draw_buffers in Mesa. | Eric Anholt | 2010-05-23 | 1 | -1/+1
| | | | | We should be able to do 16, but are limited by Mesa's static buffer allocations.
* i965: Remove the half-baked code for multiple OQs at the same time. | Eric Anholt | 2010-05-16 | 1 | -4/+1
| | | | | GL doesn't actually let you begin an OQ while one is active, so the extra work was pointless.
* i965: Remove unused occlusion query struct field. | Eric Anholt | 2010-05-16 | 1 | -3/+0
|
* i965: Dump out the correct shared function for SEND on Ironlake. | Eric Anholt | 2010-05-14 | 1 | -1/+1
|
* dri: Add DRI entrypoints to create a context for a given API | Kristian Høgsberg | 2010-04-28 | 1 | -1/+2
|
* i965: Use the PLN instruction when possible in interpolation. | Eric Anholt | 2010-03-10 | 1 | -0/+1
| | | | | | Saves an instruction in PINTERP, LINTERP, and PIXEL_W from brw_wm_glsl.c For non-GLSL it isn't used yet because the deltas have to be laid out differently.
* i965: Add Sandybridge scissor state. | Eric Anholt | 2010-02-25 | 1 | -1/+1
|
* i965: Set up the SNB URB. | Eric Anholt | 2010-02-25 | 1 | -1/+2
| | | | even with vs disabled, still doesn't work.
* i965: Start adding support for the Sandybridge CC unit. | Eric Anholt | 2010-02-25 | 1 | -1/+14
|
* i965: Keep the CURBE BO mapped and memcpy instead of subdataing. | Eric Anholt | 2010-02-06 | 1 | -5/+0
| | | | | | For the tiny bits of data we generally upload through the CURBEs, the overhead of the kernel's pagetable trickery is actually rather high. This improves cairo-gl gnome-terminal-vim performance by 3.8%.
* intel: Remove dead note_fence vtbl hook. | Eric Anholt | 2010-01-19 | 1 | -1/+0
|
* i965: Upload as many VS constants as possible through the push constants. | Eric Anholt | 2010-01-19 | 1 | -0/+1
| | | | | | | The pull constants require sending out to an overworked shared unit and waiting for a response, while push constants are nicely loaded in for us at thread dispatch time. By putting things we access in every VS invocation there, ETQW performance improved by 2.5% +/- 1.6% (n=6).
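A rough sketch of the split, assuming the constants are already ordered so that the ones read by every invocation come first and that the push space holds at most push_limit entries. The commit does not spell out the driver's actual selection policy, and all names below are hypothetical.

```c
#include <assert.h>

/* How many constants go through each path. */
struct ex_constant_split {
    unsigned push_count; /* loaded at thread dispatch time */
    unsigned pull_count; /* fetched on demand via the shared unit */
};

/* Fill the push space first; whatever is left over becomes pull constants. */
static struct ex_constant_split ex_split_constants(unsigned total,
                                                   unsigned push_limit)
{
    struct ex_constant_split s;
    s.push_count = total < push_limit ? total : push_limit;
    s.pull_count = total - s.push_count;
    assert(s.push_count + s.pull_count == total);
    return s;
}
```

A shader whose constants all fit under the limit ends up with pull_count of zero and never pays for the round trip to the shared unit.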
* i965: Allow for variable-sized auxdata in the state cache. | Eric Anholt | 2010-01-19 | 1 | -1/+0
| | | | | | Everything has been constant-sized until now, but constant buffer handling changes will make us want some additional variable sized array.
* Remove leftover __DRI{screen,drawable,context}Private references | Kristian Høgsberg | 2010-01-04 | 1 | -1/+1
| | | | | | | | | As part of the DRI driver interface rewrite I merged __DRIscreenPrivate and __DRIscreen, and likewise for __DRIdrawablePrivate and __DRIcontextPrivate. I left typedefs in place though, to avoid renaming all the *Private use internal to the driver. That was probably a mistake, and it turns out a one-line find+sed combo can do the mass rename. Better late than never.
* intel: Replace IS_965 checks with context structure usage. | Eric Anholt | 2009-12-22 | 1 | -0/+2
| | | | Saves another 600 bytes or so of code.
* intel: Replace IS_G4X() across the driver with context structure usage. | Eric Anholt | 2009-12-22 | 1 | -1/+10
| | | | Saves ~2KB of code.
* intel: Consistently use no_batch_wrap in intel_context struct. | Eric Anholt | 2009-11-19 | 1 | -1/+0
|
* i965: Pack brw_wm_fragment_program better. | Eric Anholt | 2009-11-19 | 1 | -1/+1
|
* Merge branch 'outputswritten64' | Ian Romanick | 2009-11-17 | 1 | -5/+1
| Add a GLbitfield64 type and several macros to operate on 64-bit fields. The OutputsWritten field of gl_program is changed to use that type. This results in a fair amount of fallout in drivers that use programs.
|
| No changes are strictly necessary at this point as all bits used are below the 32-bit boundary. Fairly soon several bits will be added for clip distances written by a vertex shader. This will cause several bits used for varyings to be pushed above the 32-bit boundary. This will affect any drivers that support GLSL.
|
| At this point, only the i965 driver has been modified to support this eventuality.
|
| I did this as a "squash" merge. There were several places through the outputswritten64 branch where things were broken. I foresee this causing difficulties later for bisecting. The history is still available in the branch.
|
| Conflicts: src/mesa/drivers/dri/i965/brw_wm.h
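To illustrate why a 64-bit field matters once bits move past position 31, here is a small sketch with hypothetical stand-ins (ex_bitfield64, EX_BITFIELD64_BIT, ex_output_written) for the type and macros the merge describes. It is not a copy of the Mesa definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit bitfield type and bit macro. */
typedef uint64_t ex_bitfield64;
#define EX_BITFIELD64_BIT(b) ((ex_bitfield64)1 << (b))

/* Returns nonzero if the output slot is marked as written. */
static int ex_output_written(ex_bitfield64 outputs_written, unsigned slot)
{
    assert(slot < 64);
    return (outputs_written & EX_BITFIELD64_BIT(slot)) != 0;
}
```

Storing such a mask in a 32-bit variable silently drops any slot at position 32 or above, which is the kind of driver fallout the merge message warns about.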
* i965: Remove an unused cache_item field. | Eric Anholt | 2009-11-13 | 1 | -1/+0
|
* i965: Remove long dead structures for ffvertex_prog.c. | Eric Anholt | 2009-11-13 | 1 | -17/+0
|
* i965: Always pass the size argument to brw_cache_data. | Eric Anholt | 2009-11-06 | 1 | -1/+0
| | | | | This keeps the individual state files from having to export their structures for brw_state_cache initialization.