| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch moves intel_tiled_memcpy[_sse41] libraries to isl, renames some
functions and types and makes the required build system changes for
meson, automake and Android. No functional changes are introduced.
v2: code cleanups, move isl_get_memcpy_type to i965 (Jason)
v3: move isl_mem_copy_fn to priv header, cleanups (Jason, Dylan)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two problems with the fixed patch. First, it fails to create a
dependency on the sourced .c file, so changes to intel_tiled_memcpy.c
won't trigger a rebuild. It also doesn't get included in the dist
tarball.
Fixes: 11b1afdc92db98e93f2ca50beeb7fc481a11e708
("i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear")
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Juan A. Suarez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
v3: add Android build support (Tapani)
v4: squash 'fix i915: Fix streaming loads for intel_tiled_memcpy'
separate sse41 to own static library (Tapani)
Reviewed-by: Chris Wilson <[email protected]> (v2)
Reviewed-by: Matt Turner <[email protected]> (v2)
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This reverts commit 79fe00efb474b3f3f0ba4c88826ff67c53a02aef.
This reverts commit f5e8b13f78a085bc95a1c0895e4a38ff6b87b375.
This reverts commit d21c086d819d78fb3f6abcbb14aa492970f442aa.
They broke the Android build and I'd rather not leave it broken
for the long holiday weekend.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
Reviewed-by: Chris Wilson <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We leave gen4-5 alone because the ISL code hasn't really been well-
tested on gen4-5 or with combined depth-stencil because we don't use
BLORP for depth operations on gen4-5. Also, the gen4-5 code has to deal
with intratile offsets for LOD hacks and ISL doesn't handle those yet.
We could make ISL handle gen4-5 capable or we could just not bother.
Among other things, this should make future platform enabling easier
because it means we don't have to update multiple (or hand-rolled!)
depth stencil emit paths.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The INTEL_performance_query extension provides a list of queries that
a user can select to monitor a particular workload. Each query reports
different sets of counters (roughly looking at different parts of the
hardware, i.e. caches/fixed functions/etc...).
Each query has an associated configuration that we need to program
into the hardware before using the query. Up to now, we provided
predefined queries. This change allows the user to build its own query
(and associated configuration) externally, and have the i965 driver
use that configuration through a new query named :
Intel_Raw_Hardware_Counters_Set_0_Query
When this query is selected, the i965 driver will report raw counters
deltas (meaning their values need to be interpreted by the user, as
opposed to existing queries that provide human readable values).
This change is also useful for debug purposes for building new
pre-defined queries and verifying the underlying numbers make sense
before writing equations for user readable output.
This change's purpose is also to enable GPA. GPA uses a library called
MDAPI that processes raw counter data. MDAPI expects raw data to have
a certain layout (per generation which is a bit unfortunate...). This
change also embeds the expected data layouts.
v2: Enable raw queries on gen 7->11, v1 had 7->9 (Lionel)
v3: Don't assert on cherryview for gen7... (Ken)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We would like to reuse a number of the functions and structures in
another file in a future commit.
We also move the previous content of brw_performance_query.h into
brw_performance_query_metrics.h to be included by generated metrics
files.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
v2: Add brw_oa_cnl.xml to EXTRA_DIST (Emil)
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Performance metric numbers are calculated the following way :
- out of the 256 bytes long OA reports, we accumulate the deltas
into an array of uint64_t
- the equations' generated code reads the accumulated uint64_t
deltas and normalizes them for a particular platform
Our hardware is such that a number of counters in the OA reports
always return the same values (i.e. they're not programmable), and
they return the same values even across generations, and as a result a
number of equations are identical in different metric sets across
different generations.
Up to now we've kept the generated code of the equations separated in
different files (per generation/GT), and didn't apply any
factorization of the common equations. We could have make some
improvement by reusing equations within a given metrics file, but we
can go even further and reuse across generations (i.e. all files).
This change changes the code generation to emit a single file in which
we reuse equations emitted code based on the hash of equations'
strings.
Here are the savings in a meson build :
Before(.old)/after :
$ du -h ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
43M ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so
47M ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
$ size build/src/mesa/drivers/dri/libmesa_dri_drivers.so build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
text data bss dec hex filename
13054002 409424 671856 14135282 d7aff2 build/src/mesa/drivers/dri/libmesa_dri_drivers.so
14550386 409552 671856 15631794 ee85b2 build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
As a side comment here is the size of the drivers if we remove all of
the metrics from the build :
$ du -sh build/src/mesa/drivers/dri/libmesa_dri_drivers.so
40M build/src/mesa/drivers/dri/libmesa_dri_drivers.so
v2: Fix an issue with hashing of counter equations (Lionel)
Build system rework (Emil)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Emil Velikov <[email protected]> (build system part)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Meta is awful and we'd like to stop using it. Implementing this using
BLORP allows us to stop trashing a bunch of GL state every time.
This follows the structure of st_generate_mipmap().
compute_num_levels is lifted directly from there.
Improves performance in Gl41HdrBloom by about 11.794% +/- 1.01919% (n=3)
on Kabylake GT2 at 1280x720 (the difference seems much smaller at higher
resolutions).
v2 (idr): Don't try depth or depth-stencil blorp blits on Gen4 or Gen5
because it's not implemented yet.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This resolves an apparent game bug described in 85564. The game
doesn't properly handle ARB_get_program_binary with 0 supported
formats.
V2 (Timothy Arceri):
- less driver code as more has been moved into the common helpers.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85564
Signed-off-by: Timothy Arceri <[email protected]>
Signed-off-by: Jordan Justen <[email protected]> (v1)
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
| |
Fixes: bfe0f3a7027 ("i965: Move PIPE_CONTROL defines and prototypes to
brw_pipe_control.h.")
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses the Mesa disk_cache support to write out the final linked
binary for vertex and fragment shader programs.
This is based off the initial implementation done by Carl Worth. It
has been significantly reworked, first by Tim Arceri, and then by
Jordan Justen.
v2:
* Squash 'i965: add image param shader cache support'
* Squash 'i965: add shader cache support for pull param pointers'
* Sustantially simplified by a rework on top of Jason's 2975e4c56a7a.
* Rename load_program_data to read_program_data. (Jason)
v3:
* Simplify and align program read/write. (Jason)
v4:
* Don't save prog_data size since we know it from the stage. (Ken)
* Don't save program size, since prog_data includes the size. (Ken)
* Remove `assert` that potentially could be triggered by disk
corruption of the cache entries. (Ken)
* Fix compute shader scratch allocation. (Ken)
* Remove special case mapping for non-LLC. (Ken)
* Remove SET_UPLOAD_PARAMS macro
[[email protected]: *_cached_program => brw_disk_cache_*_program]
[[email protected]: brw_shader_cache.c => brw_disk_cache.c]
[[email protected]: don't map to write program when LLC is present]
[[email protected]: set program_written_to_cache on read from cache]
[[email protected]: only try cache when status is linking_skipped]
[[email protected]: all v2-v4 changes noted above]
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The only thing it was handling was push constants. We pull the actual
constant upload code into gen6_constant_state.c and the atoms into
genX_state_upload.c.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
These two paths are basically the same. There's no good reason to have
them in different files.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The batch buffer and state buffer code is fairly tied together,
and having it in one .c file will make refactoring easier.
Also, drop some commentary above brw_state_batch. The "aperture
checking performance hacks" are long since gone, so that paragraph
makes little sense at this point.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
| |
We handle the Sandybridge multisampled 2D surface hack here, rather
than in ISL, because it requires allocating a BO, and is kind of messy.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code doesn't get exactly a lot simpler but at least it is in a single
place, and we delete more than we add.
Another good point is that you get rid of struct brw_wm_unit_state
which was a third mechanism for encoding GEN state. We used to have
GENXML, manual packing and these bitfield structs. Now we're down to
just GENXML and some manual packing. (Khristian)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Add the code into its own function and atom, since almost nothing is
shared with GEN >= 6.
v2: Split GEN <=5 and GEN >= 6 into separate functions (Ken).
v3: Minor tidying by Ken.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Merge the code with gen6+ 3DSTATE_GS, and delete brw_gs_state.c,
together with brw_gs_unit_state.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function only emits a particular case of 3DSTATE_GS. Instead, we can do
that inside genX(upload_gs_state), and later reuse part of that code for
emitting gen4-5 state.
There's the additional benefit of allowing us to remove gen6_gs_state.c, which
was only left because of this function.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
It's a very simple conversion, and it allows us to delete brw_cc.c.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Also updates Makefile.am to generate corresponding normalization code.
Signed-off-by: Robert Bragg <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The sampler state code was all moved to genxml, so we can get rid of these
functions and delete the file.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This patch finishes the work done by Ken of converting SF_STATE to genxml, and
merges it with gen6+ code for emitting that state.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
We tend to use the sources, as opposed to EXTRA_DIST to include the
headers.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
|
|
|
| |
V2: Remove isl_gen10.c and isl_gen10.h
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that we've moved over to the new array mechanism, it's no longer
needed.
Reviewed-by: Topi Pohjolainen <[email protected]>
Acked-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Due to complications with things such as URB setup on gen4-5, it's
easier to keep gen4 support in blorp completely internal to i965. This
makes things a bit awkward because that means there's a file in i965
that includes blorp_priv.h but it's either that or have a file in blorp
that includes brw_context.h.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
With this last state ported, we can get rid of gen8_draw_upload.c.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
It's actually not that much code.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On this patch, we port:
- brw_polygon_stipple
- brw_polygon_stipple_offset
- brw_line_stipple
- brw_drawing_rect
v2:
- Also emit states for gen4-5 with this code.
v3:
- Style fixes and remove excessive checks (Ken).
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following states are ported on this patch:
- gen6_gs_push_constants
- gen6_vs_push_constants
- gen6_wm_push_constants
- gen7_tes_push_constants
v2:
- Use helper to setup brw_address (Kristian)
v3:
- Do not use macro for upload_constant_state (Ken)
- Do not re-declare MOCS macro (Ken)
v4: (by Ken)
- Drop more dead code, change brw->gen checks to GEN_GEN, style nits
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Emit 3DSTATE_SCISSOR_STATE_POINTERS using brw_batch_emit, and pack the
scissor states using GENX(SCISSOR_RECT_pack), generated from genxml.
v3:
- Remove old code (Ken)
- Style fixes (Ken)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Emit 3DSTATE_TE on Gen7+ using brw_batch_emit helper.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Upload blend states using GENX(BLEND_STATE_ENTRY_pack), generated from
genxml.
v3:
- style fixes (Ken)
- cleanup to remove excessive #ifdef's (Ken)
- remove memset (Ken)
- disable blend.AlphaToCoverageDitherEnable on gen6 (Ken)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ported in this patch:
- 3DSTATE_DS
- 3DSTATE_GS
- 3DSTATE_HS
- 3DSTATE_VIEWPORT_STATE_POINTERS_SF_CL
v3:
- Remove NEW_TRANSFORM blocks (Ken)
- Bring back some comments and workaround for Ivybridge (Ken)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Emit 3DSTATE_VS on Gen6+ using brw_batch_emit helper, that uses pack
structs from genxml.
v2:
- Use render_bo helper to setup brw_address (Kristian)
v3:
- Bring back some comments for gen6 and remove _NEW_TRANSFORM blocks
from gen7+.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Emit 3DSTATE_PS_EXTRA on Gen8+ using brw_batch_emit helper, that uses
pack structs from genxml.
v3:
- Style fixes and moving code around to be cleaner (Ken)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Emit 3DSTATE_WM on Gen6+ using brw_batch_emit helper, that uses pack
structs from genxml.
v2:
- Use render_bo helper to setup brw_address (Kristian)
- Remove TODO and use BRW_PSCDEPTH_OFF.
v3:
- A couple of style fixes (Ken)
- Enable RASTRULE_UPPER_RIGHT on gen6+ instead of gen8+ (Ken)
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|