[EXPERIMENT] Measure the perf impact of PackedFingerprint #152695

Closed
Zalathar wants to merge 1 commit into rust-lang:main from Zalathar:unaligned

@Zalathar
Member

I noticed that the PackedFingerprint optimization in #78646 is only applied on x86-family hosts, and not on other host architectures, even if they can be expected to support unaligned memory access efficiently (e.g. modern aarch64).

Before I propose expanding that optimization to some or all other architectures, let's measure the perf and memory impact that it currently has on x86-64.
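
A minimal sketch of how an architecture-gated packing attribute of this kind can be written. The `Fingerprint`, `PackedFingerprint`, and `DepNode` definitions below are simplified stand-ins, not the actual rustc definitions, and the `u16` kind field is an assumption made for illustration.

```rust
use std::mem::{align_of, size_of};

/// Stand-in for a 128-bit fingerprint stored as two 64-bit halves.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Fingerprint(u64, u64);

/// Packed (alignment 1) only on x86-family targets; on every other target
/// it keeps the natural alignment of `Fingerprint`.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), repr(packed))]
struct PackedFingerprint(Fingerprint);

/// Hypothetical simplified node: a small kind tag plus the fingerprint.
#[allow(dead_code)]
struct DepNode {
    kind: u16,
    hash: PackedFingerprint,
}

/// Returns (alignment of `PackedFingerprint`, size of `DepNode`) on this target.
fn layout() -> (usize, usize) {
    (align_of::<PackedFingerprint>(), size_of::<DepNode>())
}

fn main() {
    let (align, size) = layout();
    println!("PackedFingerprint align = {align}, DepNode size = {size}");
}
```

Under these assumptions, an x86-64 host gets an 18-byte node, while a host such as aarch64 falls through to the aligned layout and pays for the padding.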

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 16, 2026
@Zalathar
Member Author

@bors try @rust-timer queue

rust-bors bot pushed a commit that referenced this pull request Feb 16, 2026
[EXPERIMENT] Measure the perf impact of `PackedFingerprint`
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 16, 2026
@rust-bors
Contributor

rust-bors bot commented Feb 16, 2026

☀️ Try build successful (CI)
Build commit: 7d1c6cc (7d1c6cc07ebd305aa8f6b75d8e18744dd373109b, parent: 139651428df86cf88443295542c12ea617cbb587)

@rust-timer
Collaborator

Finished benchmarking commit (7d1c6cc): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.3%, 0.3%] | 3 |
| Regressions ❌ (secondary) | 0.3% | [0.1%, 0.5%] | 22 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [0.3%, 0.3%] | 3 |

Max RSS (memory usage)

Results (primary 3.7%, secondary 4.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.7% | [0.7%, 6.7%] | 116 |
| Regressions ❌ (secondary) | 4.4% | [1.5%, 6.5%] | 35 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 3.7% | [0.7%, 6.7%] | 116 |

Cycles

Results (primary -2.0%, secondary 2.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.8% | [2.2%, 3.4%] | 5 |
| Improvements ✅ (primary) | -2.0% | [-2.0%, -2.0%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.0% | [-2.0%, -2.0%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 482.657s -> 482.803s (0.03%)
Artifact size: 397.96 MiB -> 395.93 MiB (-0.51%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 16, 2026
@Zalathar Zalathar closed this Feb 16, 2026
@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Feb 16, 2026
@Zalathar Zalathar deleted the unaligned branch February 16, 2026 10:59
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request Mar 24, 2026
…ease

Unalign `PackedFingerprint` on all hosts, not just x86 and x86-64

Back in rust-lang#78646, `DepNode` was modified to store an unaligned `PackedFingerprint` instead of an 8-byte-aligned `Fingerprint`. That reduced the size of DepNode from 24 bytes to 17 bytes (nowadays 18 bytes), resulting in considerable memory savings in incremental builds.

See rust-lang#152695 (comment) for a benchmark demonstrating the impact of *removing* that optimization.

At the time (and today), the unaligning was only performed on x86 and x86-64 hosts, because those CPUs are known to generally have low overhead for unaligned memory accesses. Hosts with other CPU architectures would continue to use an 8-byte-aligned fingerprint and a 24-byte DepNode.

Given the subsequent rise of aarch64 (especially on macOS) and other architectures, it's a shame that some commonly-used builds of rustc don't get those memory-size benefits, based on a decision made several years ago under different circumstances.

We don't have benchmarks to show the actual effect of unaligning DepNode fingerprints on various non-x86 hosts, but it seems very likely to be a good idea on Apple chips, and I have no particular reason to think that it will be catastrophically bad on other hosts. And we don't typically perform this kind of speculative pessimization in other parts of the compiler.
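
The size figures quoted in the commit message (24, 17, and 18 bytes) can be reproduced with a small layout experiment. This is only a sketch: the struct names are hypothetical, and modelling the kind tag as `u8` for the 17-byte case and `u16` for the 18-byte case is an assumption about why the size changed, not something the message states.

```rust
use std::mem::size_of;

/// Stand-in for a 128-bit fingerprint stored as two 64-bit halves.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Fingerprint(u64, u64);

/// Same payload, but with alignment forced down to 1 on every target.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(packed)]
struct PackedFingerprint(Fingerprint);

/// Aligned layout: the tag is padded so the fingerprint can be 8-byte aligned.
#[allow(dead_code)]
struct AlignedNode {
    kind: u16,
    hash: Fingerprint,
}

/// Packed layout with a one-byte tag (assumed for the 17-byte figure).
#[allow(dead_code)]
struct PackedNodeU8 {
    kind: u8,
    hash: PackedFingerprint,
}

/// Packed layout with a two-byte tag (assumed for the 18-byte figure).
#[allow(dead_code)]
struct PackedNodeU16 {
    kind: u16,
    hash: PackedFingerprint,
}

/// Returns the sizes of the three layouts, in declaration order.
fn node_sizes() -> (usize, usize, usize) {
    (
        size_of::<AlignedNode>(),
        size_of::<PackedNodeU8>(),
        size_of::<PackedNodeU16>(),
    )
}

fn main() {
    let (aligned, packed_u8, packed_u16) = node_sizes();
    println!("aligned: {aligned}, packed + u8 kind: {packed_u8}, packed + u16 kind: {packed_u16}");
}
```

On a host where `u64` is 8-byte aligned, the aligned layout comes out at 24 bytes, so going from 24 to 18 bytes saves a quarter of the per-node storage.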
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Mar 25, 2026