[EXPERIMENT] Measure the perf impact of `PackedFingerprint` by Zalathar · Pull Request #152695 · rust-lang/rust

Zalathar · 2026-02-16T05:33:11Z

I noticed that the PackedFingerprint optimization in #78646 is only applied on x86-family hosts, and not on other host architectures, even if they can be expected to support unaligned memory access efficiently (e.g. modern aarch64).

Before I propose expanding that optimization to some or all other architectures, let's measure the perf and memory impact that it currently has on x86-64.
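For readers unfamiliar with how a layout optimization can be limited to certain hosts, here is a minimal sketch of one way to express it in Rust, using `cfg_attr` to apply `repr(packed)` only on x86-family targets. The type definitions are simplified stand-ins written for this illustration, not a copy of the compiler's source, and `packed_fingerprint_align` is a hypothetical helper.

```rust
use std::mem::align_of;

/// Illustrative stand-in for a 128-bit fingerprint (two 64-bit halves).
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct Fingerprint(u64, u64);

/// Wrapper whose alignment is lowered to 1, but only when compiling for
/// x86 or x86-64. On every other target the attribute is not applied and
/// the wrapper keeps the alignment of `Fingerprint`.
#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), repr(packed))]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct PackedFingerprint(Fingerprint);

/// Hypothetical helper: reports the alignment chosen for this target.
fn packed_fingerprint_align() -> usize {
    align_of::<PackedFingerprint>()
}
```

Under these assumptions the wrapper is always 16 bytes; only its alignment, and therefore the padding of any struct that embeds it, differs between targets.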

Zalathar · 2026-02-16T05:33:22Z

@bors try @rust-timer queue

[EXPERIMENT] Measure the perf impact of `PackedFingerprint`

rust-bors · 2026-02-16T07:47:31Z

☀️ Try build successful (CI)
Build commit: 7d1c6cc (7d1c6cc07ebd305aa8f6b75d8e18744dd373109b, parent: 139651428df86cf88443295542c12ea617cbb587)

rust-timer · 2026-02-16T08:27:44Z

Finished benchmarking commit (7d1c6cc): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.3%, 0.3%] | 3 |
| Regressions ❌ (secondary) | 0.3% | [0.1%, 0.5%] | 22 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [0.3%, 0.3%] | 3 |

Max RSS (memory usage)

Results (primary 3.7%, secondary 4.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.7% | [0.7%, 6.7%] | 116 |
| Regressions ❌ (secondary) | 4.4% | [1.5%, 6.5%] | 35 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 3.7% | [0.7%, 6.7%] | 116 |

Cycles

Results (primary -2.0%, secondary 2.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.8% | [2.2%, 3.4%] | 5 |
| Improvements ✅ (primary) | -2.0% | [-2.0%, -2.0%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.0% | [-2.0%, -2.0%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 482.657s -> 482.803s (0.03%)
Artifact size: 397.96 MiB -> 395.93 MiB (-0.51%)

Unalign `PackedFingerprint` on all hosts, not just x86 and x86-64

Back in rust-lang/rust#78646, `DepNode` was modified to store an unaligned `PackedFingerprint` instead of an 8-byte-aligned `Fingerprint`. That reduced the size of DepNode from 24 bytes to 17 bytes (nowadays 18 bytes), resulting in considerable memory savings in incremental builds. See rust-lang/rust#152695 (comment) for a benchmark demonstrating the impact of *removing* that optimization.

At the time (and today), the unaligning was only performed on x86 and x86-64 hosts, because those CPUs are known to generally have low overhead for unaligned memory accesses. Hosts with other CPU architectures would continue to use an 8-byte-aligned fingerprint and a 24-byte DepNode.

Given the subsequent rise of aarch64 (especially on macOS) and other architectures, it's a shame that some commonly-used builds of rustc don't get those memory-size benefits, based on a decision made several years ago under different circumstances.

We don't have benchmarks to show the actual effect of unaligning DepNode fingerprints on various non-x86 hosts, but it seems very likely to be a good idea on Apple chips, and I have no particular reason to think that it will be catastrophically bad on other hosts. And we don't typically perform this kind of speculative pessimization in other parts of the compiler.
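The 24-byte and 18-byte figures in the commit message above can be reproduced with a small sketch. Everything here is a simplified stand-in written for illustration: the 2-byte `kind` field, the `AlignedNode` and `PackedNode` names, and the `node_sizes` helper are assumptions, and `repr(C)` is used only to make the field order (and so the sizes) deterministic. The sizes hold on hosts where `u64` is 8-byte aligned, such as x86-64 and aarch64.

```rust
use std::mem::size_of;

/// Illustrative stand-in for a 128-bit fingerprint (two 64-bit halves).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Fingerprint(u64, u64);

/// The same 16 bytes, with `repr(packed)` lowering the alignment to 1.
#[derive(Clone, Copy)]
#[repr(packed)]
struct PackedFingerprint(Fingerprint);

impl PackedFingerprint {
    /// Copies the value out with an unaligned read. The compiler rejects
    /// references to potentially misaligned packed fields, so the field
    /// is read by value instead.
    fn get(self) -> Fingerprint {
        self.0
    }
}

/// Node with an aligned fingerprint: 2 bytes of `kind`, 6 bytes of
/// padding, then 16 bytes of hash.
#[repr(C)]
#[allow(dead_code)]
struct AlignedNode {
    kind: u16, // assumed 2-byte kind tag
    hash: Fingerprint,
}

/// Node with a packed fingerprint: 2 bytes of `kind` followed directly
/// by 16 bytes of hash, with no padding.
#[repr(C)]
#[allow(dead_code)]
struct PackedNode {
    kind: u16, // assumed 2-byte kind tag
    hash: PackedFingerprint,
}

/// Returns (aligned size, packed size) in bytes.
fn node_sizes() -> (usize, usize) {
    (size_of::<AlignedNode>(), size_of::<PackedNode>())
}
```

With these definitions, `node_sizes()` returns `(24, 18)` on a typical 64-bit host, a 25% reduction per node. The trade-off is that every read of the hash goes through an unaligned load, as in `get`.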

(DO NOT MERGE) Measure the perf impact of PackedFingerprint

43c958d

rustbot added the S-waiting-on-author and T-compiler labels on Feb 16, 2026

rust-bors bot pushed a commit that referenced this pull request Feb 16, 2026

Auto merge of #152695 - Zalathar:unaligned, r=<try>

7d1c6cc

[EXPERIMENT] Measure the perf impact of `PackedFingerprint`

rustbot added the S-waiting-on-perf label on Feb 16, 2026

rustbot added the perf-regression label and removed the S-waiting-on-perf label on Feb 16, 2026

Zalathar closed this Feb 16, 2026

rustbot removed the S-waiting-on-author label on Feb 16, 2026

Zalathar deleted the unaligned branch February 16, 2026 10:59

Zalathar mentioned this pull request Feb 16, 2026

Unalign PackedFingerprint on all hosts, not just x86 and x86-64 #152710

Merged

Zalathar wants to merge 1 commit into rust-lang:main from Zalathar:unaligned
