[EXPERIMENT] Measure the perf impact of PackedFingerprint#152695
[EXPERIMENT] Measure the perf impact of PackedFingerprint#152695Zalathar wants to merge 1 commit intorust-lang:mainfrom
PackedFingerprint#152695Conversation
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
[EXPERIMENT] Measure the perf impact of `PackedFingerprint`
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (7d1c6cc): comparison URL. Overall result: ❌ regressions - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 3.7%, secondary 4.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -2.0%, secondary 2.8%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 482.657s -> 482.803s (0.03%) |
…ease Unalign `PackedFingerprint` on all hosts, not just x86 and x86-64 Back in rust-lang#78646, `DepNode` was modified to store an unaligned `PackedFingerprint` instead of an 8-byte-aligned `Fingerprint`. That reduced the size of DepNode from 24 bytes to 17 bytes (nowadays 18 bytes), resulting in considerable memory savings in incremental builds. See rust-lang#152695 (comment) for a benchmark demonstrating the impact of *removing* that optimization. At the time (and today), the unaligning was only performed on x86 and x86-64 hosts, because those CPUs are known to generally have low overhead for unaligned memory accesses. Hosts with other CPU architectures would continue to use an 8-byte-aligned fingerprint and a 24-byte DepNode. Given the subsequent rise of aarch64 (especially on macOS) and other architectures, it's a shame that some commonly-used builds of rustc don't get those memory-size benefits, based on a decision made several years ago under different circumstances. We don't have benchmarks to show the actual effect of unaligning DepNode fingerprints on various non-x86 hosts, but it seems very likely to be a good idea on Apple chips, and I have no particular reason to think that it will be catastrophically bad on other hosts. And we don't typically perform this kind of speculative pessimization in other parts of the compiler.
Unalign `PackedFingerprint` on all hosts, not just x86 and x86-64 Back in rust-lang/rust#78646, `DepNode` was modified to store an unaligned `PackedFingerprint` instead of an 8-byte-aligned `Fingerprint`. That reduced the size of DepNode from 24 bytes to 17 bytes (nowadays 18 bytes), resulting in considerable memory savings in incremental builds. See rust-lang/rust#152695 (comment) for a benchmark demonstrating the impact of *removing* that optimization. At the time (and today), the unaligning was only performed on x86 and x86-64 hosts, because those CPUs are known to generally have low overhead for unaligned memory accesses. Hosts with other CPU architectures would continue to use an 8-byte-aligned fingerprint and a 24-byte DepNode. Given the subsequent rise of aarch64 (especially on macOS) and other architectures, it's a shame that some commonly-used builds of rustc don't get those memory-size benefits, based on a decision made several years ago under different circumstances. We don't have benchmarks to show the actual effect of unaligning DepNode fingerprints on various non-x86 hosts, but it seems very likely to be a good idea on Apple chips, and I have no particular reason to think that it will be catastrophically bad on other hosts. And we don't typically perform this kind of speculative pessimization in other parts of the compiler.
I noticed that the
PackedFingerprintoptimization in #78646 is only applied on x86-family hosts, and not on other host architectures, even if they can be expected to support unaligned memory access efficiently (e.g. modernaarch64).Before I propose expanding that optimization to some or all other architectures, let's measure the perf and memory impact that it currently has on x86-64.