Investigating how well intervening on Sparse Autoencoder internals prevents adversaries from accessing dangerous knowledge.
This is the code behind the paper "Don't Forget It! Conditional Sparse Autoencoder Clamping Works for Unlearning" by Matthew Khoriaty, Andrii Shportko, Gustavo Mercier, and Zach Wood-Doughty of Northwestern University.
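For readers new to the idea, here is a minimal sketch of what conditionally clamping Sparse Autoencoder features can look like. It is an illustration under stated assumptions, not the code from this repository or the exact procedure from the paper: it assumes a plain ReLU autoencoder, and the function name `conditional_clamp` and the parameters `feature_ids`, `threshold`, and `clamp_value` are hypothetical.

```python
import numpy as np

def conditional_clamp(x, W_enc, b_enc, W_dec, b_dec,
                      feature_ids, threshold=0.0, clamp_value=-5.0):
    """Clamp selected SAE features, but only where they are active.

    Hypothetical helper for illustration. Shapes: x (d,), W_enc (m, d),
    b_enc (m,), W_dec (d, m), b_dec (d,).
    """
    # Encode the model activation into SAE features (assumes a ReLU SAE).
    f = np.maximum(W_enc @ x + b_enc, 0.0)
    # Keep the SAE's reconstruction error so untouched inputs pass through unchanged.
    error = x - (W_dec @ f + b_dec)
    # Condition: only features that fire above the threshold get clamped.
    ids = np.asarray(feature_ids, dtype=int)
    fire = f[ids] > threshold
    f_new = f.copy()
    f_new[ids[fire]] = clamp_value
    # Decode the edited features and add the original error back.
    return W_dec @ f_new + b_dec + error
```

When none of the selected features fire, the function returns the input activation unchanged, which is the point of making the intervention conditional.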
The folder structure is based on the one described on this website.