@mgoldstein322
Contributor

Adds Carcosa Element to SST Elements
Features

  1. Generic Fault Injection interface using PortModules
  2. Generic Fault interface that plugs into Fault Injectors
  3. Out-of-the-box support for stuck bits, corrupt memory regions, random bit flip, and random event drop faults
  4. Injectors can be built to perform any number of faults, and any fault can be built to modify MemEvents
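
The injector/fault split described in the features above can be sketched in plain C++. This is only an illustration under stated assumptions: `Payload`, `Fault`, `StuckBitFault`, `RandomBitFlipFault`, and `Injector` are hypothetical names, the payload is modeled as a byte vector rather than a real MemEvent, and nothing here uses the SST PortModule API or the classes added in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for a MemEvent payload; not the SST type.
using Payload = std::vector<std::uint8_t>;

// Hypothetical fault interface: each fault mutates a payload in place.
struct Fault {
    virtual ~Fault() = default;
    // Returns true if the payload was actually changed.
    virtual bool apply(Payload& data) = 0;
};

// Forces one bit of one byte to a fixed value on every event.
class StuckBitFault : public Fault {
public:
    StuckBitFault(std::size_t byte, unsigned bit, bool value)
        : byte_(byte), bit_(bit), value_(value) {}

    bool apply(Payload& data) override {
        if (byte_ >= data.size() || bit_ > 7) return false;  // target not in this payload
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << bit_);
        const std::uint8_t before = data[byte_];
        data[byte_] = value_ ? static_cast<std::uint8_t>(before | mask)
                             : static_cast<std::uint8_t>(before & ~mask);
        return data[byte_] != before;
    }

private:
    std::size_t byte_;
    unsigned bit_;
    bool value_;
};

// Flips one uniformly chosen bit of the payload.
class RandomBitFlipFault : public Fault {
public:
    explicit RandomBitFlipFault(std::uint32_t seed) : rng_(seed) {}

    bool apply(Payload& data) override {
        if (data.empty()) return false;
        std::uniform_int_distribution<std::size_t> pick(0, data.size() * 8 - 1);
        const std::size_t bit = pick(rng_);
        data[bit / 8] ^= static_cast<std::uint8_t>(1u << (bit % 8));
        return true;
    }

private:
    std::mt19937 rng_;
};

// Hypothetical injector: owns any number of faults and applies them in order.
class Injector {
public:
    void addFault(std::unique_ptr<Fault> fault) {
        assert(fault != nullptr);
        faults_.push_back(std::move(fault));
    }

    // Returns how many faults changed the payload.
    std::size_t intercept(Payload& data) {
        std::size_t changed = 0;
        for (auto& fault : faults_) {
            if (fault->apply(data)) ++changed;
        }
        return changed;
    }

private:
    std::vector<std::unique_ptr<Fault>> faults_;
};
```

Holding the faults in a vector of owning pointers keeps the injector independent of how many faults are installed; a real injector would call something like `intercept` from its event-handling hook.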

Known Issues

  • Random bit flip sometimes crashes the simulator when the payload it modifies ends up containing certain errors. It is unclear whether this can be solved without significant retooling in MemEvent, or without a complementary PortModule that listens for specific modifications in events and "fixes" them so that they don't blow up the simulation. It may be solvable with more complex logic in the bit/byte choosing function.
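
One possible shape for a more selective bit/byte choosing function is sketched below. It is a guess at the idea, not the PR's code: `BitChoice` and `chooseBit` are hypothetical names, and the assumption that crashes can be avoided by restricting flips to a valid byte range and skipping bytes marked as protected has not been verified against the simulator.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical result type: which bit of which payload byte to flip.
struct BitChoice {
    std::size_t byte;
    unsigned bit;
};

// Picks a random bit among payload bytes [begin, end) that are not marked
// protected. Returns false when no byte is eligible, so the caller can skip
// the injection instead of corrupting something unsafe.
bool chooseBit(std::size_t payloadSize, std::size_t begin, std::size_t end,
               const std::vector<bool>& isProtected, std::mt19937& rng,
               BitChoice& out) {
    if (end > payloadSize) end = payloadSize;  // clamp to the real payload

    std::vector<std::size_t> candidates;
    for (std::size_t i = begin; i < end; ++i) {
        const bool skip = i < isProtected.size() && isProtected[i];
        if (!skip) candidates.push_back(i);
    }
    if (candidates.empty()) return false;

    std::uniform_int_distribution<std::size_t> pickByte(0, candidates.size() - 1);
    std::uniform_int_distribution<unsigned> pickBit(0, 7);
    out.byte = candidates[pickByte(rng)];
    out.bit = pickBit(rng);
    return true;
}
```

Which bytes would need protecting, if any, depends on what actually triggers the crash; the mask here is only a placeholder for that knowledge.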

Planned Updates

  1. Add Hali interface for dynamic fault switching and dynamic injector swapping
  2. Refine stuck bit injector to be easier to use

update Makefile.am to include impl file
fix serialization to stop compiler errors and warnings
build parameter reader for fault type
add parameter for choosing fault type
add parameter for stuckAtFault inputs
…tly skeletoned, but I will add the rest soon
…class--need to fix some compiler issues and skeletonize the other faults
…'s still no dynamic way to change what fault logic is in use
compiler is giving me errors that the exact same code didn't give in the last fault--will test when this is resolved
randomDrop prototype complete--untested
need to refine other aspects (such as payload printing)
add debug output to event dropper
need to compare event drop logic to Scott's PortModule example (just to make sure I'm not breaking anything by accident)
need to do 2 things:
1. change fault** into vec[fault*] and rewrite affected code
2. figure out better way to pass valid installation directions to injectors
must implement serialization for FaultBase class
after this is done, I can finally go in and fix the faulty logic in the StuckAtFault and CorruptMemRegion classes
Still todo:
1. Refine stuckAtFault debug prints to make more sense
2. Refine address selection in stuckAtFault and corruptMemFault to only corrupt the valid range of data
stuckAtFault debug output should be fixed, and it now properly accounts for endianness in the data array
FaultBase debug outputs for setMemEventPayload improved
need to take a look at corrupt mem next, and run some more tests on stuck at
…with other members

begin fix for corruptMemRegion
- No longer crashes simulation
- Should be more optimized to only run on smaller regions of code that actually need changes instead of looping through everything every time
- Currently crashes when hitting assertion for start index of corrupt message when two regions are in the same message but don't overlap
Todo:
- Build test configs for each included fault and get them into the testing system
rng hooks in FaultInjectorBase expanded to include every usage without later faults or injectors needing their own
need to figure out how to delay injection so that it doesn't occur when sst initialization is still occurring
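
The corruptMemRegion notes above mention two regions that fall in the same message without overlapping each other. A minimal sketch of handling each region independently follows, assuming half-open address ranges and a message described by a base address and size; `Region`, `Span`, and `overlaps` are hypothetical names and are not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical corrupt region in address space, half-open: [start, end).
struct Region {
    std::uint64_t start;
    std::uint64_t end;
};

// Hypothetical slice of a message payload that should be corrupted.
struct Span {
    std::size_t offset;  // index of the first affected payload byte
    std::size_t length;  // number of affected bytes
};

// Intersects every region with the message [base, base + size) and returns
// one payload span per region that touches the message. Regions are treated
// independently, so two disjoint regions in one message yield two spans.
std::vector<Span> overlaps(std::uint64_t base, std::size_t size,
                           const std::vector<Region>& regions) {
    std::vector<Span> out;
    const std::uint64_t msgEnd = base + size;
    for (const Region& r : regions) {
        assert(r.start <= r.end);
        const std::uint64_t lo = std::max(r.start, base);
        const std::uint64_t hi = std::min(r.end, msgEnd);
        if (lo < hi) {
            out.push_back(Span{static_cast<std::size_t>(lo - base),
                               static_cast<std::size_t>(hi - lo)});
        }
    }
    return out;
}
```

Computing the spans first also means only the affected bytes need to be visited, rather than looping over the whole payload for every event.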
@sst-autotester
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

@hughes-c hughes-c added this to the SST v16.0.0 milestone Nov 2, 2025
@hughes-c hughes-c requested a review from feldergast November 2, 2025 15:37

4 participants