Conversation
func (n *Network) StartInstances() {
	panic("Call Run() Instead")
This method isn't used. Why is it needed?
Because BasicNetwork is embedded in Network, I don't want the caller accidentally calling StartInstances thinking it's a valid method.
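For readers unfamiliar with the trick, the shadowing described above can be shown with a minimal, self-contained sketch (the types and method bodies here are hypothetical stand-ins, not the code under review):

```go
package main

import "fmt"

// BasicNetwork exposes StartInstances as part of its public API.
type BasicNetwork struct{}

func (b *BasicNetwork) StartInstances() { fmt.Println("starting embedded instances") }

// Network embeds BasicNetwork, so StartInstances would normally be
// promoted onto Network. Redeclaring it with a panic shadows the
// promoted method and steers callers toward Run instead.
type Network struct {
	BasicNetwork
}

func (n *Network) StartInstances() { panic("Call Run() Instead") }

// Run is the intended entry point; it can still reach the embedded
// implementation explicitly.
func (n *Network) Run() { n.BasicNetwork.StartInstances() }

func main() {
	n := &Network{}
	n.Run() // fine: delegates to the embedded method
	defer func() {
		if recover() != nil {
			fmt.Println("StartInstances panicked as intended")
		}
	}()
	n.StartInstances() // shadowed: panics at runtime
}
```

The embedded method is still reachable via the explicit `n.BasicNetwork.StartInstances()` selector, which is why the shadow only guards against accidental calls, not deliberate ones.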
Can we rebase on top of main?
Force-pushed from 76e550a to 1e1fdc1
This is still somewhat non-deterministic: when I create a digest of the crashed and recovered node IDs in the schedule, I get two possible outcomes (patch is below). Can we make this fully deterministic? 🙏 It looks like we're very close to having it deterministic.
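One way to perform this kind of determinism check is to hash the ordered schedule of node IDs and compare digests across runs; a minimal sketch (the schedule representation and `scheduleDigest` helper are assumptions for illustration, not the reviewer's actual patch):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// scheduleDigest hashes the ordered sequence of crashed/recovered
// node indices. Two runs are deterministic with respect to the
// schedule iff their digests match.
func scheduleDigest(schedule []uint64) [32]byte {
	h := sha256.New()
	buf := make([]byte, 8)
	for _, id := range schedule {
		binary.BigEndian.PutUint64(buf, id)
		h.Write(buf)
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	a := scheduleDigest([]uint64{3, 1, 4})
	b := scheduleDigest([]uint64{3, 1, 4})
	fmt.Println(a == b) // same schedule, same digest
}
```

Because the digest covers order as well as membership, it flags both "different nodes crashed" and "same nodes, different order" as non-determinism.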
}

func (m *Mempool) moveTxsToUnaccepted(block *Block) {
Why do we need to do this? We only delete from unacceptedTxs in AcceptBlock, so if a transaction is in unacceptedTxs in a child block or a grandchild block, it won't be removed from it anyway.
node.mempool.lock.Lock()
n.logger.Debug("Not all txs accepted yet by node", zap.Stringer("nodeID", node.E.ID), zap.Int("unaccepted txs in mempool", len(node.mempool.unacceptedTxs)))
node.mempool.lock.Unlock()
allAccepted = false
Why can we break here? This is when allAccepted is false, and we want to break when all are accepted.
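For context, a break immediately after setting allAccepted = false is a common short-circuit rather than a success condition: once one node still holds unaccepted transactions, the overall answer is already known. A minimal sketch with simplified, hypothetical node and mempool types:

```go
package main

import "fmt"

type mempool struct {
	unacceptedTxs map[string]struct{}
}

type node struct {
	mempool mempool
}

// allTxsAccepted returns false as soon as any node still holds
// unaccepted transactions; the break is an early exit because one
// lagging node is enough to decide the result.
func allTxsAccepted(nodes []node) bool {
	allAccepted := true
	for _, n := range nodes {
		if len(n.mempool.unacceptedTxs) > 0 {
			allAccepted = false
			break // no need to inspect the remaining nodes
		}
	}
	return allAccepted
}

func main() {
	nodes := []node{
		{mempool{map[string]struct{}{}}},
		{mempool{map[string]struct{}{"tx1": {}}}},
	}
	fmt.Println(allTxsAccepted(nodes)) // false: the second node lags
}
```

Omitting the break would be functionally identical here (allAccepted stays false once set); the break only saves the remaining iterations.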
	return minHeightNodeID
}

func (n *Network) recoverToHeight(height uint64) {
The way below is more deterministic (written by me, not thanks to CC):

func (n *Network) recoverToHeight(height uint64) {
	// Decide which crashed nodes to recover and in what order.
	// This keeps all randomness calls deterministic (fixed count per invocation),
	// independent of the non-deterministic AdvanceTime loop below.
	var nodesToRecover []int
	for i, node := range n.nodes {
		if node.isCrashed.Load() {
			nodesToRecover = append(nodesToRecover, i)
		}
	}
	n.randomness.Shuffle(len(nodesToRecover), func(i, j int) {
		nodesToRecover[i], nodesToRecover[j] = nodesToRecover[j], nodesToRecover[i]
	})
	// Recover nodes one at a time in the chosen order, advancing time until
	// all nodes have caught up before recovering the next one.
	for _, idx := range nodesToRecover {
		n.logger.Debug("Recovering node", zap.Stringer("nodeID", n.nodes[idx].E.ID))
		n.startNode(idx)
		for n.nodes[idx].storage.NumBlocks() < height {
			n.logger.Debug("Advancing network time", zap.Uint64("num crashed nodes", n.numCrashedNodes()),
				zap.Uint64("min height", n.getMinHeight()),
				zap.Uint64("max height", n.getMaxHeight()),
				zap.Stringer("Smallest node ID", n.getMinHeightNodeID()),
				zap.Uint64("target height", height),
			)
			for _, node := range n.nodes {
				n.lock.Lock()
				node.AdvanceTime(n.config.AdvanceTimeTickAmount)
				n.lock.Unlock()
			}
		}
	}
	// Advance time until all remaining nodes reach the target height.
	for n.getMinHeight() < height {
		n.logger.Debug("Advancing network time", zap.Uint64("num crashed nodes", n.numCrashedNodes()),
			zap.Uint64("min height", n.getMinHeight()),
			zap.Uint64("max height", n.getMaxHeight()),
			zap.Stringer("Smallest node ID", n.getMinHeightNodeID()),
			zap.Uint64("target height", height),
		)
		n.lock.Lock()
		n.BasicInMemoryNetwork.AdvanceTime(n.config.AdvanceTimeTickAmount)
		n.lock.Unlock()
	}
}
Why not just advance the time using

	n.BasicInMemoryNetwork.AdvanceTime(n.config.AdvanceTimeTickAmount)

instead of

	for _, node := range n.nodes {
		n.lock.Lock()
		node.AdvanceTime(n.config.AdvanceTimeTickAmount)
		n.lock.Unlock()
	}

Also, I think this change is good; can we add it as a follow-up though?
Will continue the review tomorrow.
	return &TX{ID: idArr, shouldFailVerification: aTX.ShouldFailVerification}
}

func CreateNewTX() *TX {
Does the randomness here have any influence on the outcome? It sure has influence on the block hashes.
Should we perhaps inject randomness from our PRG?
Eh, I think it's fine to not include our PRG here. I've found block hashes aren't that helpful to keep deterministic, at least when I'm debugging.
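For the record, if the PRG were injected as suggested, CreateNewTX would take a seeded randomness source instead of reading global randomness. A hedged sketch (the TX shape is simplified from the snippet above, and txFromSeed is a hypothetical helper added for illustration):

```go
package main

import (
	"fmt"
	"math/rand"
)

type TX struct {
	ID                     [32]byte
	shouldFailVerification bool
}

// CreateNewTX draws the transaction ID from an injected PRG so the
// whole network can share one seeded randomness source. This is a
// sketch of the suggestion, not the code under review.
func CreateNewTX(rng *rand.Rand) *TX {
	var id [32]byte
	rng.Read(id[:])
	return &TX{ID: id}
}

// txFromSeed builds a TX from a freshly seeded PRG, which makes the
// determinism property easy to demonstrate.
func txFromSeed(seed int64) *TX {
	return CreateNewTX(rand.New(rand.NewSource(seed)))
}

func main() {
	a := txFromSeed(42)
	b := txFromSeed(42)
	fmt.Println(a.ID == b.ID) // same seed, same TX ID
}
```

With this shape, block hashes become reproducible run-to-run, at the cost of threading the PRG through every TX creation site.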
testutil/random_network/mempool.go (Outdated)
}

// go through any blocks that build off of this one and move their txs
func (m *Mempool) purgeChildren(block *Block) {
Aren't we missing a call to m.moveTxsToUnaccepted(block)? What about the transactions of the sibling block itself?
Yep, I think this commit fixes that issue: b2f99b1
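A minimal sketch of what such a fix might look like, moving the rejected block's own transactions back before recursing into the blocks built on top of it (Block and Mempool are simplified stand-ins for illustration, not the actual commit):

```go
package main

import "fmt"

type Block struct {
	txs      []string
	children []*Block
}

type Mempool struct {
	unacceptedTxs map[string]struct{}
}

// moveTxsToUnaccepted returns a rejected block's transactions to the
// unaccepted set so they can be re-included in a later block.
func (m *Mempool) moveTxsToUnaccepted(b *Block) {
	for _, tx := range b.txs {
		m.unacceptedTxs[tx] = struct{}{}
	}
}

// purgeChildren moves the block's own transactions back first (the
// point raised above: the sibling block's txs must not be dropped),
// then recurses into every block that builds off of it.
func (m *Mempool) purgeChildren(b *Block) {
	m.moveTxsToUnaccepted(b)
	for _, child := range b.children {
		m.purgeChildren(child)
	}
}

func main() {
	m := &Mempool{unacceptedTxs: map[string]struct{}{}}
	sibling := &Block{txs: []string{"a"}, children: []*Block{{txs: []string{"b"}}}}
	m.purgeChildren(sibling)
	fmt.Println(len(m.unacceptedTxs)) // both "a" and "b" are back
}
```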
m.lock.Lock()
defer m.lock.Unlock()

// Ensure the block has not already been verified or accepted
I'm wondering if we should fail the test if we encounter this and the below... not just return an error. Thoughts?
Added error logs, which should fail the test.
Will make another pass tomorrow; I think that's enough comments for now.
	return n.BasicNode.HandleMessage(&msgCopy, from)
}

// copyMessage creates a copy of the message and its relevant fields to avoid mutating shared state in the in-memory network
So basically we just need to assign our own mempool, is that it?
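The copy-before-handle pattern the doc comment describes can be sketched as follows (the Message fields here are hypothetical; the point is that mutable, reference-typed fields must stop aliasing the sender's message):

```go
package main

import "fmt"

type Message struct {
	Round uint64
	Votes []string // shared slice: must be deep-copied
}

// copyMessage returns a copy whose mutable fields no longer alias
// the sender's message, so each node can mutate its own copy without
// leaking the change through the shared in-memory network.
func copyMessage(msg *Message) *Message {
	cp := *msg // copies value fields such as Round
	cp.Votes = append([]string(nil), msg.Votes...)
	return &cp
}

func main() {
	orig := &Message{Round: 1, Votes: []string{"n1"}}
	cp := copyMessage(orig)
	cp.Votes[0] = "n2"          // mutate the copy only
	fmt.Println(orig.Votes[0]) // original is untouched
}
```

A plain struct assignment alone would not be enough here: the slice header would still point at the sender's backing array, which is exactly the shared-state mutation the in-memory network needs to avoid.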
I think that's all the comments I have for now.