Conversation
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
| This relies on introducing a new encoding to represent exception patching, which would be a forward-compatibility break | ||
| as is always the case when adding a new default encoding. |
There was a problem hiding this comment.
as is always the case when adding a new default encoding
This is not the purpose of this RFC, but just calling out this is going to continue to be annoying, I see a few alternatives here
- All new encodings need to be gated behind a Writer flag so they are not written unless you explicitly opt-in. Then after some number of releases they can be enabled by default.
- Come back around to the idea of distributing encodings as WASM binaries, seems unlikely to be picked up very widely
- NEVER allow new encodings within a single "edition". We'd need to formalize what an edition means, how frequently we drop one, and how we maintain and test encodings on develop between edition releases.
|
|
||
| ## Summary | ||
|
|
||
| Make a backwards compatible change to the serialization format for `Patches` used by the FastLanes-derived encodings: |
There was a problem hiding this comment.
Why limit to fastlanes encodings what about sparse arrays?
There was a problem hiding this comment.
Mostly just because the whole "lanes" concept only maps cleanly to primitives.
I suppose this could help us write a data-parallel version of sparsearray though...
| pub(super) indices: BufferHandle, | ||
| /// patch values corresponding to the indices. The ptype is specified by `values_ptype`. | ||
| pub(super) values: BufferHandle, |
There was a problem hiding this comment.
Do we want these to be uncompressed and never compressed in the future?
There was a problem hiding this comment.
If we assume that patches are only 0.5-1% of the overall array then I think compression is sort of superfluous, yea.
There was a problem hiding this comment.
I disagree strongly here, you can always write decompressed arrays but will find it much harder to go the other way.
There was a problem hiding this comment.
that's valid, maybe i'm underestimating how likely this is to change in the future. let me update the PR to hold child values and see how that works out
| pub(super) offset: usize, | ||
| /// Total length. | ||
| pub(super) len: usize, |
There was a problem hiding this comment.
are these size bounds on this?
There was a problem hiding this comment.
not on len, but offset < 1024. I just used usize just for indexing convenience
| The PatchedArray holds buffer handles for the `lane_offsets` which provides chunk/lane-level random indexing | ||
| into the patch `indices` and `values`, so these values can live equivalently in device or host memory. | ||
|
|
||
| The only operation performed at planning time is slicing, which means that all of its reduce rules would run |
There was a problem hiding this comment.
what will you do here?
|
so here's an annoying thing: converting BitPackedArray that holds an The easiest way to solve this problem is to, on read, invert the BP w/patches into the new format. That way we can delete all of the code having to deal with interior Patches for BP. However, this forces you to do execution on the read path to load and tranpose patch values/indices. Doing execution in the read path goes against our existing model. Another alternative is that we copy-paste BitPacked and make a BitPackedV2 that does not have a patches child, and we write that one but continue to leave the existing BitPacked codepaths in place. The downside is that we have to maintain both forever. |
|
You could also impl |
|
Oh interesting. That might work. I just want to be sure that we stop producing arrays with interior patches after we merge all of this |
Distilling some thoughts from the initial implementation work into RFC so we can all get on the same page before we go any further.