Update STREAM/href text and examples #82
Replace outdated example URI schemes "httpg" and "ftp" in section 5.7 with "http" and "https". Also adjust text about what URI schemes parsers are expected to be able to read from.
|
This is a fix for issue #81.
msdemlei
left a comment
Hm. "absolute and relative" makes me nervous because we'll suddenly have to explain how to obtain a base URI for a VOTable. This may already be tricky when, e.g. in async TAP, the same VOTable is available under two different paths, but it becomes close to impossible when VOTables are stored or sent via SAMP.
Does TOPCAT actually do relative paths? And if so, isn't this a constant source of annoyance?
So... as long as we don't have the guts to actually drop external streams (I wonder how many parsers actually implement them, frankly), can we at least forbid relative URIs while we can claim it's just a clarification of the existing standard?
|
Yes, STIL has always managed relative URLs.

If you're going to use this STREAM business I'd say use of relative URLs makes it more rather than less usable, since you can just dump the data and metadata in the same place or nearby to each other and move them around together rather than pretty much guaranteeing that the content will break if you ever move the data file. So I kind of assumed that the obvious way to use STREAMs was like this and implemented it in STIL, though I don't think it says that explicitly anywhere.

But I agree adding "relative" in the text here does introduce the explicit expectation of a behaviour that wasn't mentioned before, which could cause problems. And nobody's complained about the lack of this language up till now. The intention of this PR was just to tidy things up, not to introduce new requirements, so I'm happy to remove the text "absolute and relative" if you think that's a good idea.
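The "move them around together" argument can be illustrated with a minimal sketch of relative-reference resolution. It uses Python's standard `urllib.parse.urljoin`; the helper name `resolve_stream_href` and the base locations are hypothetical, and nothing here is taken from STIL or from the VOTable text.

```python
from urllib.parse import urljoin


def resolve_stream_href(votable_location: str, href: str) -> str:
    """Resolve a STREAM href against the location the VOTable was read from.

    Hypothetical helper: it assumes the reader knows a base location,
    which is exactly what may be missing for VOTables sent via SAMP.
    """
    return urljoin(votable_location, href)


# A relative href follows the VOTable wherever the pair is moved.
print(resolve_stream_href("file:///data/survey/x.vot", "x-data.bin"))
print(resolve_stream_href("https://example.org/results/x.vot", "x-data.bin"))

# An absolute href ignores the base location entirely.
print(resolve_stream_href("file:///data/survey/x.vot",
                          "https://example.org/bulk/x-data.bin"))
```

The sketch only works when a base location is available, which is the concern raised in the first review comment.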
|
On Thu, Mar 05, 2026 at 02:34:18AM -0800, Mark Taylor wrote:
> mbtaylor left a comment (ivoa-std/VOTable#82)
>
> If you're going to use this STREAM business I'd say use of relative

...which, full disclosure, I'd rather not.

> URLs makes it more rather than less usable, since you can just dump
> the data and metadata in the same place or nearby to each other and
> move them around together rather than pretty much guaranteeing that
> the content will break if you ever move the data file. So I kind

Ok, so what you're saying is that the whole href business really is a
way to get around the need to base64-encode stuff? Hmfine... if only
we had some standard way to reliably move the two parts as a unit
(within a zip file, say)... Ah well.

> But I agree adding "relative" in the text here does introduce the
> explicit expectation of a behaviour that wasn't mentioned before,
> which could cause problems. And nobody's complained about the lack
> of this language up till now. The intention of this PR was just to
> tidy things up not to introduce new requirements, so I'm happy to
> remove the text "absolute and relative" if you think that's a good
> idea.

I don't think any of href as specified right now is much of a good
idea, so I wouldn't make that a criterion. As a data point, I note
that astropy with a plain href="stream.b64" says:

    The vo package only supports remote data through {vo_prot}

(it actually doesn't at the moment, but that's another bug, and if it
weren't there, vo_prot would expand to ("http", "https", "ftp",
"file")).

With href="file://stream.b64" (which was my first attempt at
"relative"), it says:

    <urlopen error [Errno 2] No such file or directory: ''>

Indeed, it will work with href="file:stream.b64". I can't be
bothered to make sure that's not what the file URI scheme wants, but
it certainly looks funny.
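
One plausible reading of the different outcomes above is how the hrefs split into URI components. This is a sketch using Python's standard `urllib.parse.urlsplit`; the helper `describe` is hypothetical, and whether astropy takes exactly this path internally is an assumption that has not been checked here.

```python
from urllib.parse import urlsplit


def describe(href: str) -> tuple:
    """Return (scheme, authority, path) for an href (hypothetical helper)."""
    parts = urlsplit(href)
    return (parts.scheme, parts.netloc, parts.path)


for href in (
    "stream.b64",                        # no scheme at all
    "file://stream.b64",                 # "stream.b64" becomes the authority
    "file:stream.b64",                   # scheme plus a relative path
    "file:///home/msdemlei/stream.b64",  # empty authority, absolute path
):
    print(f"{href!r:40} -> {describe(href)}")
```

With `file://stream.b64` the file name lands in the authority component and the path is empty, which would be consistent with the `No such file or directory: ''` message.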
With href="file:///home/msdemlei/stream.b64", it does (roughly) the
right thing (it complains about incorrect padding, but I'm willing to
believe this is because of my free-handed cut-and-paste; TOPCAT is
fine with the file, though, and so is b64decode).
If I base64-decode myself and just write
<STREAM href="file:///home/msdemlei/stream.bin">
it works, which *almost* convinces me this could be a good idea.
Incidentally, both astropy and TOPCAT do the right thing if I gzip
stream.bin and write encoding="gzip". Kudos!
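
For reference, the decoding steps exercised in these experiments can be sketched with the Python standard library. The function `decode_stream` and the sample payload are hypothetical; the encoding names handled are an assumption about what a reader would see in the `encoding` attribute, not a statement of what any parser implements.

```python
import base64
import binascii
import gzip
from typing import Optional


def decode_stream(raw: bytes, encoding: Optional[str]) -> bytes:
    """Undo the content encoding declared on a STREAM element (sketch)."""
    if encoding in (None, "none"):
        return raw
    if encoding == "base64":
        return base64.b64decode(raw)
    if encoding == "gzip":
        return gzip.decompress(raw)
    raise ValueError(f"unsupported STREAM encoding: {encoding!r}")


payload = b"\x00\x01\x02\x03 made-up table bytes"

# Round trips for the two encodings discussed above.
assert decode_stream(base64.b64encode(payload), "base64") == payload
assert decode_stream(gzip.compress(payload), "gzip") == payload

# Losing a character at the end (easy to do when cutting and pasting)
# leaves an incomplete final group, which b64decode rejects.
truncated = base64.b64encode(payload)[:-1]
try:
    decode_stream(truncated, "base64")
except binascii.Error as exc:
    print("base64 decode failed:", exc)
```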
Anyway: I still think href is a rabbit hole. But I won't stand in
your way with absolute and relative if you file a bug against
astropy.io.votable to the effect that relative URIs are not
supported. If someone over there promises to make them work (and I
may volunteer myself), I'm not against the relative part. Otherwise,
we could say "At this point, relative URIs are only supported by a
subset of the clients" or something like that. Yes, we shouldn't
make our standards depend on the implementation status of libraries,
but when presumably changing a standard, peeking at what everyone
does seems like a good idea (DaCHS' VOTable code doesn't do href at
all, btw).
|
My view was that if BINARY-with-href is useful at all (and really, I think it's hardly been used in the history of VOTable) the benefit is to put the metadata in a small file that can e.g. be manipulated in a text editor to edit the metadata, or ingested by a reader to look at the metadata, while the more unwieldy bulk data can go elsewhere. A more compelling story at the start of VOTable history was maybe VOTable metadata decorating an existing FITS file; but again as far as I'm aware this is hardly ever done. So I do basically agree with you that this is all a bit academic.
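The way I'd expect a relative URI to be used here is without a scheme part (`path-noscheme` production from RFC3986). If you create an external-STREAM VOTable using STIL (e.g. with `stilts tpipe` and `ofmt='votable(format=binary2,inline=false)'`), it writes `<STREAM href="x-data.bin"/>` inside the `<BINARY2>` element. Since I'm too lazy to check ... can `astropy.io.votable` read the file-pair that that generates?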
|
On Fri, Mar 06, 2026 at 01:50:19AM -0800, Mark Taylor wrote:
> The way I'd expect a relative URI to be used here is without a
> scheme part (`path-noscheme` production from RFC3986). If you
> create an external-STREAM VOTable using STIL, e.g.:
> ```
> stilts tpipe in=:test:3,b ofmt='votable(format=binary2,inline=false)' \
>     out=x.vot
> ```
> it writes
> ```
> <BINARY2>
> <STREAM href="x-data.bin"/>
> </BINARY2>
> ```
> Since I'm too lazy to check ... can `astropy.io.votable` read the
> file-pair that that generates?

No. The relevant code is (in astropy/io/votable/tree.py):

    vo_prot = ("http", "https", "ftp", "file")
    if not href.startswith(vo_prot):
        vo_raise(
            f"The vo package only supports remote data through {vo_prot}",
            self._config,
            self._pos,
            NotImplementedError,
        )

I have a few issues with this, but the upshot is that most
strings are rejected (you can probably get away with a filename like
httpmydata.bin, but the following urlopen call might thwart that).
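
The `httpmydata.bin` remark can be checked with a small sketch that contrasts the prefix test quoted above with a test on the parsed scheme. The two function names are hypothetical, and the scheme-based variant is only one possible alternative, not what astropy does or has agreed to do.

```python
from urllib.parse import urlsplit

vo_prot = ("http", "https", "ftp", "file")


def passes_prefix_check(href: str) -> bool:
    """Mirror of the quoted test: a plain string-prefix comparison."""
    return href.startswith(vo_prot)


def has_supported_scheme(href: str) -> bool:
    """Hypothetical alternative: compare the parsed URI scheme instead."""
    return urlsplit(href).scheme in vo_prot


for href in (
    "x-data.bin",                      # what STIL writes; rejected by both
    "httpmydata.bin",                  # slips through the prefix check only
    "https://example.org/x-data.bin",  # accepted by both
    "file:stream.b64",                 # accepted by both
):
    print(f"{href!r:36} prefix={passes_prefix_check(href)} "
          f"scheme={has_supported_scheme(href)}")
```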
Somehow it doesn't feel quite right to allow remote documents to
specify random paths into the file system, too. It might be fun to
think of attack scenarios ("can you make a VOTable that can read
/etc/passwd as a payload"?), but perhaps I'm worrying too much.
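
One way to picture the worry is a guard that only accepts hrefs resolving inside the directory that holds the VOTable. This is a sketch of one possible policy, not something proposed or agreed in this thread; the function name `resolve_local_href` and the paths are hypothetical, and `Path.is_relative_to` needs Python 3.9 or later.

```python
from pathlib import Path


def resolve_local_href(votable_path: str, href: str) -> Path:
    """Resolve a scheme-less href next to the VOTable, refusing escapes.

    Hypothetical policy: the data file must live in the VOTable's own
    directory or below it.
    """
    base_dir = Path(votable_path).resolve().parent
    # Joining with an absolute href replaces base_dir entirely, so the
    # containment check below also catches "/etc/passwd".
    candidate = (base_dir / href).resolve()
    if not candidate.is_relative_to(base_dir):
        raise ValueError(f"href {href!r} points outside {base_dir}")
    return candidate


print(resolve_local_href("/data/survey/x.vot", "x-data.bin"))

for suspicious in ("../../etc/passwd", "/etc/passwd"):
    try:
        resolve_local_href("/data/survey/x.vot", suspicious)
    except ValueError as exc:
        print("rejected:", exc)
```

A rule like this would also reject legitimate layouts such as a sibling data directory, so it is a trade-off rather than an obvious default.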
Anyway: If we want to keep href, I'd really like astropy and STIL to
agree on what it means. "Really like" is probably strong enough that
I'll prepare a PR.
Would I implement "plain string means local file with a relative
path"? Probably.
Would I struggle for it if there's pushback from the astropy
community? Probably not.
|
Following discussion, and given that astropy.io.votable does not work with relative URIs in STREAM href attributes, remove the qualifier "absolute and relative" on the forms of URI that parsers are expected to be able to dereference. This would anyway have constituted a new expectation on parsers, i.e. one not explicit in earlier VOTable versions, and that wasn't the intention of this (tidying and modernisation) edit.
|
OK, you convinced me.
msdemlei
left a comment
I'm half inclined to create an issue "write something clever on what to expect from STREAM/@href and what not to expect", but then I find I'm too lazy and this can wait until someone actually complains because they'd like to use it and it doesn't work.
So: fine with me, let's go