Topic: [Feature] Disallow "blob:" source URLs at upload and edit time.

Posted under Site Bug Reports & Feature Requests

Requested feature overview description

When initially uploading a post, or when editing an existing post, disallow (in some way) source URLs that have the blob: scheme.

This might consist of quietly dropping the source, or dropping the source and adding a tag like invalid_source, or even refusing to accept the upload or edit until that URL is removed. It should probably follow whatever existing action (if any) is currently done for an "invalid" source.
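
For illustration, here's a minimal sketch of what that check could look like (Python, with a made-up DISALLOWED_SCHEMES policy; whatever the site actually does would live in its own codebase):

```python
from urllib.parse import urlparse

DISALLOWED_SCHEMES = {"blob"}  # hypothetical policy list; could grow to cover file:, data:, etc.

def validate_source(source: str) -> str | None:
    """Return the source if acceptable, or None to signal it should be dropped."""
    scheme = urlparse(source.strip()).scheme
    if scheme in DISALLOWED_SCHEMES:
        return None  # quietly drop, tag invalid_source, or refuse the upload/edit
    return source
```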

Why would it be useful?

blob: URLs can only ever point to something local to the user's browser (I think); they don't correspond to anything that anyone else can view or download from a server somewhere. About the only useful information they may contain is the top-level site that the image came from.

Disallowing the URL may alert an uploader or editor that they should find an Internet-visible link as a source, rather than a link that is local to their browser.

As an example, this post currently has blob:https://web.telegram.org/35ad76c5-5534-4190-9816-cc561baedb15 as a source. All it really tells you is that the image came from somewhere on Telegram - that's it. If you manually trim off the blob: and try to go to the resulting URL, you get a 404 error from Telegram.
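
If you run that source through a standard URL parser (Python's urllib.parse here, just as a sketch), you can see how little survives; the embedded origin ends up in the path component and is the only meaningful part:

```python
from urllib.parse import urlparse

src = "blob:https://web.telegram.org/35ad76c5-5534-4190-9816-cc561baedb15"
outer = urlparse(src)
print(outer.scheme)           # "blob"
inner = urlparse(outer.path)  # the embedded origin URL sits in the path component
print(inner.netloc)           # "web.telegram.org" -- the only recoverable information
# The UUID after the origin is an object reference local to one browser session.
```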

What part(s) of the site page(s) are affected?

The upload page and the "edit" function on each existing post.

While we're at it, can we also possibly prevent sources from being a local file? As in something like C:\Users\user\funny_picture.jpg
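
Something like this could hypothetically catch those too (Python sketch; the patterns are a guess at what counts as "local"):

```python
import re
from urllib.parse import urlparse

WINDOWS_PATH = re.compile(r"^[A-Za-z]:[\\/]")  # e.g. C:\Users\user\funny_picture.jpg

def looks_local(source: str) -> bool:
    s = source.strip()
    if urlparse(s).scheme == "file":  # file:///C:/Users/... style sources
        return True
    # Bare drive-letter paths and UNC shares never left the uploader's machine.
    return bool(WINDOWS_PATH.match(s)) or s.startswith(r"\\")
```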

dubsthefox said:
Wouldn't it work to disallow anything that has no http or https in front of it?

Sometimes people will put in things like furaffinity.net/view/12345, leaving off the http or https. That's almost a valid URL, and it probably does tell you where to find the source image, so it might not be a good idea to reject it completely.
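
A gentler option than rejecting those outright would be to fix them up automatically; a rough sketch (Python, with an assumed "looks like a bare host" heuristic):

```python
from urllib.parse import urlparse

def add_missing_scheme(source: str) -> str:
    s = source.strip()
    # No scheme, but the first path segment looks like a hostname: assume https.
    if not urlparse(s).scheme and "." in s.split("/", 1)[0]:
        return "https://" + s
    return s

add_missing_scheme("furaffinity.net/view/12345")
# -> "https://furaffinity.net/view/12345"
```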

e621 also currently has a bug where it always attempts to use https for source URLs, even on old sites that only support http. One workaround for that is to put in the URL without the scheme, as seen here. It's a little more work for a user to follow the link, but at least they don't get a "not found" page or an infinite "loading" page.

There are other valid schemes that aren't http or https. Not many of them are used here, but this post has one that points at an ISBN.

kora_viridian said:
Sometimes people will put in things like furaffinity.net/view/12345, leaving off the http or https. That's almost a valid URL, and it probably does tell you where to find the source image, so it might not be a good idea to reject it completely.

Giving the user an error saying that the source needs to start with https:// or something would give them the opportunity to fix it.

kora_viridian said:
e621 also currently has a bug where it always attempts to use https for source URLs

That's a purposely added feature, not a bug. It ensures that when users unnecessarily provide http sources, they get fixed to https, to avoid exposing other users to unencrypted connections and data transfers for certain kinds of content that can get them in trouble (e.g. a post for an sfw image having an http source link to a page that contains that image along with cub porn, when you live in an area that doesn't allow that). Only a relatively small number of source links don't work with https, and it may be better to leave the onus on the user to remove the s themselves if they're okay with accessing the link over an unsecured connection, rather than letting users provide unsecured links.

kora_viridian said:
There are other valid schemes that aren't http or https. Not many of them are used here, but this post has one that points at an ISBN.

That's not a valid URL and shouldn't be in the source field. It's not a link to anywhere, just information about the book, and something like an ISBN is better put in the description.

watsit said:
Giving the user an error saying that the source needs to start with https:// or something would give them the opportunity to fix it.

dr-spangle wrote a script with a fix (protocols.MissingProtocol) that would take care of this specific case, but I don't know if they ever ran that script against the whole site. I also don't know if that particular fix is in the related pull request, but I kind of think it isn't.

That's a purposely added feature, not a bug.

If that's the logic behind it, can it be documented somewhere and made known to all the staff?

I first asked about it here. In that thread, @Lance_Armstrong noted that it had been brought up before, and that "I didn't notice any official response." After that, I posted in the bug report thread about it (no reply to date), and @faucet looked at the code and filed a Github issue about it (no reply to date).

That's not a valid URL and shouldn't be in the source field.

It might not be a valid URL, but it is a valid URI. Maybe the upload form should specify whether it wants URLs or URIs. :D
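
The distinction is easy to see with a parser; both of the following are syntactically valid URIs, but only one locates anything (Python sketch; the ISBN value is made up):

```python
from urllib.parse import urlparse

for uri in ("https://e621.net/posts/12345",  # a URL: names *and* locates a resource
            "urn:isbn:0123456789"):          # a URN: names a book, locates nothing
    scheme = urlparse(uri).scheme
    print(scheme, "->", "fetchable" if scheme in ("http", "https") else "a name, not a link")
```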

kora_viridian said:
If that's the logic behind it, can it be documented somewhere and made known to all the staff?

I first asked about it here. In that thread, @Lance_Armstrong noted that it had been brought up before, and that "I didn't notice any official response."

Maybe I was getting confused about topic #15763. In either case, the fact that http:// sources get replaced with https:// has to be a deliberate act; since the sources aren't otherwise modified, there has to be specific code there to do the replacement. There's even a bot that goes around replacing http links in descriptions with https.

kora_viridian said:
It might not be a valid URL, but it is a valid URI. Maybe the upload form should specify whether it wants URLs or URIs. :D

The rules for tagging abuse currently have "Knowingly adding or editing a post source to an incorrect link", so sources should be (or have been at one time) a valid link. That Wikipedia page says "URNs cannot be used to directly locate an item and need not be resolvable, as they are simply templates that another parser may use to find an item", i.e. not a link.

leomole

Former Staff

+1 disallow blob and local URLs.

+1 allow furaffinity.net/view/XXXXXXXX (then fix automatically) and don't force HTTPS. What an odd bug.

watsit said:
Maybe I was getting confused about topic #15763. In either case, the fact that http:// sources get replaced with https:// has to be a deliberate act; since the sources aren't otherwise modified, there has to be specific code there to do the replacement.

It's in the part of the code that the Github issue talks about, specifically here. For reasons that are not clear, the FurAffinity and Pixiv source link handlers are (probably) getting called on every source link, not just ones that link to FA or Pixiv. Those two handlers both force https, presumably since both FA and Pixiv are known to support https. Since those handlers are called on every source link, though, they have the side effect of forcing https even for sites that don't support it.
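
To illustrate the pattern being described (a sketch in Python, not the site's actual Ruby code): per-site handlers that each rewrite the scheme, applied to every source instead of only their own domains:

```python
def furaffinity_handler(url: str) -> str:
    # Safe for furaffinity.net, which supports https...
    return url.replace("http://", "https://", 1)

def pixiv_handler(url: str) -> str:
    # ...and for pixiv.net, likewise.
    return url.replace("http://", "https://", 1)

def normalize_source(url: str) -> str:
    # The suspected bug: no "is this actually an FA/Pixiv link?" guard here.
    for handler in (furaffinity_handler, pixiv_handler):
        url = handler(url)
    return url

normalize_source("http://http-only-site.example/art/1")
# -> "https://http-only-site.example/art/1", which then fails to load
```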

There's even a bot that goes around replacing http links in descriptions with https.

...for a list of sites where https is already known to work. Go here and expand the "> Tagbot stuff", "> Currently tagging", and "> Swapping http:// to https:// in sources and descriptions for these urls:" collapse boxes to see the list.

The rules for tagging abuse currently have "Knowingly adding or editing a post source to an incorrect link", so sources should be (or have been at one time) a valid link.

Take it up with Riversyde, I guess... they added that source 11 years ago.

Edit: make e621 links local, rather than web


kora_viridian said:
It's in the part of the code that the Github issue talks about, specifically here. For reasons that are not clear, the FurAffinity and Pixiv source link handlers are (probably) getting called on every source link, not just ones that link to FA or Pixiv. Those two handlers both force https, presumably since both FA and Pixiv are known to support https. Since those handlers are called on every source link, though, they have the side effect of forcing https even for sites that don't support it.

That explains it, then.

Still, I'd argue it's better to assume URLs should use https, since user-provided links can point anywhere and https provides better security and safety, not just for legal protection, but also protection from phishing when using look-alike URLs. It would be better to maintain a blacklist of sites that are known to not work with https, and skip the change just for those sites, as they will be the minority of URLs, rather than using a whitelist of sites that https is known to work with. A blacklist would be far smaller, simpler to maintain, and more easily kept up to date.
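
In other words, something like this (hypothetical sketch; the list contents are placeholders):

```python
from urllib.parse import urlparse

# Sites known to break over https; everything else gets upgraded by default.
HTTP_ONLY_SITES = {"http-only-site.example"}  # placeholder entries

def prefer_https(url: str) -> str:
    parts = urlparse(url)
    if parts.scheme == "http" and parts.hostname not in HTTP_ONLY_SITES:
        return "https://" + url.split("://", 1)[1]
    return url
```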

kora_viridian said:
Take it up with Riversyde, I guess... they added that source 11 years ago.

11 years was a long time ago; I imagine the rules were a bit different then.

watsit said:
but also protection from phishing when using look-alike URLs.

Not really. Nobody checks to see if your domain is an ASCII troll before they give you an SSL certificate, which makes https work. You could register furaffinnity.net, get an SSL certificate for it, and get hosting for it in an hour or two, for under US$50. You could then cause https://furaffinnity.net/view/anything to redirect to whatever you wanted, or say "You need an FA Gold™ account to see this image, enter your credit card details now", etc.

A blacklist would be far smaller, simpler to maintain, and more easily kept up to date.

Mention it here or on the Github issue linked above. I just post pr0n other people have drawn; I don't have commit access to the repo. :D

kora_viridian said:
Not really. Nobody checks to see if your domain is an ASCII troll before they give you an SSL certificate, which makes https work. You could register furaffinnity.net, get an SSL certificate for it, and get hosting for it in an hour or two, for under US$50.

It may not be foolproof, but it does help some. If someone's attempting a quick cash-grab, they may not bother with an SSL certificate, or the verification badge in the URL bar may help clue a potential victim in that something's wrong.

kora_viridian said:
Mention it here or on the Github issue linked above. I just post pr0n other people have drawn; I don't have commit access to the repo. :D

I'm not sure a random post in the bug report thread will be noticed, compared to a thread that has a dedicated discussion about it. And I'd rather not expose my Github account by posting there (and making a separate account for that is too much of a hassle for me).

watsit said:
[trimmed]

I believe you mean a whitelist rather than a blacklist, from what you're saying.
A blacklist is a block list, but otherwise allows.
A whitelist is an allow list, but otherwise blocks.

cutefox123 said:
I believe you mean a whitelist rather than a blacklist, from what you're saying.
A blacklist is a block list, but otherwise allows.
A whitelist is an allow list, but otherwise blocks.

It depends on how you look at it, I suppose, but either way, it's the opposite of how it's attempting to work now: rather than a list of sites to block http for (changing just those to https), do the change by default and have a list of sites where http is allowed to stay.
