Topic: Automatically remove tracking data from Discord image links

Posted under Site Bug Reports & Feature Requests

Discord recently made a change that automatically adds unnecessary data to cdn images whenever you hit Copy Link...

I want everything after the file extension removed, because that's how it was before and the extra data makes links longer for no good reason. As you can tell, the link still works perfectly without such data!
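For illustration, here is a minimal Python sketch of the cleanup being requested: dropping everything after the file path. The function name and the example link (its IDs and parameter values) are made up for this sketch, and it says nothing about how the site itself would implement it.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical attachment link; the IDs and parameter values are invented.
example = (
    "https://cdn.discordapp.com/attachments/123/456/image.png"
    "?ex=65a0b1c2&is=658e3c42&hm=abcdef"
)
print(strip_query(example))
```

The scheme, host, and path are left untouched, so two copies of the same attachment link with different parameters reduce to the same string.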

nigward said:
Discord recently made a change that automatically adds unnecessary data to cdn images whenever you hit Copy Link...

I want everything after the file extension removed, because that's how it was before and the extra data makes links longer for no good reason. As you can tell, the link still works perfectly without such data!

This is not useless data. Come next year, this data will prevent a link from working after it expires. Discord CDN is not a good place to "host" images anymore, and it never was.

Here's a copy paste of the post from the dev server:

New Authenticated Attachment URL Parameters
To improve security of Discord's CDN, attachment CDN URLs have 3 new URL parameters: ex, is, and hm. Once authentication enforcement begins later this year, links with a given signature (hm) will remain valid until the expiration timestamp (ex).

⚠️ Attachment CDN URLs have already started following the new pattern, so your app will begin to encounter the new parameters in attachment CDN links, but authentication is not being enforced until later this year. More details about when authentication will start to be enforced will be shared in the upcoming weeks.

Details about authentication parameters
ex: timestamp indicating when the attachment URL will expire, after which point you'd need to retrieve another URL (by doing something like retrieving a message via HTTP). More details to come about the length of time this will be by default.

is: timestamp indicating when the URL was issued

hm: unique signature that remains valid until ex.
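As a rough sketch of how a script might read these three parameters, the Python below pulls ex, is, and hm out of a link and checks whether it has expired. The function names are hypothetical, and the code assumes that ex and is are hexadecimal Unix timestamps in seconds; the announcement above only calls them timestamps, so treat that encoding as an assumption.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def parse_attachment_params(url):
    """Extract the ex, is and hm parameters from an attachment link."""
    query = parse_qs(urlsplit(url).query)

    def timestamp(name):
        values = query.get(name)
        if not values:
            return None
        # Assumption: the value is a hex-encoded Unix timestamp in seconds.
        return datetime.fromtimestamp(int(values[0], 16), tz=timezone.utc)

    signature = query.get("hm")
    return {
        "expires": timestamp("ex"),
        "issued": timestamp("is"),
        "signature": signature[0] if signature else None,
    }


def is_expired(url, now=None):
    """True if the link carries an ex value that is in the past."""
    now = now or datetime.now(timezone.utc)
    expires = parse_attachment_params(url)["expires"]
    return expires is not None and now >= expires
```

A link with no ex parameter is reported as not expired, because there is nothing to compare against; whether such a link still loads is up to the CDN.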

Handling authentication parameters
When links expire: To access the attachment CDN link after the link expires, your app will need to fetch a new CDN URL. The API will automatically return valid, non-expired URLs when you access resources that contain an attachment CDN URL, like when retrieving a message.

The client behavior is not changing and will refresh posted URLs to be automatically valid, so your app doesn't need to worry about refreshing URLs itself if the link was valid at the time of posting.

In messages your app sends: The behavior in the client will remain the same. Links posted in the client will be automatically updated if the link was valid at the time of posting, so you don't need to update your message's content when it includes an attachment CDN link (in cases like apps reposting images using CDN links).

If your app needs access to the content: If your app requires access to the content from an attachment CDN link, you should fetch the contents from the valid CDN link and upload them to a secure, independent host that your app maintains access to.

definitelynotafurry4 said:
This is not useless data. Come next year, this data will prevent a link from working after it expires. Discord CDN is not a good place to "host" images anymore, and it never was.

Here's a copy paste of the post from the dev server:

Wait, what? I didn't know any of this, it's even worse than I thought... I'm disgusted now!

nigward said:
Wait, what? I didn't know any of this, it's even worse than I thought... I'm disgusted now!

Disgusted? Why?
A chat program isn't, and should never be, considered a file host.
If people hadn't tried to use Discord as a backend for their personal projects, I doubt this change would have been implemented at all.

rakko said:
Disgusted? Why?
A chat program isn't, and should never be, considered a file host.
If people hadn't tried to use Discord as a backend for their personal projects, I doubt this change would have been implemented at all.

The real stupidity is people that used Discord like AGNPH or Baraag or something... Using it like a file host was bad, yeah. Oh, guess I need to start a project here to find alternate links or Wayback them.
source:discord
source:*discord
source:https://cdn.discordapp.com/
Ewwwwwww. 84 pages.

source:*discord -source:https://cdn.discordapp.com/ -source:*discordapp.net Still a bit of other domains.

ninitito said:
Centralizing internet + growing filesizes = a not so bright future for filehosts.

While this is true, using a messaging app as a filehost is pretty dumb.

alphamule said:
The real stupidity is people that used Discord like AGNPH or Baraag or something... Using it like a file host was bad, yeah. Oh, guess I need to start a project here to find alternate links or Wayback them.

This is the real dark magic:
https://github.com/fr34kyn01535/discord-fs

cinder said:
While this is true, using a messaging app as a filehost is pretty dumb.

This is the real dark magic:
https://github.com/fr34kyn01535/discord-fs

LOL, so obviously a bad idea. The real solution is to keep track of hashes and filesizes I guess, for long-term archival. Even the Wayback Machine is just a temporary measure in the 100-to-500-year view. Politics, war (again, politics), economy (ditto), format rot (where it gets harder to even read it - see Flash for a well-known example of a format undergoing this), the indexing problem (actually, the more serious one that e621 tries to address with its tagging system), lack of interest leading to no one preserving the backups of the backups, ecological disasters damaging all copies at once (not likely a problem for optical media but they have other issues), or a bunch of other things to fill a book, will eventually cause loss of access, possibly permanently.
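A minimal sketch of what "keeping track of hashes and filesizes" could look like in Python. The function name and the returned fields are made up for illustration, and SHA-256 is only one reasonable choice of hash, not something the post specifies.

```python
import hashlib
from pathlib import Path


def fingerprint(path, chunk_size=1 << 20):
    """Return the name, size in bytes and SHA-256 digest of a file."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
            size += len(chunk)
    return {"name": Path(path).name, "size": size, "sha256": digest.hexdigest()}
```

Storing the result next to each source link makes it possible to tell later whether a copy found somewhere else is byte-for-byte the same file.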

Web 4.0: Any file over 1MB is on something like IPFS.

So, some good questions: Should I be replacing Discord links with Wayback'd ones as I archive them? Just add a - mark since they'll be dead links in a few months, anyways (after archiving)? There's thousands of links to mirror. Should I just make a WARC/WACZ file and submit it instead of using Save Page Now? I don't have an account there, but maybe I should get one.

:edit: LOL, oops, here's the JSON results archived for source:https://cdn.discordapp.com/attachments/: https://files.catbox.moe/xzqd6a.7z There are other results for source:*discord*, as well.
I'll have to find a way to do this.

source:https://discord.com Another 16 pages?

source:*discord* -source:https://discord.com -source:https://cdn.discordapp.com/attachments/ Cute, some have Discord (MLP) in the filename. ;)
source:*discord* -discord* -source:https://discord.com -source:https://cdn.discordapp.com/attachments/
source:*discord* -discord* -source:https://discordapp.com -source:https://discord.com -source:https://cdn.discordapp.com/attachments/ OK, this is getting ridiculous, haha.
So far, domains used seem to follow these patterns: *discord.com* *discordapp* *discord.gg*

source:*discord* -discord* -source:*discord.com* -source:*discordapp* -source:*discord.gg* Seems to have caught all the exceptions? The stragglers seem to be "commission over Discord" or the like in sources. Non-URL sources, or sources on say Tumblr posts with "commission" and Discord in the filename, the lot of them.
discord_(mlp) ~source:https://discordapp.com ~source:https://discord.com ~source:https://cdn.discordapp.com/attachments/ One whole exception among that pile.
discord_(mlp) ~source:*discord.com* ~source:*discordapp* ~source:*discord.gg* Another with this pile.
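The same three wildcard patterns can be applied offline to an exported list of sources. This is a sketch in Python using the standard fnmatch module; the function names are hypothetical, and it assumes the sources are already available as plain strings (for example, pulled out of the archived JSON results).

```python
from fnmatch import fnmatchcase

# The domain patterns listed above.
DISCORD_PATTERNS = ("*discord.com*", "*discordapp*", "*discord.gg*")


def is_discord_source(source):
    """True if the source string matches one of the Discord domain patterns."""
    lowered = source.lower()
    return any(fnmatchcase(lowered, pattern) for pattern in DISCORD_PATTERNS)


def split_sources(sources):
    """Split a list of sources into (Discord links, everything else)."""
    discord = [s for s in sources if is_discord_source(s)]
    other = [s for s in sources if not is_discord_source(s)]
    return discord, other
```

A non-URL source such as "commission over Discord" lands in the second list, which matches the stragglers described above.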

Add archived_source if you use Wayback or Archive.Today.

And yeah, Telegram will probably do the same. If you archive your source URL, then this is not problemo. ;)

alphamule said:
So, some good questions: Should I be replacing Discord links with Wayback'd ones as I archive them? Just add a - mark since they'll be dead links in a few months, anyways (after archiving)? There's thousands of links to mirror. Should I just make a WARC/WACZ file and submit it instead of using Save Page Now? I don't have an account there, but maybe I should get one.

Don't remove links that were valid, even if they no longer are. They can sometimes be useful to find alternate sources. I wouldn't add a - mark to strikeout sources that are still valid right now.

watsit said:
Don't remove links that were valid, even if they no longer are. They can sometimes be useful to find alternate sources. I wouldn't add a - mark to strikeout sources that are still valid right now.

OK, they can technically be 'recovered' if you have an active Discord instance. It's not the end of the universe, at least. I've just been tagging them archived_source as I archive them. Mostly for reasons you mentioned.

I'm still wondering about those questions. I think the best approach is to just create a huge JSON file (or a bunch of them), like I have been, I guess. From what I saw, you can automate creation of the archive as you browse. It'll record the traffic of the browser used. Still, that's a huge list. Wonder if we should assign ranges.

source:https://cdn.discordapp.com/ -archived_source herm Currently working on this one. Meh, can just pick a page from source:https://cdn.discordapp.com/ -archived_source and add tag+archive sources if you feel like volunteering. Conveniently, excluding archived source means no duplicate work.

ShadyGuy said:

Hello, archived_source is used for when the only live source for a post is an archive site like archive.org or 4chanarchives.
Wiki: archived_source

Cheers

alphamule said:
Oh, dang it. Makes no sense to have both tags mean the same thing, sigh... I mean, overlapping the concepts of it being unavailable at source, and also archived. Now I'm wondering what tag I was supposed to use for that.

ShadyGuy said:
Actually, they're mutually exclusive:
Source is live - no tag
Source is only available through archive sites - archived_source
Source is not available at all - unavailable_at_source

Cheers
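The three mutually exclusive cases can be written as one small decision function. This is only a sketch in Python: the function name and the two boolean parameters (whether the original source is still reachable, and whether an archive copy exists) are made up for illustration.

```python
def source_tag(live, archived):
    """Pick the tag for a post's source, following the three cases above."""
    if live:
        # Source is live: no tag, whether or not an archive copy exists.
        return None
    if archived:
        # Source is only available through archive sites.
        return "archived_source"
    # Source is not available at all.
    return "unavailable_at_source"
```

Under these rules a live source gets no tag even when it has been archived, so the tags alone do not record that an archive copy exists.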

alphamule said:
Well, that makes it impossible to know whether a still-available source has been archived. The wording is kind of misleading, which is how I made that mistake. Now I need to go through my history and edit them. The meaning seemed too obvious, so I didn't even question it enough to check the wiki. :facepalm:

ShadyGuy said:

alphamule said:
Is it OK to quote these messages? Someone's asking about topic 40740 and I was going to mention why it's the wrong use.

Sure, go ahead!

Had some questions from DiligentDragon that I think would be best answered here.

Basically, we need a way to note that an expiring link was archived as of the time of the archive. Linking to the message has its own issues. I'm waiting on permission to copy the message I got.

*bump* Got a response:

DiligentDragon said:

alphamule said:

DiligentDragon said:

Hello alphamule:

I have been considering replacing a significant amount of GeoDat64 current Twitter JPG uploads with PNG obtained from their Discord server. I was initially researching which source link is preferred, whether it should be a direct link to the message itself or to the cdn attachment link. While doing so I came across this forum topic you posted in -> https://e621.net/forum_topics/40740. It seems that using Discord as a source comes with some significant issues long term.

I want to make sure I'm doing this properly and not creating issues for the site. I was planning on reaching out to Mairo but they are perpetually busy. You seem very informed on this topic, could you summarize what steps I should take to archive the source link when uploading/replacing from Discord?

Post in question -> https://e621.net/posts/4334245

Thank you.

Sincerely,
DiligentDragon

Is it OK to move this to that forum topic? This is probably worth public view. ShadyGuy pointed out that it's wrong use of that tag. :(

Sorry for my late response, stuff came up.

Of course we can move this to the forums.

So, yeah, given that the tag is bound to be misused, I've been using a set for each specific site, like set:archivedotorg and set:archivedottoday, for my own sanity. I probably need to go through my edit history to remove the tag from posts where the source still exists?
The entire reason I started using that tag is sources like Discord that technically ARE correct use of the tag since they are ephemeral (expiring in a day!), if I remember right. That or Twitter/ImgUr/others that are likely to remove the source soon enough. Was I jumping the gun too much by applying it already? I guess so. XD

Due to reasons, I usually add both sources now. There are plenty of cases where one link fails but the other works. It also makes it trivial to know which hot link was archived. Like, on InkBunny, you can technically use a bunch of different equivalent domains, because they have regional DNS entries. Switching between them is normally transparent and works fine, but not if you're expecting Wayback to direct to the archived copy. Note that sites like Pixiv require a referrer set to view specific images from a source. So, the ones that have "master" or "original" or "JPG" or whatever in the URL, not the illustration ID or artist ID.

If you link to a Discord image link and it expires, you can refresh it prior to archiving a live link, using the client. Loading it in that will replace the session ID+key transpare-magically. Same rule applies to Deviant Art. Those links have to be archived IMMEDIATELY to be of any real use.

To answer the question about steps:
1) Get the relevant source link/s.
2) Archive the ephemeral (expiring) link.
3) Make sure that said Discord links are actually superior to other available sources. Added this step but it should be obvious. Resolution and quality are not always equivalent, nor is lossless always better. If a PNG has JPEG artifacts, and all other sources are lossy with the largest JPEG having the exact same artifacts, then someone saved a JPEG as PNG. :facepalm:
4) Add relevant links and tag the post as archived_source, since they'll die automatically in a day. Yes, technically this is mistagged for 24 hours, but otherwise it will get forgotten and make future redundant work for others.
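The link handling in steps 1, 2 and 4 can be sketched in Python as below. The helper names are hypothetical. The two URL forms are the Wayback Machine's public Save Page Now and availability endpoints as I understand them, so treat them as assumptions and check them before relying on them. Nothing here performs a network request, and tagging is left to the editor.

```python
from urllib.parse import quote


def wayback_save_url(url):
    """Address that asks Save Page Now to capture the given link (assumed form)."""
    return "https://web.archive.org/save/" + url


def wayback_availability_url(url):
    """Address of the availability lookup for the given link (assumed form)."""
    return "https://archive.org/wayback/available?url=" + quote(url, safe="")


def sources_for_post(ephemeral_url, archived_url, other_sources=()):
    """List both the expiring link and its archived copy, without duplicates."""
    ordered = [ephemeral_url, archived_url, *other_sources]
    unique = []
    for source in ordered:
        if source and source not in unique:
            unique.append(source)
    return unique
```

Keeping the original link next to the archived copy follows the "add both sources" habit described above, so it stays clear which hot link was archived.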

Note that I haven't checked that Discord is actually enforcing that, yet. It's 'supposed' to be, by the first of the year, but maybe they postponed it. Yeah, seems to not be enforced. What a bother.

It may be worth sending this topic along to someone who can update the howto:sites and sources wiki page and this wiki section -> https://e621.net/wiki_pages/31895#discord. It may make sense to update those areas to include this information.

This also exposes an issue with replacement sourcing: I can't put more than 1 link in a replacement, which means it's going to take additional work to add the other links for any approved replacement.

If Discord ever starts enforcing the expiry, a significant number of source links will be invalidated all at once. Big headache right there.

I guessed pages that important would be locked. I still suggest passing this topic along to a member of staff for their opinion.

With approval I think alphamule should update the discord specific section to include the expiry possibility and recommend the steps to archive discord uploads. We should also consider updating archived_source to include language explaining that an automatically expiring link should have an archived link added as well. Does anyone have an alternative view or another wiki page that could use an update?

diligentdragon said:
I guessed pages that important would be locked. I still suggest passing this topic along to a member of staff for their opinion.

With approval I think alphamule should update the discord specific section to include the expiry possibility and recommend the steps to archive discord uploads. We should also consider updating archived_source to include language explaining that an automatically expiring link should have an archived link added as well. Does anyone have an alternative view or another wiki page that could use an update?

It gets worse: I used archived_source 1000's of times, and now have to move them to a set but NOT if the source died? I so, sooooo hate that redundant/overlapping definition/use on that tag! In no way does "archived source" imply that the source is unavailable. Given the existence of unavailable_at_source, this is just weird. How many people made the same mistake?
