Topic: We need to have a discussion about TagBot.

Posted under General

allow me to bite the bullet and discuss TagBot, this new editing bot which popped up last month without any forum threads or announcements discussing its arrival. the owner is asw_xxx, a user i am personally unfamiliar with, yet i will make a good-faith assumption about their capabilities as both a contributor and a coder.

let it be known i am happy to let a single, competent editing bot run loose to deal with all the garbage that human editors shouldn't have to. bots do wonders for most things they touch, and take out the needless busywork that plagues projects of this size. given that e621 is the largest furry booru in the world, it is astonishing there weren't any bots until now.

but i do have a few problems with the implementation. for one, the lack of fanfare, or indeed any communication, surrounding the bot's arrival. i have not seen any forum posts or blips about tagbot, no banners about the brand new robot being deployed, no recommendations for the average user... nothing.

if this was an admin decision, why weren't the users notified about their brand new friend and tool? why keep such a thing in secrecy? i understand the administration of every website has a certain amount of tact and discretion in what they choose to divulge to us ordinary users. but given the wide impact of such a decision, i cannot see why it should be withheld from us.

and if this is not an admin decision, and a bot was instead allowed to run amok one day without permission, why even allow bots to exist, given their capabilities? though since TagBot has the rank of "privileged", i am assuming the decision to allow the bot to exist comes from somebody in power who, once again, did not express such to the users.

not, of course, to make assumptions or rake muck. but i don't have much information either way, so i apologize for any mistakes or half-truths that infect this forum post.

my second problem, you see, comes from the lack of public source code availability. anybody who works in computer security or open source software will understand the fundamental flaws of allowing proprietary code to run publicly-available infrastructure.

but without getting into details, i will say that allowing a machine to edit a community-run site without allowing the community to discern how, exactly, that machine is editing the site, is a disservice to the entire community, and i would even say it is an insult to the community who has built this site from the ground-up over months and years.

now, on Wikipedia at least, the administrators of the site encourage the release of the source code of bots, and from what i've seen the majority of bots do have their source released. given that Wikipedia is a community effort where bots have dramatic and wide changes on everything the userbase comes into contact with, it is only fair to let those users view how the bots work.

given that TagBot has now edited 64,000 posts with little oversight or understanding of the processes used to determine what it edits, i recommend the simple policy of having every bot release their source code for viewing by every user, so that every user can have some level of basic oversight over this automated editor. important, considering just how many posts it has edited.

there is a difference between a commercial program used for the sole intent of generating profit and a program that has a significant impact on an entire community, even if that impact manifests in very small individual tag edits. there is no reason not to release the source code.

indeed, any closed-source project represents a security risk to anything that runs such projects, seeing as you have no understanding of what code is being executed at what time. no matter how benign the present output, that is no guarantee that any later output will be just as benign. i happen to enjoy this website, and would appreciate if nothing happened to it because of some silly bug or malicious feature.

my third concern, in a meta sense, is that some users will brush this off as "no big deal". in a practical sense, i get it. nothing bad has happened so far, with this flawed robot, so why worry? but such an attitude banks on absolutely nothing bad happening in the future, ever. and i can't consider this a professional or fair attitude for anybody.

in particular lines of work, such as emergency services, small details can kill. a faulty address, a piece of broken equipment, calling someone by the wrong name, not having a proper plan, not having a backup plan, not executing that plan properly, and so on...

i am happy many of you have the privilege to make small mistakes and overlook small details without worrying about anybody dying. you can make a typo in a tag and fix it up lickety-split. accidentally add in the wrong source only to fix it a few seconds later. even with posts, you can request they be deleted and have the admins give you a do-over. to err is human, et cetera.

but a typo in a bot that has edited, once again, over 64,000 tags in a month is one that could either fail gracefully, or end up mistagging tens of thousands of posts with no easy way to clean up the mess. little details matter in the construction of such large-scale projects, and the points i have recommended, though simple, are far from being small enough to simply ignore. and if they were, their smallness is no reason to ignore them.

once again, if my information is wrong, please take into consideration the rest of this opinion sans the faulty info. at the end of the day, i only have my experiences to bank on, and they may very well be wrong.

tl;dr: read the post. but in sum, please release TagBot's source code, be more communicative about bot additions, and focus on the future instead of the now.

Updated by Lance Armstrong

Ratte

Former Staff

I wiped my ass at 9:49pm on monday, September 25th, 2017.

Updated by anonymous

Ratte said:
I wiped my ass at 9:49pm on monday, September 25th, 2017.

*furiously takes notes*

Updated by anonymous

funny comments aside, op does make a solid point: while i love tagbot, it would probably be best to release its source code and be more open about its existence

Updated by anonymous

facelessmess said:
it would probably be best to release its source code and be more open about its existence

Why?
If they don't want to release the source, that's their issue. All the tag edits it has done are objectively correct, so it has a much better track record than any regular user, so it's not a problem.

Updated by anonymous

Mario69 said:
If they don't want to release the source, that's their issue. All the tag edits it has done are objectively correct, so it has a much better track record than any regular user, so it's not a problem.

actually, it's our issue. the lack of source code availability affects every user who wants to inspect the source of the machine that is now doing their job.

with closed-source software, it is impossible to guarantee that what is correct today will remain correct. and to the point, it goes against everything a community stands for: open cooperation.

Freedom includes the freedom to cooperate with others. [...] In the free software community, we are very much aware of the importance of the freedom to cooperate because our work consists of organized cooperation. If your friend comes to visit and sees you use a program, she might ask for a copy. A program which stops you from redistributing it, or says you're “not supposed to”, is antisocial.

Updated by anonymous

Mario69 said:
Why?
If they don't want to release the source, that's their issue. All the tag edits it has done are objectively correct, so it has a much better track record than any regular user, so it's not a problem.

mainly because of op's point here

my second problem, you see, comes from the lack of public source code availability. anybody who works in computer security or open source software will understand the fundamental flaws of allowing proprietary code to run publicly-available infrastructure.

i'm not saying it isn't a good contributor; it's been objectively correct so far and is very helpful. i think op is more concerned about the possibility of the bot being misused or abused somehow to be destructive, so being open source might help deter any sort of malicious use of the bot

Updated by anonymous

facelessmess said:
mainly because of op's point here

i'm not saying it isn't a good contributor; it's been objectively correct so far and is very helpful. i think op is more concerned about the possibility of the bot being misused or abused somehow to be destructive, so being open source might help deter any sort of malicious use of the bot

I'm not understanding the logic train here. If one user has access to a bot and it is available only under a single username, then the malicious actions are both restricted to that account and under public scrutiny, as they are easily located. If the source is released, then ANYONE can run it, and the number of actors capable of using it for malicious actions is infinitely larger. I see no benefit to releasing the source code for review, because a single user operates it. It poses no security threats to anyone but the user running it.

Even if it was open sourced, it has no impact on the malicious use of it. If the author uses it for malicious purposes, it is used for malicious purposes, regardless of public scrutiny at how it works.

I'd honestly encourage them not to release the source code, and instead continue to be objective, and reviewed by the public, as all other edit actions are done.

Updated by anonymous

KiraNoot said:
I'm not understanding the logic train here. If one user has access to a bot and it is available only under a single username, then the malicious actions are both restricted to that account and under public scrutiny, as they are easily located. If the source is released, then ANYONE can run it, and the number of actors capable of using it for malicious actions is infinitely larger. I see no benefit to releasing the source code for review, because a single user operates it. It poses no security threats to anyone but the user running it.

Even if it was open sourced, it has no impact on the malicious use of it. If the author uses it for malicious purposes, it is used for malicious purposes, regardless of public scrutiny at how it works.

I'd honestly encourage them not to release the source code, and instead continue to be objective, and reviewed by the public, as all other edit actions are done.

eh I suppose I can see that

still, is it possible we can at least have it acknowledged in the wiki or something? like a simple comment that it does exist? it'd probably be helpful for people to know that a bot can fix what they missed and didn't realize they missed, so that they don't worry about it later.

though perhaps i'm thinking about this too personally; mental stuff makes me over worry a lot about the smallest things ahaha

Updated by anonymous

A bot like that is simple; you can make your own.
Just inspect every post and add missing tags based on the media metadata.

Updated by anonymous

The bot wasn't discussed because there's objectively nothing to discuss. The thing is sanctioned to only run tag edits that are objectively correct and have no room for interpretation or subjectivity.

We could have made a routine on the server to tag that metadata correctly on upload, but this option works just as well with less coding on our end. This is not comparable to a Wikipedia bot that creates or removes information; ours just transcribes information from the upload's metadata into the tag section.

If the bot runs amok, which is highly unlikely, the damage is contained within a single account and completely trivial to undo with our admin tools.
To use your own example, our bot doesn't touch the patient; it collects, archives, and tags reports after the patient has already left the hospital. The room for error is minimal, the results are trivial to verify, and if an error does happen it's a matter of minutes to fix.

I also agree with Kira's assessment on publishing the source code; it really shouldn't be done. If it were a public bot, or more complex than it currently is, then absolutely. But in its current scope and implementation, opening the code could cause far more problems than leaving it closed.

Updated by anonymous

fewrahuxo said:
allow me to bite the bullet and discuss TagBot, this new editing bot which popped up last month without any forum threads or announcements discussing its arrival. the owner is asw_xxx, a user i am personally unfamiliar with, yet i will make a good-faith assumption about their capabilities as both a contributor and a coder.

let it be known i am happy to let a single, competent editing bot run loose to deal with all the garbage that human editors shouldn't have to. bots do wonders for most things they touch, and take out the needless busywork that plagues projects of this size. given that e621 is the largest furry booru in the world, it is astonishing there weren't any bots until now.

Nimmy had a bot that added resolution tags that ran under him(I think it was tag scripting though?)

fewrahuxo said:
but i do have a few problems with the implementation. for one, the lack of fanfare, or indeed any communication, surrounding the bot's arrival. i have not seen any forum posts or blips about tagbot, no banners about the brand new robot being deployed, no recommendations for the average user... nothing.

They could have asked an admin via DMail if it was ok, or asked in IRC.
We don't need banners about third-party programs doing stuff to the site, because that would endorse them.
Don't get me wrong, we love to see what people come up with to assist the site. But we cannot endorse these things, because if something breaks, fingers may be pointed at us saying "YOU SUPPORTED IT!".
The bot has enough details in its description(and name, for that matter); users don't need to know about it because it doesn't interact with users in any way other than tagging stuff people didn't tag.

fewrahuxo said:
if this was an admin decision, why weren't the users notified about their brand new friend and tool? why keep such a thing in secrecy? i understand the administration of every website has a certain amount of tact and discretion in what they choose to divulge to us ordinary users. but given the wide impact of such a decision, i cannot see why it should be withheld from us.

We provide a publicly available API. There is no secrecy. In fact there are a handful of bots running on the site right now. Some do stuff you don't see, assisting users through third-party sites, such as a favourites recommender.
There are even some bots out there whose owners we don't know, because they forge their user-agent(which is heavily frowned upon). Some even scrape content(both the tags and the images, stealing the work of our users) and then upload it to another booru(you know who you are).

fewrahuxo said:
and if this is not an admin decision, and a bot was instead allowed to run amok one day without permission, why even allow bots to exist, given their capabilities? though since TagBot has the rank of "privileged", i am assuming the decision to allow the bot to exist comes from somebody in power who, once again, did not express such to the users.

We gave it privileged access so it can tag stuff without hitting limits. It isn't breaking any rules, and as I stated above, we released an API for the exact purpose of allowing people to make neat things.

fewrahuxo said:
not, of course, to make assumptions or rake muck. but i don't have much information either way, so i apologize for any mistakes or half-truths that infect this forum post.

my second problem, you see, comes from the lack of public source code availability. anybody who works in computer security or open source software will understand the fundamental flaws of allowing proprietary code to run publicly-available infrastructure.

What are people going to do? Crash the bot by hacking the image size value on an image before they upload it? If anything it'll just say it is a really large image, and so will our site.
And if you think about it, Ouroboros(the codename for the fork of Danbooru the site runs on) is actually not publicly available.

fewrahuxo said:
but without getting into details, i will say that allowing a machine to edit a community-run site without allowing the community to discern how, exactly, that machine is editing the site, is a disservice to the entire community, and i would even say it is an insult to the community who has built this site from the ground-up over months and years.

I am confused about why the community explicitly needs to know about something the site could do automatically(but we don't want it to, because that would mean programmatically added tags that might not be correct if we run another site on Ouroboros).

fewrahuxo said:
now, on Wikipedia at least, the administrators of the site encourage the release of the source code of bots, and from what i've seen the majority of bots do have their source released. given that Wikipedia is a community effort where bots have dramatic and wide changes on everything the userbase comes into contact with, it is only fair to let those users view how the bots work.

While people have different views on how source code should be handled(I, for one, love open source stuff), some people don't want to release theirs, and we don't make them.
There is even a C++ executable someone made for Windows, which runs on users' computers and does not have its source code publicly available(which I find HIGHLY suspicious), but it hasn't caused any problems.

fewrahuxo said:
given that TagBot has now edited 64,000 posts with little oversight or understanding of the processes used to determine what it edits, i recommend the simple policy of having every bot release their source code for viewing by every user, so that every user can have some level of basic oversight over this automated editor. important, considering just how many posts it has edited.

It's very easy to understand. You use the API(/post/show.json?id=<post_id>), get the width and height, and if the width or height is in a specific range, add a specific tag; if it is out of that range, add a different tag.
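In Python, that check could look something like this(the cutoff values and tag names are illustrative guesses on my part, not TagBot's actual ones):

```python
def resolution_tags(width, height):
    """Map a post's dimensions (from /post/show.json) to resolution tags.

    The thresholds below are illustrative guesses, not TagBot's real cutoffs.
    """
    if width >= 3200 or height >= 2400:
        return ["absurd_res", "hi_res"]  # very large images imply both tags
    if width >= 1600 or height >= 1200:
        return ["hi_res"]
    return []  # nothing to add for small images
```

The bot would then push the returned tags back through the API for each post it checks.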

fewrahuxo said:
there is a difference between a commercial program used for the sole intent of generating profit and a program that has a significant impact on an entire community, even if that impact manifests in very small individual tag edits. there is no reason not to release the source code.

What if that significant impact is beneficial?

fewrahuxo said:
indeed, any closed-source project represents a security risk to anything that runs such projects, seeing as you have no understanding of what code is being executed at what time. no matter how benign the present output, that is no guarantee that any later output will be just as benign. i happen to enjoy this website, and would appreciate if nothing happened to it because of some silly bug or malicious feature.

There is a very low chance there will be a bug, because it is just math that has been tested. I could write the script in 10-20 minutes without making a single error. Programmers are very good at basic math tasks(except when they follow w3schools).
If they go malicious, we ban the bot and the owner. It takes about 5 minutes to revert all their changes.

fewrahuxo said:
my third concern, in a meta sense, is that some users will brush this off as "no big deal". in a practical sense, i get it. nothing bad has happened so far, with this flawed robot, so why worry? but such an attitude banks on absolutely nothing bad happening in the future, ever. and i can't consider this a professional or fair attitude for anybody.

in particular lines of work, such as emergency services, small details can kill. a faulty address, a piece of broken equipment, calling someone by the wrong name, not having a proper plan, not having a backup plan, not executing that plan properly, and so on...

But.. we are not an emergency service? If anything breaks we can just bulk-revert a user's changes, or in the worst case scenario, revert the database an hour back.

fewrahuxo said:
i am happy many of you have the privilege to make small mistakes and overlook small details without worrying about anybody dying. you can make a typo in a tag and fix it up lickety-split. accidentally add in the wrong source only to fix it a few seconds later. even with posts, you can request they be deleted and have the admins give you a do-over. to err is human, et cetera.

but a typo in a bot that has edited, once again, over 64,000 tags in a month is one that could either fail gracefully, or end up mistagging tens of thousands of posts with no easy way to clean up the mess. little details matter in the construction of such large-scale projects, and the points i have recommended, though simple, are far from being small enough to simply ignore. and if they were, their smallness is no reason to ignore them.

once again, if my information is wrong, please take into consideration the rest of this opinion sans the faulty info. at the end of the day, i only have my experiences to bank on, and they may very well be wrong.

I have a higher risk of stubbing my toe on my ceiling fan than of the bot breaking, and I haven't seen the source code myself.

fewrahuxo said:
tl;dr: read the post. but in sum, please release TagBot's source code, be more communicative about bot additions, and focus on the future instead of the now.

I'm sure if you ask them they might consider giving you a copy of the source code.
My problem is, if they release it, people will download it and run it for the sole purpose of artificially inflating their tag count by doing nothing. It's much better to have a single bot account do it, so we know it was all automatic, than someone getting privileged for doing nothing and then causing havoc.

Updated by anonymous

If the bot breaks due to an error, presumably the creator could modify the bot to go through its own history, create a set of potentially affected images, and then revert the changes it made.

Sure, it would be an unfortunate thing to have to do, but it would be a reasonably easy way to clean up the mess.
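As a sketch, assuming the bot keeps (or can fetch) a log of its own edits — the data shapes here are hypothetical, since the bot's internals aren't public:

```python
def plan_reverts(edit_log, bad_tags):
    """Walk the bot's own edit history and list the tags to strip per post.

    `edit_log` maps post_id -> list of tags the bot added (hypothetical shape);
    `bad_tags` is the set of tags a faulty run produced.
    """
    fixes = {}
    for post_id, added in edit_log.items():
        wrong = [t for t in added if t in bad_tags]
        if wrong:
            fixes[post_id] = wrong  # only posts actually affected
    return fixes
```

The creator would then replay `fixes` through the same API the bot already uses for tagging.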

Updated by anonymous

Chaser said:
Nimmy had a bot that added resolution tags that ran under him(I think it was tag scripting though?)

Wasn't me, I also can't remember who it was, it's been too long.

Clawdragons said:
If the bot breaks due to an error, presumably the creator could modify the bot to go through its own history, create a set of potentially affected images, and then revert the changes it made.

Sure, it would be an unfortunate thing to have to do, but it would be a reasonably easy way to clean up the mess.

We can also quickly revert hundreds or thousands of tag changes from our side. If there is fallout it's fixed in a matter of minutes.

Updated by anonymous

NotMeNotYou said:
We can also quickly revert hundreds or thousands of tag changes from our side. If there is fallout it's fixed in a matter of minutes.

Well that's useful.

In a sense, anyway. Hopefully it's not something that needs to be used too frequently, for obvious reasons.

Updated by anonymous

A few questions on the TagBot tag list:

  • no_sound/sound/sound_warning

What is the planned metric for sound_warning vs sound? Normalize the sound track, downsample it, take a coarse histogram of volume bands, and sum the top few bands, compare that sum to a threshold?

  • pixel_(artwork)

Really interested in how this will be detected. I've worked on the problem of detecting PA and don't currently believe it can be reliably distinguished from oekaki, given the relatively lax standard for 'pixel-art-ness' on this site.

(that's why the wiki page I wrote for pixel_(artwork) contains so much hedging ;)
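To make the first question concrete, here's roughly the metric I have in mind (every constant here is a guess on my part, not anything TagBot is known to do):

```python
def sound_tag(samples, bands=8, top_bands=2, threshold=0.1):
    """Coarse loudness check: normalize the track, histogram sample
    magnitudes into volume bands, and compare the fraction of samples
    in the loudest bands to a threshold. All parameters are guesses,
    not TagBot's actual values."""
    if not samples:
        return "no_sound"
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return "no_sound"                      # digital silence
    hist = [0] * bands
    for s in samples:
        v = abs(s) / peak                      # normalize to [0, 1]
        hist[min(int(v * bands), bands - 1)] += 1
    loud = sum(hist[-top_bands:]) / len(samples)
    return "sound_warning" if loud > threshold else "sound"
```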

Updated by anonymous

i have read the responses to this thread. their dissection would require a retelling of the fundamental principles of free software and the reasons why it is so important, which tells me that very few of you care about the philosophy, or else you would understand my point of view. as Louis Armstrong has said: "there are some folks that, if they don't know, you can't tell them", and i doubt preaching the philosophy would help you come to terms with it.

i have come to the conclusion that there is a cultural divide between those who understand the social and security implications of allowing these robots to run amok without letting anybody understand how, exactly, they are running amok, and those who do not, and that the divide between free software and non-free software starts at the very first instant a site is founded. i have found you can't argue with culture. you can only run away, join another culture, or else be extremely patient and wait for it to change over time. i, myself, do not have the power to change the majority opinion, no matter how much discourse i provide.

i find it ironic that a website which relies so much on the work of dozens of volunteers does not want to give back to those volunteers by allowing them to understand the processes which make it tick, include them in the discussion when a new editing bot launches, or even have the politeness to announce when a bot has been launched as opposed to letting them discover it by themselves. the reasoning doesn't matter, at that point. it's just another showcase for the disconnect between those who run the site and those who don't.

but i can see you have already made up your minds, and for what it's worth the website is in itself insecure for collecting the sensitive data of thousands of individuals and putting it in the hands of a small minority who we all pray know what they're doing with it. please consider this my last whining post on the subject matter, and i won't argue any further.

Updated by anonymous

fewrahuxo said:
for what it's worth the website is in itself insecure for collecting the sensitive data of thousands of individuals and putting it in the hands of a small minority who we all pray know what they're doing with it.

For users' peace of mind, I'm going to clarify that we don't "collect" data. I don't think our ad service collects data either, other than click count. I never noticed any cookies related to it.
We do store usernames, passwords hashed with bcrypt(I think?), and emails, but nothing more than the user provides.
I don't know where you got the idea we store sensitive information?

If you want insecure, there's a specific site which I won't name that has more security holes than Swiss cheese, and they have way more users than us(I think, I don't have the analytics data(and that data is anonymous)).

Updated by anonymous

savageorange said:
A few questions on the TagBot tag list:

  • no_sound/sound/sound_warning

What is the planned metric for sound_warning vs sound? Normalize the sound track, downsample it, take a coarse histogram of volume bands, and sum the top few bands, compare that sum to a threshold?

  • pixel_(artwork)

Really interested in how this will be detected. I've worked on the problem of detecting PA and don't currently believe it can be reliably distinguished from oekaki, given the relatively lax standard for 'pixel-art-ness' on this site.

(that's why the wiki page I wrote for pixel_(artwork) contains so much hedging ;)

Those don't seem to be implemented in any way yet, but as far as I can see with sound, the bot has done a pretty good job and has the ability to check whether a video's audio track is empty, rather than just detecting that the audio track is missing.

I have done this manually in the past, so it's neat to have someone automate it so I have time for other things.
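To sketch that missing-vs-empty distinction (the ffprobe/ffmpeg approach and the -90 dB cutoff are my assumptions; I haven't seen the bot's code):

```python
import json
import subprocess

def classify_audio(has_track, peak_db):
    """Pure decision: no track at all vs. a track that is digitally silent
    vs. real sound. The -90 dB cutoff is an illustrative guess."""
    if not has_track:
        return "no_sound"              # no audio stream in the container
    if peak_db is None or peak_db <= -90.0:
        return "no_sound"              # stream exists but is empty/silent
    return "sound"

def probe_video(path):
    """One way to feed classify_audio(), using ffprobe plus ffmpeg's
    volumedetect filter (both assumed to be installed)."""
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_entries", "stream=index", "-of", "json", path],
        capture_output=True, text=True)
    if not json.loads(probe.stdout or "{}").get("streams"):
        return classify_audio(False, None)
    # Decode the track and read the peak volume from volumedetect's report.
    detect = subprocess.run(
        ["ffmpeg", "-i", path, "-af", "volumedetect", "-f", "null", "-"],
        capture_output=True, text=True)
    peak_db = None
    for line in detect.stderr.splitlines():
        if "max_volume:" in line:
            peak_db = float(line.split("max_volume:")[1].split("dB")[0])
    return classify_audio(True, peak_db)
```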

fewrahuxo said:
i have read the responses to this thread. their dissection would require a retelling of the fundamental principles of free software and the reasons why it is so important, which tells me that very few of you care about the philosophy, or else you would understand my point of view. as Louis Armstrong has said: "there are some folks that, if they don't know, you can't tell them", and i doubt preaching the philosophy would help you come to terms with it.

i have come to the conclusion that there is a cultural divide between those who understand the social and security implications of allowing these robots to run amok without letting anybody understand how, exactly, they are running amok, and those who do not, and that the divide between free software and non-free software starts at the very first instant a site is founded. i have found you can't argue with culture. you can only run away, join another culture, or else be extremely patient and wait for it to change over time. i, myself, do not have the power to change the majority opinion, no matter how much discourse i provide.

i find it ironic that a website which relies so much on the work of dozens of volunteers does not want to give back to those volunteers by allowing them to understand the processes which make it tick, include them in the discussion when a new editing bot launches, or even have the politeness to announce when a bot has been launched as opposed to letting them discover it by themselves. the reasoning doesn't matter, at that point. it's just another showcase for the disconnect between those who run the site and those who don't.

but i can see you have already made up your minds, and for what it's worth the website is in itself insecure for collecting the sensitive data of thousands of individuals and putting it in the hands of a small minority who we all pray know what they're doing with it. please consider this my last whining post on the subject matter, and i won't argue any further.

Next time you come into IRC, you'll start complaining that you want to know how poofbot decides what to give the users on the channel, because it's crucial for the channel's operation.

FOSS is a really good thing, but you have stuck your head way too deep into that bucket. We are still talking about a single individual's own bot, which runs with the same restrictions as other users, and we have full knowledge of who runs it and full power to ban them and revert their changes if that becomes necessary. Thus far the tag changes made have been correct, so I have absolutely no idea where the sudden demand comes from. You don't demand PSD/XCF/SAI files from artists because their artwork is posted here; you simply don't have to have access to the process that created the end product.

There simply are things that people either want to or need to keep private. I also see bots on many sites automating things that would be a pain to do by hand; the only difference, I think, is that some sites have a separate user level or label for bots. This is the first time I have seen someone almost aggressively demanding to know the source code for said bots.

And it's almost funny how you are trying to cast doubt into contributors' minds by claiming that the site itself not being FOSS suddenly makes their contributions far less valuable or the site less secure. No. Pretty sure everything is stored in databases where the data can be reused even if the site itself fails. How many banks or social media sites give out their source code? FA does have a track record of insanely bad code, but the only time I have seen anything over here was the Cloudflare-related issue, which was out of anyone's hands and affected thousands of sites.

Updated by anonymous

Chaser said:
For users' peace of mind, I'm going to clarify that we don't "collect" data. I don't think our ad service collects data either, other than click count. I never noticed any cookies related to it.

actually i've noticed a security flaw where the login cookie reveals the username of the user who is logged in, meaning anybody with access to your Web browser will know you have an account on e621, as well as exactly which account you've logged in as.

We do store usernames, hashed passwords using bcrypt (I think?), and email addresses, but nothing more than the user provides.
I don't know where you got the idea that we store sensitive information?
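To illustrate the storage scheme described above: the post says e621 uses bcrypt, which is not in Python's standard library, so this sketch uses the stdlib's scrypt to demonstrate the same idea of a salted, deliberately expensive password hash. The function names and parameters here are illustrative, not e621's actual code.

```python
import hashlib, hmac, os

def hash_password(password, salt=None):
    # A fresh random salt per user defeats precomputed (rainbow-table) attacks.
    salt = salt or os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def check_password(password, salt, digest):
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    # Constant-time comparison avoids leaking where the mismatch occurs.
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("hunter2")
print(check_password("hunter2", salt, digest))  # → True
print(check_password("wrong", salt, digest))    # → False
```

The point either way is that only the salt and digest are stored; the plaintext password never is, and the slow hash makes brute-forcing a leaked database expensive.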

every website has the capability to collect more data than it knows what to do with. it's fair to assume every website collects such data until proven otherwise, and this can never be truly proven under any model due to the private nature of Web infrastructure.

If you want insecure, there's a specific site which I won't name that has more security holes than Swiss cheese, and they have way more users than us (I think; I don't have the analytics data, and that data is anonymous anyway).

no need to name it. most websites are insecure by default and every attempt to make them secure is flawed in its own special way.

Updated by anonymous

fewrahuxo said:
actually i've noticed a security flaw where the login cookie reveals the username of the user who is logged in, meaning anybody with access to your Web browser will know you have an account on e621, as well as exactly which account you've logged in as.

If they have access to your computer, you have bigger problems at hand. Also, even if we remove it (honestly no idea why it is there, maybe a relic of the past?), they can still see you accessed e621, visit the site, and look at who is logged in.

fewrahuxo said:
every website has the capability to collect more data than it knows what to do with. it's fair to assume every website collects such data until proven otherwise, and this can never be truly proven under any model due to the private nature of Web infrastructure.

Not going to argue there. Stuff like this has happened on other sites before. But the internet runs on a trust basis. If someone gives you a reason not to trust them, you don't use the site. Other times, if you're not sure, you use a junk email and fake info until you have reason to trust them. Also, Google and Facebook both store way more than we do, yet they are not open source.

I have contributed various things to open source projects before. My involvement with e621 is publicly known because I trust e621 not to do bad things.
Basically, what I am saying is that I trust e621 to keep user data safe and not collect more than what I have seen in the code (which I mentioned above).
My real name is in the code (somewhere; I forget what had it, but it is in there). If e621 does do something bad, my name will be tied to it, and it will destroy my career as a software/web developer; every contribution I've made to open source will be heavily scrutinised, and future pull requests will be under heavy judgement and possibly rejected just because of past events. This much I trust e621/Dragonfruit.

Updated by anonymous

savageorange said:
A few questions on the TagBot tag list:

You'll have to ask asw_xxx for the details.

fewrahuxo said:
i have read the responses to this thread. their dissection would require a retelling of the fundamental principles of free software and the reasons why it is so important, which tells me that very few of you care about the philosophy, or else you would understand my point of view.

It is entirely possible to understand something and still disagree with it, as happens here. How come whenever people disagree with you, you just dismiss their arguments instead of actually arguing against them?

The bot wasn't announced because its scope and capabilities are completely trivial. What even would there have been to discuss about the bot? How tagging sound or no_sound is harmful? Whether <30_seconds_webm should be applied to posts that qualify?
There are no downsides to this bot; all tags it adds are intrinsically linked to the raw file of the submission. Any given tagged attribute is either true or not. There is neither room for errors nor room for subjectivity.

We have had requests for more sophisticated neural network based tagging bots, which I've denied so far. People are allowed to use our publicly available tag database as a learning ground to play around with the tech, but we're not going to let one like that on our page any time soon (if at all). A tagging bot like that would definitely require a large scale discussion on the issue before I'd even consider giving it a green light.

FOSS is also quite irrelevant in this case: it's not a program you have access to, use, or interact with. It doesn't have access to user data or to e6 in any way you don't; it communicates through our API, downloads the file, analyzes it, then uses the API again to update the tags based on the attributes found. The information the bot is able to collect and process is the same any anonymous person can access without an account.
On the chance that the bot does cause havoc, it's a matter of minutes for us to fix those errors like they never happened at all (apart from leaving traces in the tagging history of yore).
If it were a bot being run on our servers it would definitely need to be open source, we're not allowing code to run on our hardware where we don't know what it's doing.
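The workflow described above (fetch a post via the API, analyze the raw file, write back mechanical attribute tags) can be sketched as a pure tag-derivation step. The field names loosely follow the shape of e621's public posts API but should be treated as assumptions, and the two rules shown are just the examples from this thread, not the bot's actual rule set.

```python
def derive_tags(post):
    """Derive purely mechanical attribute tags from a post record (a dict
    shaped like one entry of the e621 posts API response). Each attribute
    is objectively true or false of the raw file, with no subjectivity."""
    tags = set()
    file_info = post.get("file", {})
    if file_info.get("ext") == "webm":
        duration = post.get("duration") or 0
        if 0 < duration < 30:
            tags.add("<30_seconds_webm")
    # has_sound is an assumed field name for an audio-track flag.
    if post.get("has_sound") is True:
        tags.add("sound")
    elif post.get("has_sound") is False:
        tags.add("no_sound")
    return tags

post = {"file": {"ext": "webm"}, "duration": 12.5, "has_sound": False}
print(sorted(derive_tags(post)))  # → ['<30_seconds_webm', 'no_sound']
```

In the real bot this function would sit between two API calls (download the file, then push the tag update); everything it reads is data any anonymous visitor can see.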

Updated by anonymous

NotMeNotYou said:
It is entirely possible to understand something and still disagree with it, as happens here. How come whenever people disagree with you, you just dismiss their arguments instead of actually arguing against them?

in a thousand words i have discussed my opinions on the matter in a way that would be agreeable to anybody who was undecided. but it is clear to me that anybody with the power to enact my proposed policies is not willing to agree with me, or else you would have already agreed with the initial post.

to argue, then, is to have both parties reinforce their own opinions, because at this point nobody will be convinced of anything contrary to their own opinion. it was my burden to state candidly my view of things and some positive suggestions for improving them, and having stated my view, my obligation is finished. your obligation was to come to a decision, and you have decided against me. there's nothing else to say at this point.

Updated by anonymous

NotMeNotYou said:
We have had requests for more sophisticated neural network based tagging bots, which I've denied so far. People are allowed to use our publicly available tag database as a learning ground to play around with the tech, but we're not going to let one like that on our page any time soon (if at all). A tagging bot like that would definitely require a large scale discussion on the issue before I'd even consider giving it a green light.

Theoretically, a bot like that could be made without your knowledge and without even using the API. It's unlikely that the maker would not talk to the staff about it or that its actions would go unnoticed, but it could happen.

Updated by anonymous

Lance_Armstrong said:
Theoretically, a bot like that could be made without your knowledge and without even using the API. It's unlikely that the maker would not talk to the staff about it or that its actions would go unnoticed, but it could happen.

Not going to disclose how, but we have methods to prevent this.

Updated by anonymous

fewrahuxo said:
but it is clear to me that anybody with the power to enact my proposed policies is not willing to agree with me, or else you would have already agreed with the initial post.

Why would anyone agree with the OP as a whole?

You yourself say that you have 3 concerns, and the TLDR proposes multiple measures. And within your description of those concerns, you import other assumptions, some of which raise red flags (e.g. FOSS as an untrammeled good for security; nothing is an untrammeled good).

You top that off with framing those who disagree with you on important aspects of your argument as ignorant (which calls into question the sincerity of your acknowledgement that your experience may be wrong, and raises the question of whether you are an ideologue).

The structure is above average in quality and you make several specific claims, but in my estimation the post as a whole would only be convincing to someone who already shares your basic presuppositions.

Updated by anonymous

Chaser said:
Not going to disclose how, but we have methods to prevent this.

I'm not sure what they are, but they could be defeated by blocking certain scripts or by imitating normal user responses with programs that control mouse and keyboard input. Random delays or schedules could be used to limit how fast, and how many hours per day, the bot runs. Although I suspect that some e621 users have spent entire days tagging without eating, drinking, or making trips to the bathroom.

Updated by anonymous

Lance_Armstrong said:
I'm not sure what they are, but they could be defeated by blocking certain scripts or by imitating normal user responses with programs that control mouse and keyboard input. Random delays or schedules could be used to limit how fast, and how many hours per day, the bot runs. Although I suspect that some e621 users have spent entire days tagging without eating, drinking, or making trips to the bathroom.

Such users have already learned that there is in fact a limit on how often they can make edits.
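An hourly edit limit like the one referenced here is commonly implemented as a token bucket: each edit spends a token, and tokens trickle back over time. This is only a sketch of that general technique; the limit values are made up, and nothing here reflects e621's actual mechanism.

```python
import time

class EditLimiter:
    """Token-bucket sketch of an hourly tag-edit limit (illustrative only)."""
    def __init__(self, edits_per_hour):
        self.capacity = edits_per_hour
        self.tokens = float(edits_per_hour)
        self.rate = edits_per_hour / 3600.0  # tokens replenished per second
        self.last = time.monotonic()

    def allow(self):
        # Refill tokens for the time elapsed since the last check, capped
        # at capacity, then spend one token if any are available.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = EditLimiter(edits_per_hour=2)
print([limiter.allow() for _ in range(3)])  # → [True, True, False]
```

The appeal of the bucket shape is that it permits short bursts (the marathon taggers mentioned above) while still bounding the hourly total.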

Updated by anonymous

Abuse prevention should not depend on the source for such bots remaining closed (and I doubt it currently does), so I'm going to ignore that aspect.

There's a practical argument here. The whole point of bots, ostensibly, is to help the community. Not allowing any contributions or derivative works goes against that. If I have to write a bot from the ground up to do something that TagBot doesn't do, rather than just contributing to TagBot, I'm more likely to get something wrong (potentially harmfully). Not to mention, because of the extra work needed, it's less likely to be completed even if it's something that would be good for the site.

So yes, I think it would be better if it were open source. Not for theoretical or philosophical reasons, but to save us all some work.

Updated by anonymous

Maxpizzle said:
Abuse prevention should not depend on the source for such bots remaining closed (and I doubt it currently does), so I'm going to ignore that aspect.

There's a practical argument here. The whole point of bots, ostensibly, is to help the community. Not allowing any contributions or derivative works goes against that. If I have to write a bot from the ground up to do something that TagBot doesn't do, rather than just contributing to TagBot, I'm more likely to get something wrong (potentially harmfully). Not to mention, because of the extra work needed, it's less likely to be completed even if it's something that would be good for the site.

So yes, I think it would be better if it were open source. Not for theoretical or philosophical reasons, but to save us all some work.

You could always contact asw_xxx and ask if he's willing to share. Maybe he has a GitHub repo he could add you to if you'd like to contribute?
We haven't told him to keep the bot under lock from anybody; we leave it entirely in his hands what he does with it. As long as he sticks to the limitations we set regarding its functionality, he's free to do as he pleases.

Updated by anonymous

Chaser said:
Even some that scrape content (both tags and the images, stealing the work of our users) who then upload to another booru (you know who you are).

sweet mercy at the length of some of the posts in this thread! O_O

that aside, i actually came across one of these bots while sourcing some pics (when we began the tumblr "raw" sourcing project) and noticed a bot had copied some of my uploads, and a whole lot of other people's, from here to sites such as rule34.paheal and i think rule34.xxx too. probably other sites as well. that one had copied tens of thousands of posts from here to other sites and is probably still doing so unless someone stopped it at some point.

i suppose there's no harm in it so long as that's all it was doing (well, that and copying the tags for each post over as well), since that would provide more places to see the content stored here. basically sharing what we've collected here with other places and people who may not visit e621.

Updated by anonymous

Furrin_Gok said:
Such users have already learned that there is in fact a limit on how often they can make edits.

IIRC, Priv+ don't have hourly tag edit limits.

Updated by anonymous
