Topic: Tagging Year & System automation.

Posted under Tag/Wiki Projects and Questions

I've become quite the happy tagger the last few weeks. I've noticed some posts come with their respective year tags, e.g. '2016', '2001', etc.
The year tag is unimportant to me as I never use it, though having tag disparity is annoying as I'm sure you all know.

If it can be agreed that year tags are even necessary then a universal tag should be applied by the system automatically. It's not a job for taggers.

That is what this thread is about.

Updated by savageorange

Automatic year tagging should be doable for most sources, I think. FA and IB display pretty clearly the date of posting. It could also be done after posting by any user like how Hudson tags pic resolution.

I like to sort an artist's posts by date to see their art progression.

Updated by anonymous

There's no way it would work. The date an art piece is posted to a gallery does not always reflect the date the artwork was created.

Updated by anonymous

That's true, but >75% of the time it's correct. If a program tags dates only when the source is from the original artist, that figure goes up to like 90%, because very few artists repost work within their own galleries. Plus, that's how most users tag dates, so it's not any worse than the current system, just more complete.

Updated by anonymous

That's right, Leomole. Any posts that are tagged incorrectly can be caught by human taggers in time.

Updated by anonymous

Well... I don't know about that. If date tagging is automated for posts from FA, for example, we're talking about the automatic tagging of roughly 300,000 posts. Errors will be inevitable, but I think they'll be relatively infrequent and can be easily corrected if a user notices one.

I tested a sample of posts: every 10,000th post from post #800000 to post #990000 (n=20). Of these, 1 is deleted, 5 have no source, and 2 have nonstandard sources (booru and content aggregator). That leaves 12 with a standard source: 1 from pixiv, 1 from DA, 1 from IB, and 9 from FA. All 12 appear to have the correct date at the source, so automated tagging would be correct in 100% of these cases. 7 of these 12 posts are missing the date tag, so I think there's a significant benefit to having them automatically tagged.
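
For anyone who wants to check the arithmetic, here is a rough sketch of the tally. The dictionary keys are just labels made up for this example, and the counts are the ones listed above. The last figure is an extra that isn't part of the count itself: assuming the 12 checks are independent, an exact one-sided 95% binomial bound with 12 of 12 correct only says the true accuracy is above roughly 78%, so the sample is suggestive rather than conclusive.

```python
# Sample: every 10,000th post from #800000 to #990000.
sample_ids = list(range(800_000, 990_001, 10_000))

# Counts as reported above; the keys are illustrative labels only.
tally = {
    "deleted": 1,
    "no_source": 5,
    "nonstandard_source": 2,  # booru and content aggregator
    "pixiv": 1,
    "DA": 1,
    "IB": 1,
    "FA": 9,
}

# Posts whose source shows a date that could be checked.
checkable = sum(tally[k] for k in ("pixiv", "DA", "IB", "FA"))
missing_year_tag = 7

# Exact one-sided 95% lower bound on accuracy when all n checks pass:
# solve p**n = 0.05 for p.
lower_bound = 0.05 ** (1 / checkable)

print(len(sample_ids), sum(tally.values()), checkable)
print(f"missing year tag: {missing_year_tag / checkable:.0%}")
print(f"accuracy lower bound (95%): {lower_bound:.1%}")
```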

Updated by anonymous

Damn son, I wouldn't call that a statistical certainty but I applaud you for the quick and clever sample. What are the logistics of coding the site to pull info off another site though? I can't imagine it would be easy. To clarify, my original idea was to use the date of posting as a starting point.

That said, I'd still like another opinion. It can't be a bad thing, and a poll or something for ideas would be good.

http://www.strawpoll.me/11195546

Updated by anonymous

Ratte

Former Staff

There would likely also be problems with posts from many years ago, like the 90's and early 00's, where it's very likely the work was posted much later than it was created. I don't really see a reason to automate something like this as a date is just four characters and I'm sure there are people who could do this better manually.

Updated by anonymous

As long as someone is willing to actually implement that system, I guess if the error margin is small enough and the script won't break from small changes the sites make, it would be more beneficial to have any year tagged instead of none.

Though I do wonder how many will actually check whether the year tags are correct; it's much easier to spot a post without any year tag.

However, it would have to be the year that is on the image, in the image's metadata, or at the place where it was initially posted. The date when it's uploaded here is almost pointless as it's already searchable.
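
A rough sketch of how a script might choose between those three places. Treating the list as a priority order is an assumption on my part, and the function and parameter names are made up for illustration; none of this reflects how the site actually stores post data.

```python
from typing import Optional


def pick_year(year_on_image: Optional[int],
              metadata_year: Optional[int],
              first_posted_year: Optional[int]) -> Optional[int]:
    """Return the first available year, checked in the order listed above.

    All three parameters are hypothetical inputs. The local upload date is
    deliberately not an input, since it is already searchable.
    """
    for candidate in (year_on_image, metadata_year, first_posted_year):
        if candidate is not None:
            return candidate
    return None  # nothing usable: leave the post untagged
```

If none of the three is known, the function returns nothing instead of falling back to the upload date here.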

Updated by anonymous

Ratte said:
a date is just four characters and I'm sure there are people who could do this better manually.

I agree that it would be ineffective for older posts. That could be avoided by making the system only tag newly submitted posts?
The last bit made me kek though. Saying it's just four characters when some people can hardly manage four tags makes the point redundant.

Mario69 said:
Though I do wonder how many will actually check whether the year tags are correct; it's much easier to spot a post without any year tag.

However, it would have to be the year that is on the image, in the image's metadata, or at the place where it was initially posted. The date when it's uploaded here is almost pointless as it's already searchable.

There is an element of 'tagger blindness', let's call it... To remove that concern we'd need to wager the accuracy is good enough that taggers won't need to check. We're in agreement that the accuracy isn't a problem assuming the system works as intended.
Upload date does render my original suggestion pointless lol.

Ratte has objected and the straw poll is leaning no. Could another admin/mod weigh in?
I'm leaning toward yes, since as Mario points out the script should be small and sturdy.

Updated by anonymous

Ratte

Former Staff

GtheOtter said:
I agree that it would be ineffective for older posts. That could be avoided by making the system only tag newly submitted posts?
The last bit made me kek though. Saying it's just four characters when some people can hardly manage four tags makes the point redundant.

There is never a guarantee that people will only submit newly-made content.

RE: tags-- I deal with people who do this every day. I'm aware. :V

Updated by anonymous

Web scraping is actually fairly easy (depending on the language). But we definitely can't -rely- on such a system, as it would need to do the things that a human does better (e.g. on FA, the upload is dated XXXX-YY-ZZ, but the artist says 'I made this on AAAA-BB-CC but forgot to upload it until now'; or worse, they don't say anything at all but they uploaded it 3 months ago on their tumblr). Some judgement and possibly communication is involved.
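
To make the problem concrete, here is a minimal sketch of a deliberately cautious tagger. It assumes the posting date and description have already been scraped; the function name, the host allowlist, and the "any other year mentioned" rule are all made up for illustration and are not an existing tool.

```python
import re
from datetime import date
from typing import Optional
from urllib.parse import urlparse

# Hypothetical allowlist; which hosts to trust is an assumption.
TRUSTED_HOSTS = {"www.furaffinity.net", "inkbunny.net"}

# Matches standalone four-digit years such as 1998 or 2016.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def safe_year_tag(source_url: str, posted: date, description: str) -> Optional[str]:
    """Return a year tag only when nothing contradicts the posting date.

    Returns None (defer to a human) if the source host is not trusted or the
    description mentions any year other than the posting year.
    """
    host = urlparse(source_url).hostname
    if host not in TRUSTED_HOSTS:
        return None
    mentioned = {int(y) for y in YEAR_RE.findall(description)}
    if mentioned - {posted.year}:
        return None
    return str(posted.year)
```

This only catches the first case, where the artist writes down a different year. It has no way to notice the second case, where the piece was quietly posted somewhere else earlier, and that is exactly the part that needs a human.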

The fact that people do not only upload recently made stuff is pretty apparent with just a little scrutiny, and uploading of better quality versions means that there is a positive reason that people -will- do this. Curation (people picking out things 'suitable' and 'interesting' to upload) really builds a delay into the system anyway.

In addition, my impression is that tagging convention is to scrupulously avoid creating false positives (and I've looked into auto-tagging before -- it's actually amazingly difficult to find any 'safe' cases).

So definitely -1 from me.

Updated by anonymous
