Topic: db_export (or other bulk download) for favorites?

Posted under Site Bug Reports & Feature Requests

sorry if this has been discussed before--wasn't able to find any talk about this by googling

from my digging, it doesn't look like there's any comprehensive database export for users' favorites at the moment. unless I'm just blind (not ruling that out) it seems like all of the exports in db_export don't mention favorites by individual users, and I'd really like to avoid scraping the website.

i've seen people in the past talk about using favorites data to create recommendation engines, however i'm not aware of any of them that are actually still available/online at the moment. i'd like to give it a shot, or at least have the data out there for someone else to try and write a recommender. is there any publicly available way to get this data without scraping, and if not, how hard would it be to provide that?

i'd be down to accept the data in whatever format/frequency is practical to provide if it's not an excessive load on the database.

bitwolfy said:
Have you considered using the API?
https://e621.net/help/api

The endpoint you are looking for is here: https://e621.net/favorites.json?user_id=122036

yeah sorry I misused the word scrape here, made that a lot more unclear than I needed to. I'm aware of the API (I wrote a few babby scripts a while ago for fun) and it's what I'd actually use, assuming there's no other provided way to get the favorites data I'm looking for.

the real issue here is the scale/multiplicity of the data I'm looking at. to generate favorites-based recommendations for a single post, you ideally start with everyone who favorited the target post (visible on the site when logged in, but IDK the endpoint for this??), then compare the full favorites lists of all of those users (ideally cached as much as possible), and with all that math done you can compute similarity scores between posts and use those to generate a list of recommendations.
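for the curious, that similarity step can be sketched roughly like this. it's a toy Python example with made-up post/user IDs, using plain Jaccard overlap between favoriter sets (just one of several possible similarity measures); nothing here touches the real API:

```python
# Toy sketch of favorites-based similarity scoring.
# favs maps post_id -> set of user_ids who favorited it (made-up data).
favs = {
    100: {1, 2, 3, 4},
    200: {2, 3, 4, 5},
    300: {7, 8},
}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A intersect B| / |A union B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend(target: int, k: int = 5) -> list:
    """Rank other posts by overlap of their favoriter sets with the target's."""
    target_users = favs[target]
    scores = [
        (jaccard(target_users, users), post)
        for post, users in favs.items()
        if post != target
    ]
    scores.sort(reverse=True)
    return [post for score, post in scores[:k] if score > 0]

print(recommend(100))  # -> [200]; post 200 shares 3 of 5 users with post 100
```

the expensive part isn't this math, it's filling in `favs` for every relevant user, which is exactly the data-collection problem below.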

so if I wanted to strictly use the API for this, I'd be iterating through users, iterating through all pages of their favorites, and caching them in a local database as I go. I haven't done the math on how long that'd take, but given 1 request per second it'd probably be... a while; and keeping the data up to date would mean repeating the whole process.
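a rough sketch of what that per-user loop might look like. only the favorites.json endpoint itself comes from this thread; the `page` parameter and the pluggable fetcher are my assumptions for illustration, so check the API docs before trusting any of it:

```python
import time
from typing import Callable, Iterator

API = "https://e621.net/favorites.json"  # endpoint mentioned above

def iter_favorites(user_id: int,
                   fetch_page: Callable[[int, int], list],
                   delay: float = 1.0) -> Iterator[dict]:
    """Yield one user's favorited posts, page by page.

    `fetch_page(user_id, page)` should return the list of posts on that
    page (empty when exhausted). A `page` query parameter is my guess at
    how the endpoint paginates. The 1 s delay between requests respects
    the 1 request/second pacing mentioned above.
    """
    page = 1
    while True:
        posts = fetch_page(user_id, page)
        if not posts:
            return
        yield from posts
        page += 1
        time.sleep(delay)

# Demo with a fake fetcher standing in for real HTTP calls:
fake_pages = {1: [{"id": 10}, {"id": 11}], 2: [{"id": 12}]}
got = list(iter_favorites(122036, lambda uid, p: fake_pages.get(p, []), delay=0.0))
print([post["id"] for post in got])  # -> [10, 11, 12]
```

in real use you'd swap the fake fetcher for an HTTP call and write each page into the local cache database as it arrives.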

Oh, I see.

Yeah, the API is pretty much the only method available for this sort of thing.
There also isn't an API endpoint for users who favorited a specific post. Earlopain made that page, so you might try to persuade him to make one, though.
The approach of getting the users' favorites data is one of the main hurdles of this sort of project. Remember that some users have hit the limit of 80,000 favorites – and some have way more than that for one reason or another.

Personally, I have not dabbled too much in making a comprehensive recommendation engine. The one I have in my RE621 script is very simplistic.
You may want to speak with binaryfloof and Archid – they are both working on something like this right now.

Favorites are not exported because they are not considered "public" anymore. As of the change to allow users to hide their favorites, it is no longer easy to export them in bulk.

bitwolfy said:
Oh, I see.

Yeah, the API is pretty much the only method available for this sort of thing.
There also isn't an API endpoint for users who favorited a specific post. Earlopain made that page, so you might try to persuade him to make one, though.
The approach of getting the users' favorites data is one of the main hurdles of this sort of project. Remember that some users have hit the limit of 80,000 favorites – and some have way more than that for one reason or another.

Personally, I have not dabbled too much in making a comprehensive recommendation engine. The one I have in my RE621 script is very simplistic.
You may want to speak with binaryfloof and Archid – they are both working on something like this right now.

thanks for the tips, much appreciated. from scraping the thread(s), it looks like the recommendation engine that those two are working on is a little more tag-focused, which seems a lot more achievable with the API as-is.
if I were in better health i'd look into contributing to the site itself to try and export this data, but for now I'll probably try and work on alternative approaches to generate "good enough" recommendations.

for now, take this as a request to add public favorites to the db_export datasets, but I understand if that's not going to happen.

kiranoot said:
Favorites are not exported because they are not considered "public" anymore. As of the change to allow users to hide their favorites, it is no longer easy to export them in bulk.

ah, that'd do it. is there any way for me to contribute to making this 'easier' to achieve, or is it just a product of the current database schema?

Product of the schema, and of the excessive amount of time it takes to do the join and comparison against users' settings at the scale involved to export several hundred million rows.

kiranoot said:
Product of the schema, and of the excessive amount of time it takes to do the join and comparison against users' settings at the scale involved to export several hundred million rows.

damn. i take it that cheering on the database with my ever-present charm won't be enough to speed it up? :(
i appreciate you taking the time to answer these questions frankly; looks like it's back to the drawing board for now.

please let me know if y'all find a way to implement this; if not I'll try to poke around the publicly-available code just for my own satisfaction and see what I can do.

Even if it was not limited by the schema, it raises the question of whether this is exportable data in the sense of user expectations. Snapshots don't reflect changes in user privacy settings, which is undesirable from a user standpoint. Mass collection of this information and its indefinite preservation are also going to be seen by users as questionable. This is why the endpoints are mostly locked to logged-in users, and generally not seen as "public" endpoints.

kiranoot said:
Even if it was not limited by the schema, it raises the question of whether this is exportable data in the sense of user expectations. Snapshots don't reflect changes in user privacy settings, which is undesirable from a user standpoint. Mass collection of this information and its indefinite preservation are also going to be seen by users as questionable. This is why the endpoints are mostly locked to logged-in users, and generally not seen as "public" endpoints.

before today I wasn't even aware that favorites _could_ be made private, goes to show how much I pay attention...

I think you make a fair point about users changing preferences. It's not my data, so it's not my place to stage a debate IMO. I had the impression that users still viewed favorites as implicitly public info, which I suppose isn't really accurate anymore.

Either way. It's time for me to drop this avenue and reconsider how to approach this concept in a way that respects the way that privacy, and the expectation of privacy, have changed on this website. not really sure _where_ to take the work from here, but fortunately i haven't written any code yet. perhaps there's a way to make a site that collects data in a more actively consensual way (show users photos, let them vote, collect data), rather than just with the implicit consent of the data being left public.

I guess I just spent too long thinking about the logistics of the project and not enough about, well, the everything else.

E: anyway, thanks for entertaining my thoughts; have a good evening y'all
