Topic: E621 Site rip

Posted under General

Someone ripped E621 in 6 parts :D

"This gallery has been resampled for browsing. Download the original through the Doggie Bag Archiver.

per request, a complete site rip as of 10-22-09 of www.e621.net, a compendium of Furry / Anthro porn, with derivatives therein: Futa, gay, shota, etc.. There are approximately 10,000 pictures total, so I've given a quick once over to attempt to remove any incomplete / nonsense pictures, and will steadily upload them as time permits.

This is the second archive, 1960 pictures."

http://g.e-hentai.org/g/169302/a9ebd414da/

Someone really had their hands full.

Updated by treos

I am curious as to how he ended up with only 10,000 pictures when we have nearly 50,000 on the site. That's probably a depressing amount of time combing them to do that.

Updated by anonymous

mellis said:
I am curious as to how he ended up with only 10,000 pictures when we have nearly 50,000 on the site. That's probably a depressing amount of time combing them to do that.

Well, he did say "so I've given a quick once over to attempt to remove any incomplete / nonsense pictures", so that is why I said that someone really had their hands full.

Maybe that person only searched for images containing nudity and sex. And quality art.

Updated by anonymous

He checked about 40k posts and filtered out 10k? :D

Updated by anonymous

Jazz said:
He checked about 40k posts and filtered out 10k? :D

Looks like it XD

Updated by anonymous

Kald

Former Staff

I don't think an e621 siterip would be that hard to build. You just need a program that records the page URLs listed under the main tags.

For example, you can generate the following list:
http://e621.net/post/index?tags=female
http://e621.net/post?page=2&tags=female
http://e621.net/post?page=3&tags=female
etc

On each page, the program will find a list of 20 URLs:
http://e621.net/post/show/xxxx/
http://e621.net/post/show/yyyy/

By requesting each of these URLs, it gets the page source, which contains the picture's URL that it can record.

After that, it just needs to mass download.

Such a program would only require basic programming skills and basic syntax analysis.
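
A rough sketch of the first two steps described above (build the listing URLs, then pull the post links out of each page). It is only an illustration: the `/post?page=N&tags=...` form is taken from the list above, while the `href` pattern and the function names `listing_urls` and `post_urls` are assumptions, not something confirmed about the site's real markup. It does no downloading itself.

```python
import re
from urllib.parse import urlencode, urljoin

BASE = "http://e621.net"

# Assumed link shape inside a listing page; the real markup may differ.
POST_LINK = re.compile(r'href="(/post/show/\d+[^"]*)"')


def listing_urls(tag, pages):
    """Yield the listing-page URLs for one tag, pages 1..pages."""
    for page in range(1, pages + 1):
        query = urlencode({"page": page, "tags": tag})
        yield f"{BASE}/post?{query}"


def post_urls(listing_html):
    """Return the unique post-page URLs found in one listing page, in order."""
    found = []
    for path in POST_LINK.findall(listing_html):
        url = urljoin(BASE, path)
        if url not in found:
            found.append(url)
    return found
```

The same regex-style pass would then be repeated on each post page to record the picture's URL before the mass download.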

English isn't my native language so I'm not sure I'm being very clear, but you get the idea.

Updated by anonymous

Actually, e621 has an API, which vastly simplifies ripping e621: it's as simple as parsing XML.
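
To give an idea of what "parsing XML" could look like, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element name `post` and the attribute `file_url` are assumptions made for the example; check the actual API output before relying on them.

```python
import xml.etree.ElementTree as ET


def file_urls(xml_text):
    """Return the file URLs listed in an API response.

    Assumes (hypothetically) that each post is a <post> element
    carrying its image location in a file_url attribute.
    """
    root = ET.fromstring(xml_text)
    return [
        post.get("file_url")
        for post in root.iter("post")
        if post.get("file_url")
    ]
```

Compared with scraping HTML pages, this skips the second pass entirely, since the image location comes back with the listing.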

Updated by anonymous

Just as a sample,
http://dl.dropbox.com/u/612254/e621apiusage.zip

Written in PHP, works best on linux or mac.

This is just one of a load of things you can do with it.

EDIT: DON'T USE THIS FOR PUBLIC OR PRIVATE USE. It will most probably do something bad if overused.

EDIT2: Atm, the E621 API XML says there are 86433 files. Assuming each file downloads in 1.5 seconds, that will take roughly 36 hours, so about a day and a half.
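
The time estimate is just the file count times the per-file time. A tiny sketch, where the function name `total_hours` is made up for the example and the 1.5 seconds per file is the guess from the post, not a measurement:

```python
def total_hours(file_count, seconds_per_file):
    """Estimated wall-clock hours for a strictly sequential download."""
    return file_count * seconds_per_file / 3600


estimate = total_hours(86433, 1.5)  # a little over 36 hours
```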

Updated by anonymous

What use is a siterip of e621?
The only reason I use it instead of, say, yiffstar or furaffinity is the ability to do tag searches. And a siterip has no tag searching ability.
Why bother?

Updated by anonymous

Fum said:
What use is a siterip of e621?

Remember when e621 went down for accusations of child porn? Yeah.. good times...

Updated by anonymous

Valence said:
Remember when e621 went down for accusations of child porn? Yeah.. good times...

Remember the names of the artists you like? Yeah... They have furaffinity sites and stuff.
Pictures I like, I save in my smut folder. But I still visit e621 because I can't type "female messy after_sex" to search my smut folder. Stupid Tag2Find not compatible with 64-bit systems.

Updated by anonymous

1 site rip cummin up. I can have a straight, un-altered rip up and seeding in about 2 hours time. Sadly, this will have to wait till later tonight when I don't actually need to do work.

All in favor say 'Grrrghllrghr!'

Updated by anonymous

Why would anyone want a site rip when the site is ... right here?

Updated by anonymous

null0100 said:
Why would anyone want a site rip when the site is ... right here?

Sure it is right here...for now.
Maybe in time it won't be? Maybe it will get hacked? Maybe it will end up being WikiLeaked? Maybe it will go down again like what happened this summer? Only maybe it won't come back....

That is why people do siterips...as backups.
Either that or they want to build their furry porno collection rapidly...lol

Updated by anonymous

It's a gewd jump start when your harddrive dies. I suppose there is some fun in starting anew (taking the time to build separate folders for gay/straight/f/m), but most of the time when I get really drunk I stop caring about which folder I'm saving in so I expect the end products will always be clusterfucks of stuff.

Anyways, the ripping will commence in 3 hours.

Updated by anonymous

site rip in progress, handling exceptions on the fly.

Newly identified obstacles:
+ forgot to take size into account (images/animations/etc)
+ e621 server's speed/connection
+ my speed/connection

Originally it was planned to download these on an ISP's network known to get a good 30MB/s, but there was a priority shift. Due to this and the fact that e621 isn't the fastest race-horse out there, my ETA has been blown up to about 3 days or until further word.

ETA 3 DAYS

Updated by anonymous

Ran into major problems. Will have to fix code tomorrow. Up to 8,030 items so far. Barely 7% done. This would be so much easier if the great admins of E621 torrented this out themselves. Think of it as a distributed backup.

Updated by anonymous

Thanks for the hard work, Xanonx. I won't download this myself, but I know many people will and will appreciate the time, effort and bandwidth that has gone into it. It's always nice to have a backup, in the event of the absolute worst happening.

And I certainly wish I'd had something like this a few years back when I was without internet for about 6 months. :)

null -- think of it as a completionist thing. If someone has the time, and the harddrive space... why not? I don't expect that everyone wants full site rips... but for some... well... it's easier to delete what you don't like than save what you do. Plus, 99,000 images, all at once.

It's not for me, but I can see why they'd want it.

Updated by anonymous

About how big do you think it would be? I'll help keep the torrent alive if it's not too big.

Updated by anonymous

Well, being at 7% it's about 1 GB so far. Assuming the remaining objects are similar in size to the ones already downloaded, the estimated total comes to about 14 GB straight up. If only image files are kept (gif, jpeg, png) it may be possible to shave off 1 or 2 GB. If it were possible to sift out everything that wasn't pronz it would probably eliminate another 3 to 4 GB. With compression this could probably get down to maybe 7 GB? It's hard to say until it's done. Also, weeding out the non-pronz is a very subjective and tedious process; no one's going to want to do it, and those who do want to do it may end up throwing out things they don't like. It's very hard to keep such a stiff mindset while going through 100,000 images+.

Updated by anonymous

xanonx said:
It's very hard to keep such a stiff mindset while going through 100,000 images+.

Tell me about it! I've been organizing my own furry collection by basing it off of how e621 is set up (and using WLPG, it almost looks like e621 as well). It's so hard to remember all the different tags to put on all the images while at the same time getting rid of duplicates and throwing out images I don't like. I did manage to get my collection down from 18 GB to just 7 GB just by trashing duplicates and unwanted images!

I've been working on this "project" on-and-off for over a year now, and still not finished...ugh. This must be how the moderators feel...lol

Updated by anonymous

Right now there are 99291 images out of a total of 101144 posts, which is 98.2%.
On Sept. 23 2010, the entire site was 34GB, deleted posts included, with around 95300 posts, which gives us an average post size of 374KB.
Multiplying that by 99291 non-deleted posts gives us roughly 35.4GB total.
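
For anyone who wants to redo the numbers later, the arithmetic above can be sketched like this. The figures are the ones from this post, the variable names are invented for the example, and GB and KB are treated as binary units (1 GB = 1024 * 1024 KB), which is an assumption about how the 374KB figure was reached.

```python
KB_PER_GB = 1024 * 1024

# Figures quoted in the post above.
live_posts = 99291
all_posts = 101144
site_size_gb = 34
posts_at_snapshot = 95300

# Share of posts that are not deleted, in percent.
live_share = live_posts / all_posts * 100

# Average post size in KB at the time of the 34GB snapshot.
avg_post_kb = site_size_gb * KB_PER_GB / posts_at_snapshot

# Projected size in GB of all non-deleted posts.
projected_gb = avg_post_kb * live_posts / KB_PER_GB
```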

Updated by anonymous

lol, nice. I'm almost finished with a new set of scripts. I switched languages and this new set will hopefully run over 9000 times faster. I was dinking around with some multithreading but I don't think that will work and I don't want to DoS e621. If the mods are listening, please don't be mad!

Updated by anonymous

WolfieWolfie1992 said:
Tell me about it! I've been organizing my own furry collection by basing it off of how e621 is set up (and using WLPG, it almost looks like e621 as well). It's so hard to remember all the different tags to put on all the images while at the same time getting rid of duplicates and throwing out images I don't like. I did manage to get my collection down from 18 GB to just 7 GB just by trashing duplicates and unwanted images!

I've been working on this "project" on-and-off for over a year now, and still not finished...ugh. This must be how the moderators feel...lol

I save very few yiff images anymore, but when I do I just dump them in a folder.

Updated by anonymous

Ahh - I wondered what the huge spike in CPU usage roughly 22hrs ago was...

http://i.imgur.com/ALvSC.png

See that big hump on the left? That's all you.

I don't mind you ripping the site, just as long as you stick to it being single-threaded (like, one request at a time - I don't mind if you do one thread for images and one for the api) and give the server a couple seconds to respond before retrying the request.
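
Something along these lines is what I mean, as a rough sketch only. The name `fetch_politely` and its parameters are invented for the example, `fetch` stands in for whatever download function you already have, and the 2 second delay is just one reading of "a couple seconds".

```python
import time


def fetch_politely(urls, fetch, retry_delay=2.0, max_attempts=3, sleep=time.sleep):
    """Fetch URLs strictly one at a time, waiting before every retry.

    `fetch` is any callable taking a URL and returning its content;
    it is expected to raise OSError (which covers urllib's URLError
    and socket timeouts) when a request fails.
    """
    results = {}
    for url in urls:
        for attempt in range(1, max_attempts + 1):
            try:
                results[url] = fetch(url)
                break
            except OSError:
                if attempt == max_attempts:
                    results[url] = None  # give up on this URL, keep going
                else:
                    sleep(retry_delay)  # let the server breathe first
    return results
```

Because there is a single loop and no threads, there is never more than one request in flight, and a failed request is not hammered again immediately.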

Also, the size of the current data directory is about 40GB (including thumbnails).

I'm not too sure what the consequences of a full siterip, with metadata, floating around are, and as a result I'm not too sure if torrenting up and releasing a full copy of the site ourselves would be sensible - but even then, we'd need some way of embedding metadata such as tags with each image (which would require us to code something up).

lol, wouldn't it be so much easier if there was a 100% solid, reliable 'my porn collection' site with tag-based search, faves integration with FA, e6 etc, and named collections?

Updated by anonymous

lol, It's not so much that I dislike browsing your site, cause I think it's fan-fucking-tastic. No one else has anything like this, save gelbooru. I just have this inherent urge to have a copy of everything. I mean, you know how it is, you meet up with some buds, have a few beers/kegs, you start bartering pronz folders for awesome folders, stuff like that.

Anyways, I'll keep it to one thread. The code only makes one request at a time. I tried ripping the image tags as well, but I couldn't get the XMP library to work correctly so I'm skipping that for now. I'm currently making two passes: one to obtain the direct URLs of each image and another to actually pull down the image.

Maybe in the future I'll take a less aggressive approach at getting down all of the tags. Let me know if I'm causing too much of a hassle.

Updated by anonymous

No worries; I don't mind you putting more load on the site as long as it's at a sensible rate.

Remember; the length of your e-penis is the inverse log of the number of gigs of porn / warez you have shared via direct connect. :D

Updated by anonymous

Varka said:
lol, wouldn't it be so much easier if there was a 100% solid, reliable 'my porn collection' site with tag-based search, faves integration with FA, e6 etc, and named collections?

That's what I have planned for Danborganize <3

Updated by anonymous

Lulz, it was supposed to go up last night, but I'm dumb and left the external drive that it was on in the wrong place. Tonight it will be up, hopefully in a little less than 3 hours. And like I said, the community here is so strong that my shit's already missing a good deal of items.

At the time of the rip I missed a trivial number of items (anywhere from 5-10). These items were malformed compared to what I observed as the standard and may not have existed in the first place. When queried, E621 yielded 404s. In the rare likelihood that someone actually notices something missing (there were quite a few seams in my lists of items), please notify me and I'll attempt to fix this before the next partial rip.

Updated by anonymous

As I said before, I can't seed this 24/7 yet so sorry if stuff goes down. I'm hosting this off of a laptop that I drag with me everywhere I go.

Updated by anonymous

dammit my truecrypt partition is only 32 gigs and I only have 640 megabytes left

Updated by anonymous

xanonx, could you give me some details about this torrent? How are the files named? Are they categorized into folders? Any image post-processing used?

Updated by anonymous

metrio said:
dammit my truecrypt partition is only 32 gigs and I only have 640 megabytes left

So why exactly do you need to encrypt this whole thing again?

Updated by anonymous

xanonx said:
As I said before, I can't seed this 24/7 yet so sorry if stuff goes down. I'm hosting this off of a laptop that I drag with me everywhere I go.

Ah, I see :P
Got it, now :)

Updated by anonymous

Lulz, of course as I say this 26 leachers jump on my torrent.

Updated by anonymous

How long it took me to compress? Probably 2 hours or so, idk, I started compressing and drinking at the same time. Then I went out to the bar and then I don't remember.

How long will it take you to decompress? Probably 2 hours or so, idk.

Updated by anonymous

eminor said:
xanonx, could you give me some details about this torrent? How are the files named? Are they categorized into folders? Any image post-processing used?

OOp, it's a straight-up rip of the site, so Images will have the same name as they are named on the server (if you right click and download, same name).

There is zero categorization, as adding tags/cats would've added orders of magnitude of run time to my code. Maybe in the future I'll look into pulling down the tags.

And I didn't want to do any post processing as it would have biased the collection. It was supposed to be an accurate representation of the data on e621 at the time.

Updated by anonymous

lol my cheap single laptop took 2 hours to compress a roughly 4gb file, i'd hate to see how long it would take to compress that

Updated by anonymous

luvdaporn said:
lol my cheap single laptop took 2 hours to compress a roughly 4gb file, i'd hate to see how long it would take to compress that

Lol, it wasn't that bad. I compressed it on a RAIDed i7 machine with 8GB RAM. I didn't think about how long it might take others to zip/unzip; everything I own and use has too much RAM, i7s, and RAID.

Updated by anonymous

Welp, I seeded to a ratio of 2.00, and that's where I'm calling it quits. I have hideous internet, but it was fun taking advantage of super fast public internet to watch upload speed skyrocket from my 38 kb/s to over 600.

Updated by anonymous

I'll start downloading and seeding when i get back home, i only have 500 or so mb/s upload though.

Updated by anonymous

You are all such wonderful people.

I won't seed it constantly, 'cause that'll bring my ISP down on me for pirating. But I'll seed whenever I'm torrenting anything. Which is pretty often.

Updated by anonymous

luvdaporn said:
seeding this torrent isn't pirating.....

Doesn't matter. Most people look down on torrent as a pirating software regardless.

Updated by Donovan DMC

Percy101 said:
Doesn't matter. Most people look down on torrent as a pirating software regardless.

kinda does matter. if you get caught for pirating, it's probably for pirating, and they have evidence of the software you downloaded, not just random torrents.

Updated by anonymous

No, I know that seeding this one isn't pirating. But if I seed for long periods of time, especially at high rates, my ISP goes WHOAH and flips out. And it's evil, evil Comcast, so I have no intention of doing anything that makes them even look at me funny.

Updated by anonymous

Comcast will shut you down no holds barred if you take too much bandwidth. hate them.

Updated by anonymous

123easy said:
Comcast will shut you down no holds barred if you take too much bandwidth. hate them.

Oh, sir. I do. I do. For one, they quoted me a price 30 bucks cheaper a month than what they're charging me, and when I called to deal with it, I was sent through circles in the departments for about an hour every day for a week before I finally surrendered, broken and beaten. Unfortunately they've got a monopoly in my area, unless I want to go satellite, which I don't. They are evil incarnate.

Updated by anonymous

RedOctober said:
Oh, sir. I do. I do. For one, they quoted me a price 30 bucks cheaper a month than what they're charging me, and when I called to deal with it, I was sent through circles in the departments for about an hour every day for a week before I finally surrendered, broken and beaten. Unfortunately they've got a monopoly in my area, unless I want to go satellite, which I don't. They are evil incarnate.

never give up, trust your instincts.

Updated by anonymous

Well bend me over and call me Betty. I had to do a hard shutdown while torrenting, and now it won't connect, I can't get it to verify the data, nothing. Even when I delete everything and uninstall/reinstall my client it won't connect, and consistently thinks the torrent is in the list, so I can't re-add. Blaaaaah, starting over after three days and 33%=poop.

Updated by anonymous

Valence said:
Three days of torrenting is nothing. β)

Lol. I don't usually torrent huge things, and they tend to have at least half a dozen seeders. Also where I am right now, my download bandwidth is essentially unlimited, so I've gotten accustomed to grabbing whole tv shows in a day lol. Yeah, if I were back home, it'd be a different story.

Updated by anonymous

treos said:
:/ even if i was interested, i don't know how to read russian.

How are you so sure it's Russian? It could be almost any language that uses Cyrillic script. It could be Ukrainian, for all you know.

Updated by anonymous

ShylokVakarian said:
How are you so sure it's Russian? It could be almost any language that uses Cyrillic script. It could be Ukrainian, for all you know.

heck if i know, i just know it's unreadable to my eyes.

Updated by anonymous
