Topic: e621 content rip - well maintained.

Posted under General

Hello everyone. This is probably a bit overkill for my first post but whatever.
Before I get to explaining this and such, I would like to apologize to Varka and the site admins, as even though I tried to grab from the site over a long time (took almost 3 weeks), I did hammer it pretty hard on the occasions I did. So, apologies, and I hope all's forgiven.

Moving on.

TLDR: See bottom of post for access information.

  • To find out more about why I did this, read on.

Is this a site rip? If not, what makes it different?

  • No, this is a content rip, not a site rip. A site rip includes the HTML and the scripts, and is downloaded as an exact copy (more or less) of a site, along with original directory structures and the like. This means (in the case of e621) that a site rip will be filled with images with illegible names like 473f006517b136f38d272d5481ae9ea5.png (#117848), which, whilst great from a server perspective (no duplicate filenames), is horrible if you just want to easily find a specific image.
  • Therefore, this contains ONLY the images and Flash animations the site contains. Additionally, all the images are arranged (and named) according to their post number. So, for example, if you were looking at 065478.png from the rip, and were interested in viewing comments, rating, posting your own comments, etc, then you would simply navigate to http://e621.net/post/show/065478/ to do so, instead of being left in the dark as to exactly what post number you were looking at (as in the case of a site rip).
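The filename-to-post mapping described above can be sketched in a few lines of Python. This is only an illustration: the function name post_url is made up for the example, and the only thing taken from the post is the URL pattern and the sample filename.

```python
from pathlib import PurePosixPath

# URL pattern quoted in the post above
BASE_URL = "http://e621.net/post/show/"


def post_url(filename: str) -> str:
    """Map a rip filename such as '065478.png' to its post page URL.

    post_url is a hypothetical helper, not part of any existing tool.
    """
    stem = PurePosixPath(filename).stem  # strip the extension: '065478'
    if not stem.isdigit():
        # hash-style names from a site rip carry no post number
        raise ValueError(f"not a post-numbered file: {filename!r}")
    return f"{BASE_URL}{stem}/"


print(post_url("065478.png"))
```

A hash-style name such as 473f006517b136f38d272d5481ae9ea5.png raises ValueError here, which is exactly the "left in the dark" case of a site rip.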

Okay, so it's different. But site rips are okay too, right? Why did you bother?

  • I have nothing against site rips. However, the issue I have encountered is the method by which these have been distributed. Firstly, all the site rips I have seen have been gigantic RAR archives. This would be fine if everyone was willing to wait days/weeks for it to download (let's ignore the possibility of corrupt/fake downloads for the moment), but for everyone else, this is a problem. They have also been distributed by BitTorrent. BitTorrent is a great protocol with many uses, but uploading 30+GB on a DSL/consumer cable connection to even 5-10 users would take weeks, perhaps months. This is the first problem.
  • At the other end of the pole, this leads to maintenance problems. If you have one gigantic RAR archive distributed via BitTorrent, and you want to add more images to it to make it more complete, you'd be forced to add them to the archive. But, uh-oh! Adding even a few files to the archive makes it impossible to download only the new segments; the torrent client would validate up to the first changed file in the archive and download all the rest, since the piece hashes would no longer match. This is the second problem that I have hoped to address by doing this.
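The piece-hash problem can be shown with a toy model. The sketch below treats the archive as a flat byte string and uses a 4-byte piece size purely for illustration; real torrents use much larger pieces, and real RAR archives are not laid out this simply. The names piece_hashes and first_mismatch are invented for the example.

```python
import hashlib

PIECE_SIZE = 4  # tiny on purpose; an assumption for the demo only


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """SHA-1 of each fixed-size piece, the way a v1 torrent describes its payload."""
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def first_mismatch(old: list, new: list) -> int:
    """Index of the first piece whose hash differs between two versions."""
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return i
    return min(len(old), len(new))


old_archive = b"AAAABBBBCCCCDDDD"
new_archive = b"AAAABBxBBCCCCDDDD"  # one byte inserted inside the second piece

old = piece_hashes(old_archive)
new = piece_hashes(new_archive)

# Count pieces that still match at the same position and could be reused
reusable = sum(1 for a, b in zip(old, new) if a == b)
print(f"first changed piece: {first_mismatch(old, new)}")
print(f"reusable pieces: {reusable} of {len(new)}")
```

Only the piece before the insertion survives. Everything after it has shifted by one byte, so every later hash changes even though the underlying data is almost identical.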

Alright, so we know why YOU did it. Why should WE care? We can just visit the site to look at the artwork, no?

  • You definitely can, AND SHOULD. This site gets money from advert impressions, so if you stop visiting it, it may (eventually) be forced to stop running. Keep the site running by visiting it often, and take a look at the adverts - since the changeover, they have all been really relevant.
  • This is being implemented as an archive of the site for those people that would rather have their own copies of the files. If the site ever goes down for any reason - whether temporary (maintenance) or permanent/semi-permanent (remember the big CP kerfuffle and the eventual switchover to being run by Varka?) - then access to the images will effectively be lost. This rip can act as a backup in those cases, if you really can't do without your furry fix. Additionally, an unexpected side effect of this (that was really unintentional, I swear) is that many deleted posts have remained in the content rip. Since I will be updating it on a regular basis, any images deleted from the main site for any reason will remain on the content rip, assuming the rip was able to collect them before deletion occurred. A great example is all the Aja Williams work that has recently been deleted due to the updated DNP (#000262, etc).

Okay, just one last thing - How do we access it?

  • I have set the rip up on an FTP server. You can access it via a web browser, but this is HIGHLY discouraged. Internet Explorer and Firefox are known to crash when entering large FTP directories. I have arranged the site so that content is split into directories relative to their post numbers - e621-000-010k contains all the images from posts 13 to 10000, as an example. At best, IE and Firefox will just be immensely slow navigating the FTP. At worst, they may crash, and I cannot be held responsible for any electric blue penises that become lodged anywhere because of it.
  • Address: 46.4.74.41 Port 21, 900 or 4848 (for sneaky ISPs who block 21).
  • Username: Public
  • Password: nomcookies (lowercase).
  • I recommend FileZilla as an FTP client if you're on Windows; for Linux/Mac, I'm sure you already know what your favourite FTP client is.
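For anyone scripting against the directory layout, here is a rough sketch of how a post number might map to its directory. Only e621-000-010k (posts up to 10000) is confirmed above; the assumption that later directories continue the same pattern in 10,000-post steps (e621-010-020k and so on) is a guess, and bucket_dir is a made-up name.

```python
BUCKET_SIZE = 10_000  # posts per directory, inferred from the single example given


def bucket_dir(post_number: int) -> str:
    """Guess the directory holding a post.

    ASSUMPTION: every directory follows the 'e621-000-010k' pattern in
    10,000-post steps. Only the first directory name is confirmed.
    """
    if post_number < 1:
        raise ValueError("post numbers start at 1")
    index = (post_number - 1) // BUCKET_SIZE  # post 10000 stays in the first bucket
    low_k = index * 10
    return f"e621-{low_k:03d}-{low_k + 10:03d}k"


print(bucket_dir(13))     # the confirmed first directory
print(bucket_dir(65478))  # hypothetical name under the assumed pattern
```

Check the real directory listing before relying on any name other than the first one.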

I hope you all find a use for this. I also hope by doing this I am not breaking any e621 rules; if this is the case, then I shall desist immediately. Thanks.

Updated by Mantikor

I'm not sure. I've seen two, plus another that seemed to be too old to consider a rip - only contained about 10,000 images and no flashes.

Updated by anonymous

As Aurali said, you need to be able to throttle yourself, especially if you want to do live monitoring of some kind.

A simple delay between requests would probably be sufficient as long as the delay is long enough.
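As a rough sketch of what such a delay could look like in Python: the Throttle class below is invented for illustration, and the interval is an arbitrary placeholder, not a figure anyone here has recommended.

```python
import time


class Throttle:
    """Enforce a minimum gap between consecutive calls to wait().

    Hypothetical helper; clock and sleep are injectable so it can be
    exercised without real waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last request: sleep off the difference
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Placeholder interval; pick something far gentler for a real site
throttle = Throttle(min_interval=0.05)
for post_number in (1, 2, 3):
    throttle.wait()
    # the actual download of post_number would go here
```

The first call goes through immediately; each later call blocks only for whatever is left of the interval, so slow downloads are not penalised twice.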

Updated by anonymous

lawl, I started a trend. The problem with the file names is that I was going for what they would be if one were to download the files from this site; that way one could keep their own folder of furry-related junk up to date without adding to their dupes.
I feel as if your choice of FTP is a little harsh though. I don't know what sort of big bucks you pay for inner'net... it won't be pretty for you (Omoronov).
But what ev's. As classically stated, "Haters gon' hate!"

Updated by anonymous

Can you create a torrent for the FTP?? I would try to download but I don't want to screw up the server...

Updated by anonymous

Doesn't seem to be working now. Something wrong on your end, Omoronovo?

Updated by anonymous

Is there an updated site rip? If so, could someone give me a link where to find it?

Updated by anonymous

barricade2521 said:
Is there an updated site rip? If so, could someone give me a link where to find it?

You just bumped a 2 year old thread....-.-

Updated by anonymous

Conker said:
You just bumped a 2 year old thread....someone lock this

Never seen a necroed thread get locked lol

Updated by anonymous
