Topic: Optimal method of PDF to PNG conversion?

Posted under Art Talk

Hello, I wanted to upload a comic provided by the artist as a PDF file here:
[https://kit-ray-live.tumblr.com/post/185341868286/cause-i-have-been-thinking-part-one-written-by].
However, I don't have any experience with converting the PDF pages to PNG images.

I tried using GIMP to convert one page using the "200 pixels per inch" setting and it looks pretty good, but I am not sure if it's the optimal solution. I posted that one page here:
[https://e621.net/post/show/1978168]

Then, I realized I should probably check in with someone smarter than me before I went on.

Before I proceed, I would appreciate any advice from someone with more experience than I have. I want to preserve as much detail as I can without artificially inflating the image.

Thanks!

Updated by Mairo

I hate PDF files in this context.
The pages aren't images; they are constructed from assets, which means there can be vector assets and text that scale infinitely, and there can be combined image elements as well.

The best method I have seen is indeed to just use GIMP: import at a massive DPI (600 or so) and then count pixels, e.g. from macroblocking or bilinear upscaling artifacts, to work out what the original resolution could be.

However, at least with Adobe's own reader, you can click on an image and simply copy it from the context menu. With this particular PDF, the pages are indeed images and only the page number underneath is a text asset.
https://puu.sh/EcyCQ/b4e86c0949.png
Then in GIMP, create a new image from the clipboard, save it as PNG, run it through Pingo's lossless web optimization and upload. If you want to go the extra step, you can render the pages in GIMP and then paste the copied page on top, so the otherwise missing page number is included too.

And if someone has a way to at least determine the intended DPI of a PDF file, or a better method, I would also like to know.
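For anyone who wants to put numbers on the pixel-counting approach above, here is a rough sketch in Python. It assumes the embedded image is a JPEG with 8x8 compression blocks (an assumption; chroma subsampling can make visible blocks larger), and the function names are made up for illustration. You measure how wide one block appears in the oversized render and scale everything back down by that ratio.

```python
def estimate_native_size(rendered_width, rendered_height,
                         observed_block_px, native_block_px=8):
    """Estimate an embedded image's original pixel size from an oversized render.

    observed_block_px is the measured width of one compression block in the
    render; native_block_px is the assumed block size in the source image
    (8 for baseline JPEG, which is an assumption to adjust if needed).
    """
    scale = observed_block_px / native_block_px
    return round(rendered_width / scale), round(rendered_height / scale)


def estimate_native_dpi(render_dpi, observed_block_px, native_block_px=8):
    """Estimate the import DPI at which one source pixel maps to one output pixel."""
    scale = observed_block_px / native_block_px
    return render_dpi / scale


if __name__ == "__main__":
    # Hypothetical numbers: a 600 DPI render where each block measures 16 px.
    print(estimate_native_size(5100, 6600, observed_block_px=16))
    print(estimate_native_dpi(600, observed_block_px=16))
```

Re-importing the page at the estimated DPI should then give roughly one output pixel per source pixel, though the estimate is only as good as the block measurement.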

Updated by anonymous

What I usually use for extracting images from PDFs like that is a tool called pdfimages from the poppler library. Only problem is that it's a command line program so you gotta be willing to open that up and type in a few things. On Windows you can get a build from here. You'll want to run it as pdfimages -all <path to pdf> <output prefix>, which will try to extract the images in their native format (so JPEGs stay compressed) or otherwise will output PNGs.
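If typing the command by hand for every file gets tedious, it can be wrapped in a small script. The following is only a sketch in Python: it assumes pdfimages is installed and on your PATH, and the function names and the "page" output prefix are made up for illustration. The only pdfimages options used are -all (from the post above) and -list, which prints a table of the embedded images instead of extracting them.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path, output_prefix=None, list_only=False):
    """Build the argument list for poppler's pdfimages tool.

    With list_only=True the command only lists the embedded images
    (pdfimages -list); otherwise it extracts them in their native
    format (pdfimages -all), which needs an output prefix.
    """
    if list_only:
        return ["pdfimages", "-list", str(pdf_path)]
    if output_prefix is None:
        raise ValueError("output_prefix is required when extracting")
    return ["pdfimages", "-all", str(pdf_path), str(output_prefix)]


def extract_images(pdf_path, output_dir):
    """Extract every embedded image from pdf_path into output_dir."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages was not found on PATH")
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # pdfimages names its output <prefix>-NNN.<ext>; "page" is an arbitrary prefix.
    command = build_pdfimages_command(pdf_path, output_dir / "page")
    subprocess.run(command, check=True)
    return sorted(output_dir.iterdir())
```

Calling extract_images("comic.pdf", "out") would leave the extracted files in the out folder, ready to be checked and converted to PNG where needed.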

Updated by anonymous

+1 on pdfimages (although obviously, using pdfimages requires you to be somewhat comfortable with the command line).

My relatively limited experience is that any given image in a PDF will be exported as either JPG or PBM. PBM is already a lossless format, so there's no reason not to convert PBMs to PNG (you can use GIMP for that).

Updated by anonymous

Thank you all for the responses! I think I'll just go with Mairo's advice for now, especially since I didn't realize this PDF's pages were already images.

Updated by anonymous

bonghit840 said:
Thank you all for the responses! I think I'll just go with Mairo's advice for now, especially since I didn't realize this PDF's pages were already images.

In this case the images seem to be lossless, so there shouldn't be much difference in the outcome. However, if the image files were JPGs, you would definitely need to rely on an export method of some sort.

Also, it seems the cover pages are JPG files, judging by the macroblocking.

hanzai said:
What I usually use for extracting images from PDFs like that is a tool called pdfimages from the poppler library. Only problem is that it's a command line program so you gotta be willing to open that up and type in a few things. On Windows you can get a build from here. You'll want to run it as pdfimages -all <path to pdf> <output prefix>, which will try to extract the images in their native format (so JPEGs stay compressed) or otherwise will output PNGs.

That's pretty much exactly what I have been looking for.
Could've saved me the trouble of the free trial of Adobe Reader DC several weeks ago (spoiler: their exporting sucks).

Updated by anonymous

Mairo said:
I hate PDF files in this context.
The pages aren't images; they are constructed from assets, which means there can be vector assets and text that scale infinitely, and there can be combined image elements as well.

Regardless of scalable assets, PDF files always have a default resolution, don't they? Just export the pages at whichever resolution is set by the 100% zoom option.

Updated by anonymous

OneMoreAnonymous said:
Regardless of scalable assets, PDF files always have a default resolution, don't they? Just export the pages at whichever resolution is set by the 100% zoom option.

That's the thing, it does not.

A PDF has a DPI that determines how large the page should be printed, but there's no hard-set resolution. This is also why software like GIMP asks you to enter either dimensions or a target DPI when importing a PDF.

And even if PDF pages did have a hard-set resolution, the assets on that page could be something completely different.

Updated by anonymous

Mairo said:
That's the thing, it does not.

A PDF has a DPI that determines how large the page should be printed, but there's no hard-set resolution. This is also why software like GIMP asks you to enter either dimensions or a target DPI when importing a PDF.

And even if PDF pages did have a hard-set resolution, the assets on that page could be something completely different.

Well, now you got me wondering what the 100% zoom means. I picked a random PDF file and played around with the zoom settings and my monitor's resolution, and the pages' resolution seems to be a fixed value.

Updated by anonymous

How did you determine that the pages' resolution was in fact fixed?

A better way of saying what Mairo said might be that a PDF page has *dimensions*, but they are measured in physical units, not pixels (so '100% zoom' would probably mean 'an inch on screen corresponds to an inch when printed out').
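To make the "physical units, not pixels" point concrete, here is a small arithmetic sketch in Python. It relies on the default PDF unit being the point (1/72 inch); the function names and the example numbers are made up for illustration. The first function shows why a renderer needs a DPI before it can give you pixels, and the second shows one way to work out the DPI an embedded image effectively has, given its pixel width and the width it occupies on the page.

```python
POINTS_PER_INCH = 72  # the default PDF unit is 1/72 of an inch


def page_size_in_pixels(width_pt, height_pt, dpi):
    """Pixel size of a page (given in points) when rendered at the chosen DPI."""
    return (round(width_pt / POINTS_PER_INCH * dpi),
            round(height_pt / POINTS_PER_INCH * dpi))


def effective_dpi(image_width_px, placed_width_pt):
    """DPI of an embedded image that is placed placed_width_pt points wide."""
    return image_width_px / (placed_width_pt / POINTS_PER_INCH)


if __name__ == "__main__":
    # A US Letter page is 612 x 792 points (8.5 x 11 inches).
    print(page_size_in_pixels(612, 792, dpi=200))   # (1700, 2200)
    # Hypothetical 2550 px wide scan filling the full page width.
    print(effective_dpi(2550, 612))                 # 300.0
```

Rendering the page at the effective DPI of its image is what keeps the output from being either downscaled or artificially inflated.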

Updated by anonymous

Mairo said:
I hate PDF files in this context.
The pages aren't images; they are constructed from assets, which means there can be vector assets and text that scale infinitely, and there can be combined image elements as well.

The best method I have seen is indeed to just use GIMP: import at a massive DPI (600 or so) and then count pixels, e.g. from macroblocking or bilinear upscaling artifacts, to work out what the original resolution could be.

However, at least with Adobe's own reader, you can click on an image and simply copy it from the context menu. With this particular PDF, the pages are indeed images and only the page number underneath is a text asset.
https://puu.sh/EcyCQ/b4e86c0949.png
Then in GIMP, create a new image from the clipboard, save it as PNG, run it through Pingo's lossless web optimization and upload. If you want to go the extra step, you can render the pages in GIMP and then paste the copied page on top, so the otherwise missing page number is included too.

And if someone has a way to at least determine the intended DPI of a PDF file, or a better method, I would also like to know.

I typically select and copy the file and paste it into MS Paint.

There is an easier option with a catch. Upon further inspection of Adobe Reader I'm seeing an Export option that allows you to convert to PNG and JPG...if you pay for Pro. Yes, Adobe wants to squeeze money out of us any way they can.

Updated by anonymous

PheagleAdler said:
I typically select and copy the file and paste it into MS Paint.

There is an easier option with a catch. Upon further inspection of Adobe Reader I'm seeing an Export option that allows you to convert to PNG and JPG...if you pay for Pro. Yes, Adobe wants to squeeze money out of us any way they can.

I mean, MS Paint does work in this context, but you have to make sure you don't do anything dumb after pasting the content, and make sure to save as PNG instead of the default JPG.

Also it's almost like I mentioned something in my earlier message...

Mairo said:
Could've saved me the trouble of the free trial of Adobe Reader DC several weeks ago (spoiler: their exporting sucks).

So, the way Adobe Reader's export option handles image files is that it offers several image formats and a quality slider. It essentially just prints the page, but instead of printing, it saves the result as an image file, and the maximum quality was a 1024px dimension. So it does pretty much the same job as GIMP, but instead of letting you give the exact resolution you want, you get an ambiguous slider.

Also, there are a couple of things to understand here. Adobe as a company does try to make a profit, and on top of that, they generally aim to sell their products to professionals, the people who actually need those tools for their work, not someone who is trying to turn a single PDF into images so they can upload it to a furry website. Of course, if those individuals buy the product it's still money for Adobe, but that's rare.

Another aspect is the PDF format itself. The main job of the format is to let you send documents that look exactly the same regardless of who opens them and with what software. Other formats have problems with this, e.g. if the recipient is missing a specific font, their Word uses a default font instead. As such, the format was never meant to be edited; editing is a feature that companies and individuals creating PDF documents might need. You as the end user are only meant to view the content and possibly print it, and since 2008 anyone can create PDF software themselves without needing to pay royalties to Adobe.

So I wouldn't exactly call it Adobe squeezing money. They are offering a service that is not an essential part of the package, and you have the choice to not use it and rely on another company's or even free solutions instead.

Updated by anonymous
