Now that this is known, it’s not enough to remove metadata from the PDF itself. Each image inside a PDF, for example, can contain its own metadata.

There are multiple ways of removing ALL metadata from a PDF; here are most of the ones I know of.

It will be slow-ish and probably make the file larger, but if you’re sharing a PDF that only you are supposed to have access to, it’s worth it. MAT or exiftool should work.
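
For the exiftool route, here is a minimal sketch in Python. The assumptions are that both exiftool and qpdf are installed and that the file names are placeholders. The qpdf step is there because, per the exiftool documentation, its PDF edits stay reversible until the file is rewritten. This targets document-level metadata; whether metadata inside embedded images is also removed is worth checking separately.

```python
import shutil
import subprocess


def strip_commands(src, dst):
    """Return the two command lines; src and dst are hypothetical paths."""
    return [
        # Ask exiftool to delete every tag it can write, editing src in place.
        ["exiftool", "-all=", "-overwrite_original", src],
        # Rewrite the file so data left behind by the edit is dropped.
        ["qpdf", "--linearize", src, dst],
    ]


def strip_pdf(src, dst):
    """Run both steps; raises if a tool is missing or a command fails."""
    for cmd in strip_commands(src, dst):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not installed")
        subprocess.run(cmd, check=True)
```

Calling `strip_pdf("in.pdf", "out.pdf")` modifies `in.pdf` and writes the rewritten copy to `out.pdf`, so work on a copy of the original.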

Passerby6497@lemmy.world ⁨1⁩ ⁨year⁩ ago
Wouldn’t printing the PDF to a new PDF inherently strip the metadata put there by the publisher?

- sandbox@lemmy.world ⁨1⁩ ⁨year⁩ ago
  it’s possible using steganographic techniques to embed digital watermarks which would not be stripped by simply printing to pdf.
  
  - FinalRemix@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Got it. Print to a low quality JPG, then use AI upscaling to restore the text and graphs.
    
    Syn_Attck@lemmy.today ⁨1⁩ ⁨year⁩ ago
    You should spread that idea around more, it’s pretty ingenious. I’d add first converting to B&W if possible.
    
  - Syn_Attck@lemmy.today ⁨1⁩ ⁨year⁩ ago
    This is a great point. Image watermarking steganography is nearly impossible to defeat unless you can obtain multiple copies of the ‘same’ file from multiple users to look for differences. The mark could be as small as 10-15 pixels, each changed by one step in a single RGB channel, for example from
    
    rgb(255, 251, 0)
    
    to
    
    rgb(255, 252, 0)
    
    Which would be imperceptible to the human eye. Depending on the number of users, it may need to change more or fewer pixels.
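
To make the idea concrete, here is a rough sketch in Python of how such a per-user mark could be embedded. Everything in it is an assumption for illustration: the function name, the use of the green channel, and seeding the pixel choice with a user ID. It is not any real watermarking scheme, and real ones are far more robust.

```python
import random


def embed_watermark(pixels, user_id, count=12):
    """Return a copy of pixels with `count` pixels nudged by one in green.

    pixels: list of rows, each a list of (r, g, b) tuples.
    user_id: seeds the choice of positions, so each user gets their own set.
    """
    height, width = len(pixels), len(pixels[0])
    rng = random.Random(user_id)
    # sample() picks distinct positions, so exactly `count` pixels change.
    positions = rng.sample(range(height * width), count)
    marked = [list(row) for row in pixels]
    for pos in positions:
        y, x = divmod(pos, width)
        r, g, b = marked[y][x]
        # Move green by one step, staying inside the 0..255 range.
        g = g + 1 if g < 255 else g - 1
        marked[y][x] = (r, g, b)
    return marked
```

On a flat area of rgb(255, 251, 0), the marked pixels become rgb(255, 252, 0), which is the change described above.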
    
    There is a ton of work in this field and it’s very interesting, for anyone considering majoring in computer science / information security.
    
    sus@programming.dev ⁨1⁩ ⁨year⁩ ago
    I wonder if it’s common for those steganography techniques to have some mechanism for defeating the fairly simple strategy of getting 2 copies of the file from different sources, and looking at the differences between them to expose all the watermarks.
    
    (I’d think you would need sections of watermark that overlap for any 2 or n copies of the data, which may be pretty easy in many cases, though the difference makes detecting the general watermarking strategy massively easier for the un-watermarkers)
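
The comparison itself is simple. Below is a minimal sketch in Python, assuming both copies have already been decoded into same-sized grids of (r, g, b) tuples; the function name is made up for illustration.

```python
def diff_positions(copy_a, copy_b):
    """Return (x, y) coordinates where two same-sized pixel grids differ."""
    return [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(copy_a, copy_b))
        for x, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]
```

Note that this only exposes pixels that differ between the two copies. Any part of the mark that both copies share stays invisible, which is where overlapping watermark sections would come in.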
    
  - Thann@lemmy.ml ⁨1⁩ ⁨year⁩ ago
    Which is why you steghide random data into the image to fuck up the other end =]
    
    Syn_Attck@lemmy.today ⁨1⁩ ⁨year⁩ ago
    Unless you know specifically what they’re adding or changing, this wouldn’t work. If they have a hidden ‘barcode’ and you add another hidden ‘barcode’, or modify the image in a way that removes only some of theirs, they’d still be able to read theirs.
    
- Syn_Attck@lemmy.today ⁨1⁩ ⁨year⁩ ago
  Good question. I believe “Print to PDF” isn’t actually “printing” it page by page as if it was a physical printer, but rather just saving the loaded PDF to a PDF file locally.
  
  I’m not an expert in this field, but you can ask on StackExchange, ask the authors of MAT and exiftool, or test it yourself: make a PDF containing a .jpg file that carries your own metadata, then extract the image and let us know here what survived. It would be useful information that I can’t find via search engines. I’m using a smartphone so I can’t do it. If you do, note that according to the linked SE page you won’t be able to extract the original file extension, so if you use your own .jpg with your own exif data, rename the extracted file to .jpg when finished (I believe exif is handled differently based on file type).
  
  There are multiple tools to add exif data to an image but the exiftool website has some good easy examples for our purpose.
  
  exiftool -artist="Phil Harvey" -copyright="2011 Phil Harvey" YourFile.jpg
  
  (do this as the first step before adding to the PDF)
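
For the checking step at the end, here is a small sketch in Python that reports whether a JPEG still carries an Exif block. It assumes the extracted image really is a JPEG and only looks for the Exif APP1 segment, not for other metadata such as XMP; the function name is made up for illustration.

```python
import struct


def has_exif(jpeg_bytes):
    """Return True if the JPEG has an APP1 segment carrying Exif data."""
    if jpeg_bytes[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG")
    i = 2
    while i + 4 <= len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:
            break  # not at a marker, so stop scanning
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:  # SOS: compressed image data starts here
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[i + 2:i + 4])
        payload = jpeg_bytes[i + 4:i + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        i += 2 + length
    return False
```

Run it on the original file and on the image extracted from the PDF, e.g. `has_exif(open("extracted.jpg", "rb").read())`, and compare the two results.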
  
Zacryon@lemmy.wtf ⁨1⁩ ⁨year⁩ ago
Okay, got it. Print the PDF, then scan it and save as PDF.

Or get some monks to make a handwritten copy, like in the good old times.
