this post was submitted on 06 Nov 2024

No Stupid Questions


Want to ensure financial documents can't be parsed by automated systems

cannedtuna@lemmy.world -2 points 1 day ago

OCR cannot scan documents that have been certified or digitally signed.

Note that once you certify a document it can no longer be edited, combined with another PDF, or have pages inserted or extracted.

Once a PDF has been digitally signed it is locked and you can no longer add pages, delete pages, or read it via OCR.

MystikIncarnate@lemmy.ca 4 points 16 hours ago

This works right up until you introduce PDF-compatible software that doesn't give a shit about your rules, and there's plenty of it.
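To make that concrete, here's a toy sketch of why a rule stored in a file only holds if the reader chooses to check it. Nothing here is real PDF handling: `Document`, `allow_extract`, and both reader functions are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    text: str
    allow_extract: bool  # hypothetical "do not extract" rule stored alongside the content


def polite_reader(doc: Document) -> str:
    """A reader that chooses to honor the rule."""
    if not doc.allow_extract:
        raise PermissionError("extraction not permitted")
    return doc.text


def indifferent_reader(doc: Document) -> str:
    """A reader that never looks at the rule; the content is right there."""
    return doc.text


doc = Document(text="Account balance: 1234.56", allow_extract=False)
```

Calling `polite_reader(doc)` raises, while `indifferent_reader(doc)` hands back the text. The rule is only as strong as the software that agrees to enforce it.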

You can also print/scan, or even print to PDF to get around such limitations. The original document cannot be altered since that would invalidate the digital signature on the file, but you can create a perfect digital copy, omitting the signature, and modify it however you want.
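A rough way to picture this, as a toy model only: a signature is a check computed over the document's bytes, so it detects changes to the original but does nothing to stop anyone from reading or copying the content. The sketch below uses an HMAC from Python's standard library as a stand-in; real PDF signatures use asymmetric keys and certificates, and `SECRET_KEY`, `sign`, and `verify` are hypothetical names for illustration.

```python
import hashlib
import hmac

# Hypothetical key; a stand-in for the signer's private key in this toy model.
SECRET_KEY = b"signer-private-key"


def sign(content: bytes) -> bytes:
    """Compute a signature over the exact bytes of the content."""
    return hmac.new(SECRET_KEY, content, hashlib.sha256).digest()


def verify(content: bytes, signature: bytes) -> bool:
    """Check that the signature matches these exact bytes."""
    return hmac.compare_digest(sign(content), signature)


original = b"Invoice total: 100.00"
signature = sign(original)

# Altering the signed original invalidates the signature.
tampered = original.replace(b"100.00", b"900.00")
original_still_valid = verify(original, signature)
tampered_valid = verify(tampered, signature)

# A copy made without the signature carries no such check and can be edited freely.
copy_without_signature = bytes(original)
edited_copy = copy_without_signature.replace(b"100.00", b"900.00")
```

Here `original_still_valid` is `True` and `tampered_valid` is `False`, but nothing stops `edited_copy` from existing. The signature proves what the signer signed; it doesn't hide the content from anyone.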

Online systems that skim documents for their contents don't give a shit about the signature either. They simply take a copy and OCR it to train an AI, or amalgamate the information for data harvesting or other purposes.

I get what you're saying, and in concept it should be fine. The problem is that it's a software lock on a file type that isn't closed source or obscure, and the PDF format wasn't built to be secure from the ground up. So we're applying security to a system that wasn't built for it.
