The problem
My company sends me my payslip via eMail. The PDF is password-protected with a password I have chosen.
I would like to remove the password from the document (because I will store it on self-owned, trusted and encrypted servers anyway) and then push it to my self-hosted instance of Paperless, which I use for document management.
My solution
So first, I had to get the document to some place where I can reprocess it. The document is sent via eMail to my company mail account. I created a preprocessing eMail account on my own mailserver, called paperless-preprocess@tech-tales.blog [1]. Next, I defined a rule in my company eMail: eMails that match the payslip eMail I expect to receive are forwarded to paperless-preprocess@tech-tales.blog (and also moved to the correct eMail subdirectory on the server).
Node-Red
For processing those eMails, I chose Node-Red, as it was already running a set of home automation tasks. I use the node-red-node-email palette, which can send and receive eMails. Other than that, I only use built-in nodes for the workflow. The flow works as follows:
- Read eMails from my eMail address.
- Filter by subject. Other eMails are sent to that eMail address and get preprocessed too, so I have to distinguish the cases here.
- Find the correct attachment. The eMail has multiple images in the body (e.g. company logos), which are formally attachments too. I did that via a function node, which also prepares the request for Stirling PDF (which will do the password removal), with the following code:
  ```javascript
  // Loop over the attachments
  for (const attachment of msg.attachments) {
      // Filter attachments that are not PDFs
      if (attachment.contentType !== "application/pdf") continue;
      // Let's say my document is named something like
      // payslip-2023-12-chris.pdf
      if (!attachment.filename.startsWith("payslip") || !attachment.filename.endsWith(".pdf")) continue;
      msg.document_buffer = attachment.content;
  }
  // I should do some kind of troubleshooting
  // here (in case the document was not found)...

  // Prepare the request for Stirling PDF
  msg.headers = {
      "Content-Type": "multipart/form-data",
      "accept": "*/*"
  };
  // Keep the old payload stored; I may use that
  // some time later...
  msg.old_payload = msg.payload;
  msg.payload = {
      "fileInput": {
          value: msg.document_buffer,
          options: {
              type: "application/pdf",
              // Not quite sure why and if a filename
              // has to be set...
              filename: "in.pdf"
          }
      },
      "password": "my_password"
  };
  return msg;
  ```
- The next step is a request to Stirling PDF, a PDF tool which can do fancy things. The most important one at the moment is that it can remove passwords from PDFs (obviously). There is API documentation for the tool, and I chose this endpoint for my project. The most challenging part was finding out how to simulate a form in Node-Red, but I figured it out in the end. The input for the form was already defined in the previous function node.
  So the next node is a quite simple Node-Red HTTP Request node. It sends a POST request to the URL https://stirling-pdf.tech-tales.blog/api/v1/security/remove-password. I also told the node that the Return will be a binary buffer.
- Finally, I had to create a new eMail, which I then send to paperless@tech-tales.blog. This eMail account is consumed by Paperless, so I do not have to interact with the workflow in any way. I just did some preprocessing in the following way:
  ```javascript
  // "Gehaltszettel" is German for payslip
  msg.attachments = [{
      type: "attachment",
      content: msg.payload,
      contentType: "application/pdf",
      filename: "Gehaltszettel.pdf"
  }];
  msg.topic = "New Gehaltszettel";
  // Overwrite the payload to some string, otherwise
  // Paperless is confused
  msg.payload = "The non-encrypted Gehaltszettel for Paperless";
  // A function node has to return the message to pass it on
  return msg;
  ```
- The final node in my flow is just a “Send eMail” Node.
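The function node in the flow above leaves the "document not found" case open. A minimal sketch of how the attachment lookup and the form payload could be split into small helpers with that case handled is shown here. The helper names `findPayslip` and `buildRemovePasswordForm` are hypothetical and not part of the original flow, and unlike the original loop this version returns the first matching attachment rather than the last one.

```javascript
// Pick the payslip PDF out of a list of mail attachments.
// Returns the attachment content (a Buffer) or null if nothing matches.
function findPayslip(attachments) {
    for (const attachment of attachments || []) {
        // Skip inline images such as company logos
        if (attachment.contentType !== "application/pdf") continue;
        const name = attachment.filename || "";
        // Expected name: something like payslip-2023-12-chris.pdf
        if (name.startsWith("payslip") && name.endsWith(".pdf")) {
            return attachment.content;
        }
    }
    return null;
}

// Build the payload object that the HTTP Request node turns into a
// multipart form. The field names are the ones used in the flow above.
function buildRemovePasswordForm(pdfBuffer, password) {
    return {
        fileInput: {
            value: pdfBuffer,
            options: { type: "application/pdf", filename: "in.pdf" }
        },
        password: password
    };
}

// Possible use inside a Node-Red function node
// (msg and node are provided by Node-Red there):
//
//   const pdf = findPayslip(msg.attachments);
//   if (pdf === null) {
//       node.warn("No payslip attachment found");
//       return null; // returning null stops the flow for this message
//   }
//   msg.old_payload = msg.payload;
//   msg.headers = { "Content-Type": "multipart/form-data", "accept": "*/*" };
//   msg.payload = buildRemovePasswordForm(pdf, "my_password");
//   return msg;
```

Keeping the lookup in a plain function makes it possible to try it outside Node-Red with a hand-made attachment list before wiring it into the flow.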
Stirling PDF
As already written, the most challenging part was getting the file decrypted by Stirling PDF. I did have an instance running, and I was easily able to remove a password from a document via the UI, but I had trouble translating the form data into a coded POST request. In the end, the following things were important:
- The header Content-Type: multipart/form-data. But this was not too surprising.
- The actual form data has a fancy encoding. Basically, we have an object with typical key: value pairs. But when sending a document, things get complicated. In this case, the pair gets extended to key: {value: ..., options: {...}}.
  - Option type: application/pdf was quickly found and not too surprising.
  - I also needed the option filename: "in.pdf". This one was quite surprising for me, and it took me a long time to find that out. The name defined here does not (need to) have anything to do with the actual filename, but it needs to be set. Otherwise, the request fails - and the API returns some error that I was unable to parse.
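For comparison, the same form can be built outside Node-Red with the standard FormData API. This is only a sketch under assumptions: it needs Node.js 18 or newer (where `fetch`, `FormData` and `Blob` are global), the function names `buildForm` and `removePassword` are hypothetical, and `baseUrl` stands for whatever Stirling PDF instance is used. It also hints at why the filename matters: in a multipart form, the filename is what marks a part as a file upload, which is a plausible (but here unverified) reason for the request failing without it.

```javascript
// Build the multipart form with the standard FormData API.
function buildForm(pdfBuffer, password) {
    const form = new FormData();
    // The third argument is the filename of the uploaded part.
    // As in the flow above, the actual name is arbitrary.
    const pdf = new Blob([pdfBuffer], { type: "application/pdf" });
    form.append("fileInput", pdf, "in.pdf");
    form.append("password", password);
    return form;
}

// Send the form to the remove-password endpoint and return the
// decrypted PDF as a Buffer.
async function removePassword(baseUrl, pdfBuffer, password) {
    const response = await fetch(baseUrl + "/api/v1/security/remove-password", {
        method: "POST",
        // No Content-Type header here: fetch adds it, including the
        // multipart boundary, when the body is a FormData object.
        body: buildForm(pdfBuffer, password)
    });
    if (!response.ok) {
        throw new Error("Stirling PDF answered with HTTP " + response.status);
    }
    return Buffer.from(await response.arrayBuffer());
}
```

A call would look like `await removePassword("https://stirling-pdf.tech-tales.blog", buffer, "my_password")`, with the dummy domain replaced by a real instance.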
Summary
So I was able to remove the password. Also, I was able to interconnect quite a lot of my self-hosted services: Node-Red, Stirling PDF, my eMail server, and Paperless.
I mostly learned about POSTing form data to a web API, and I learned that I do not like Javascript too much. I would really like to switch from Node-Red to something else which can do the same. I know that there is Apache Airflow, which actually is written in pure Python (and I like Python!), but I am not quite sure if it can do everything I want - in particular, the home automation bits that interact with MQTT seem to be a bit hard there…
[1] As usual, I am using dummy eMail addresses and domains that do not need to exist. Feel free to send an eMail to this address, but don’t be disappointed if I do not respond.