The problem

I receive my payslip from my company via eMail. The PDF is protected with a password that I have chosen.

I would like to remove the password from the document (because I store it on self-owned, trusted and encrypted servers anyway) and then push it to my self-hosted instance of Paperless, which I use for document management.

My solution

So first, I had to get the document to some place where I can reprocess it. The document is sent via eMail to my company mail account. I created a preprocessing eMail account on my own mailserver, called paperless-preprocess@tech-tales.blog [1]. Next, I defined a rule in my company eMail account: eMails that match the payslip eMail I expect to receive are forwarded to paperless-preprocess@tech-tales.blog (and also moved to the correct eMail subdirectory on the server).

Node-Red

To process those eMails, I chose Node-Red, as it was already running a set of home automation tasks. I use the node-red-node-email palette, which can send and receive eMails. Other than that, I only use built-in nodes for the workflow. The flow works as follows:

  • Read eMails from my eMail address.
  • Filter by subject. Other eMails are sent to that eMail address and get their own preprocessing too, so I have to distinguish between the cases here.
  • Find the correct attachment. The eMail has multiple images in the body (e.g. company logos), which formally are attachments too. I did that in a function node, which also prepares the request for Stirling PDF (which will do the password removal), with the following code:
    // Loop over the attachments (if there are any)
    for (const attachment of msg.attachments || []) {
      // Skip attachments that are not PDFs
      if (attachment.contentType !== "application/pdf")
          continue;
    
      // Let's say my document is named something like
      // payslip-2023-12-chris.pdf
      if (!attachment.filename.startsWith("payslip") ||
          !attachment.filename.endsWith(".pdf"))
          continue;
    
      msg.document_buffer = attachment.content;
    }
    
    // Stop the flow here if the document was not found
    if (!msg.document_buffer) {
      node.warn("No payslip PDF found in the attachments");
      return null;
    }
    
    // Prepare the request for Stirling-pdf
    msg.headers = {
      "Content-Type": "multipart/form-data",
      "accept": "*/*"
    };
    // Keep the old payload stored; I may use that
    // some time later...
    msg.old_payload = msg.payload;
    msg.payload = {
      "fileInput": {
          value: msg.document_buffer,
          options: {
              type: "application/pdf",
              // A filename has to be set, otherwise the
              // request fails; the actual name does not matter
              filename: "in.pdf"
          }
      },
      "password": "my_password"
    };
    
    return msg;
    
  • The next step is a request to Stirling PDF. This is a PDF tool which can do fancy things. The most important one at the moment is that it can remove passwords from PDFs. There’s an API documentation for the tool, and I chose the remove-password endpoint for my project. The most challenging part was finding out how to simulate a form in Node-Red, but I figured it out in the end. The input for the form was already defined in the previous function node.
    So the next node is a quite simple Node-Red HTTP Request node. It sends a POST request to the URL https://stirling-pdf.tech-tales.blog/api/v1/security/remove-password. I also told the node to return a binary buffer.
  • Finally, I had to create a new eMail which is then sent to paperless@tech-tales.blog. This eMail account is consumed by Paperless, so I do not have to interact with the workflow in any way. I prepared the message in a function node as follows:
    msg.attachments = [{
      type: "attachment",
      content: msg.payload,
      contentType: "application/pdf",
      filename: "Gehaltszettel.pdf"
    }];
    
    msg.topic = "New Gehaltszettel";
    // Overwrite the payload to some string, otherwise
    // Paperless is confused
    msg.payload = "The non-encrypted Gehaltszettel for Paperless";
    
    return msg;
    
  • The final node in my flow is just a “Send eMail” Node.
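The attachment selection from the second function-node step can also be written as a small standalone helper, which makes it easy to try out outside Node-Red. This is only a sketch: the name findPayslip and the sample attachment list are made up for illustration, and the attachment fields (contentType, filename, content) are assumed to look like the ones used in the function node above.

```javascript
// Hypothetical helper: returns the first attachment that looks like the
// payslip PDF, or undefined if nothing matches.
function findPayslip(attachments, prefix = "payslip") {
  return (attachments || []).find((a) =>
    a.contentType === "application/pdf" &&
    typeof a.filename === "string" &&
    a.filename.startsWith(prefix) &&
    a.filename.endsWith(".pdf"));
}

// Made-up sample data, mimicking an eMail with a logo and two PDFs
const sample = [
  { contentType: "image/png", filename: "logo.png", content: Buffer.from("png") },
  { contentType: "application/pdf", filename: "terms.pdf", content: Buffer.from("x") },
  { contentType: "application/pdf", filename: "payslip-2023-12-chris.pdf", content: Buffer.from("%PDF") },
];

const hit = findPayslip(sample);
console.log(hit ? hit.filename : "no payslip found");
```

Inside a function node, an undefined result could be handled with node.warn(...) followed by return null, which stops the flow for that message.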

Stirling PDF

As already written, the most challenging part was getting the file decrypted by Stirling PDF. I had an instance running, and I could easily remove a password from a document via the UI, but I had trouble translating the form data into a programmatic POST request. In the end, the following things were important:

  • The header Content-Type: multipart/form-data. But this was not too surprising.
  • The actual form data has a fancy encoding. Basically, we have an object with typical key: value pairs. But when sending a document, things get complicated. In this case, the pair gets extended to key: {value: ..., options: {...}}.
    • Option type: application/pdf was quickly found and not too surprising.
    • I also needed the option filename: "in.pdf". This one was quite surprising to me, and it took me a long time to find out. The filename defined here does not need to have anything to do with the actual filename, but it needs to be set. Otherwise, the request fails, and the API returns some error that I was unable to parse.
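For comparison, the same form can be built outside Node-Red with the FormData, Blob and fetch globals of Node.js 18 or newer. This is a sketch and not what my flow does: buildForm and removePassword are hypothetical names, while the field names fileInput and password and the endpoint path are the ones from the function node and the HTTP Request node above. Note that with fetch the Content-Type header is not set by hand, because fetch generates it together with the multipart boundary.

```javascript
// Hypothetical helper: builds the multipart form with the same fields
// as the Node-Red function node (fileInput and password).
function buildForm(pdfBuffer, password) {
  const form = new FormData();
  // The third argument is the filename; as in Node-Red, its value is arbitrary
  form.append("fileInput", new Blob([pdfBuffer], { type: "application/pdf" }), "in.pdf");
  form.append("password", password);
  return form;
}

// Hypothetical helper: posts the form and returns the decrypted PDF as a Buffer.
// baseUrl is assumed to point at a running Stirling PDF instance.
async function removePassword(baseUrl, pdfBuffer, password) {
  const res = await fetch(`${baseUrl}/api/v1/security/remove-password`, {
    method: "POST",
    body: buildForm(pdfBuffer, password),
  });
  if (!res.ok) throw new Error(`Stirling PDF answered with HTTP ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}

// Only build the form here; no request is sent
const form = buildForm(Buffer.from("%PDF-1.7"), "my_password");
console.log(form.get("fileInput").name, form.get("password"));
```

Calling removePassword("https://stirling-pdf.tech-tales.blog", buffer, "my_password") would then correspond to what the HTTP Request node does in the flow.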

Conclusion

So I was able to remove the password. Also, I was able to interconnect quite a lot of my self-hosted services: Node-Red, Stirling PDF, my eMail server, and Paperless.

I mostly learned about POSTing form data to a web API, and I learned that I do not like JavaScript too much. I would really like to switch from Node-Red to something else that can do the same. I know that there is Apache Airflow, which is written in pure Python (and I like Python!), but I am not quite sure if it can do everything I want. In particular, the home automation bits that interact with MQTT seem to be a bit hard there…


  1. As usual, I am using dummy eMail addresses and domains that do not need to exist. Feel free to send an eMail to this address, but don’t be disappointed if I do not respond.