The Mastodon server I use recently experienced some technical issues: while we could all still talk to each other on the local feed (and our toots seemed to be reaching people outside of our instance), we could not see incoming toots, and our notifications were also broken.
But, as we could still access our data, many of us backed it up using the Export function in the preferences section of the web interface. However, once we have our data, what can actually be done with it? In this post, I’ll go through how I archive my toots using my Mac.
Toots?
Before I begin, I should explain, for those unaware, that Mastodon is a federated alternative to Twitter, and toots are that platform’s equivalent of tweets - though many prefer to call them posts or notes instead. I like using toots, but do alternate between that and posts. For clarity here, I’ll be using “toots” for the microblogging messages on Mastodon, and “posts” for the Hugo blog posts that come from .md files.
Micro Blog
Many wanted a backup in case we needed to migrate to a different instance, but I have long wanted to create an archive of my toots in case the instance went offline, or the admin decided to turn on automatic deletion of toots after a certain amount of time in order to save space.
I created a test Micro Blog containing some copied-and-pasted toots a few months ago, and while I was happy with the end result, actually getting there was slow and cumbersome because everything had to be copied and pasted toot by toot.
Spurred on by the technical issues of my instance, I decided to have another look at what could be done with the outbox.json file I extracted from the exported archive.
Scripts
I started with a web search and some scripts came up, but I ran into multiple errors. I’m not well versed in many languages - more of an HTML and CSS person, with dabbles in YAML and TOML to reconfigure Hugo. As such, even though the guides I found seemed straightforward, and probably are to those who regularly use Scala or JavaScript, I encountered errors I didn’t know how to fix.
However, I finally managed to get something working through a slightly modified version of a Node.js script by Chris Deluca:
#!/usr/bin/env node
import { readFile, writeFile } from 'node:fs/promises';
import { Buffer } from 'node:buffer';
import striptags from 'striptags';

(async () => {
  try {
    // Read and parse the exported outbox.json sitting next to this script.
    const filePath = new URL(
      './outbox.json',
      import.meta.url
    );
    const contents = await readFile(filePath, { encoding: 'utf8' });
    const data = JSON.parse(contents);

    data.orderedItems.forEach(async (item) => {
      // Skip replies and anything without content (such as boosts).
      if (item.object.inReplyTo || !item.object.content) return;

      const unixTimestamp = Math.floor(
        new Date(item.published).getTime()/1000
      );
      const publishDate = new Date(item.published).toISOString();

      // Hugo front matter (TOML), followed by the toot text with most of
      // the HTML stripped out.
      const template = `+++
title = "Note on ${item.published}"
slug = "${unixTimestamp}"
publishdate = "${publishDate}"
draft = false
syndicated = [ "${item.object.url || ''}" ]
+++
${striptags(unescape(item.object.content), {allowedTags: new Set([
  'a',
  'strong',
  'em',
])})}`;

      // One .md file per toot, named by its Unix timestamp.
      const fileData = new Uint8Array(Buffer.from(template));
      await writeFile(
        `/Users/Jessica/posts/${unixTimestamp}.md`,
        fileData
      );
    });
  } catch (err) {
    console.error(err.message);
  }
})();
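The script only relies on a handful of fields from each entry in `outbox.json`, which is an ActivityStreams collection with an `orderedItems` array. Stripped right down (with made-up values, and many of the real fields omitted), an item looks roughly like this:

{
  "orderedItems": [
    {
      "published": "2023-02-01T18:30:00Z",
      "object": {
        "content": "<p>The toot text, as HTML.</p>",
        "inReplyTo": null,
        "url": "https://example.social/@jessica/109789012345678901"
      }
    }
  ]
}

If `object` is just a URL string (boosts appear this way) or `inReplyTo` is set (a reply), the script skips that item.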
For some reason, I had an issue with the `./outbox.json` path passed to `new URL()`: I needed the `./` in front to make the pathing work at all, as without it, Node became confused and complained about lacking permissions.
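If you hit similar path confusion, it can help to log where the script is actually looking before it reads anything; `new URL()` with `import.meta.url` as the base resolves relative to the script file itself, not whichever folder Terminal happens to be in. A minimal check (assuming `convert.mjs` lives in your home folder, as in the guide below):

// Prints the absolute location convert.mjs will try to read, e.g.
// file:///Users/Jessica/outbox.json when the script sits in /Users/Jessica.
const filePath = new URL('./outbox.json', import.meta.url);
console.log(filePath.href);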
Also, as someone who doesn’t use Node.js often, I had to install striptags separately before the script would run.
As such, here is a step-by-step guide to how I did it, in case you need to start from scratch and, like me, don’t know much about this sort of thing!
Guide
I’ve written this guide for Mac as that’s what I’m using. The entire process uses Finder, Terminal, The Unarchiver, and CotEditor - though similar applications ought to work fine.
1. In Terminal, install Homebrew:
   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
2. In Terminal, install Node.js:
   brew install node
3. In Terminal, install striptags:
   npm install striptags
4. Go to your Mastodon instance in a web browser, log in, and go to Preferences → Import and Export.
5. There should be a button to create an archive of your data.
6. Once the archive has finished collecting everything (this may take a long time depending on how much data there is), press Download Your Archive.
7. A .tar.gz file will download. In Finder, extract its contents with an extractor such as The Unarchiver or Keka.
8. In Finder, locate the `outbox.json` file and copy it into your home folder (mine is named `Jessica`). You can quickly get to your home folder by pressing Command, Shift and H at the same time (`command+shift+h`).
9. In a text editing application like CotEditor or Sublime Text, copy and paste the large JavaScript code above, and save the file as `convert.mjs` in your home folder.
10. Still in the text editor, change the output path in the `writeFile` call near the bottom of the script; it determines where the Hugo posts will be created. Change the home folder name from `Jessica` to your own.
11. In Finder, go into your home folder and create a new folder called `posts`.
12. In Terminal, you ought to be in your home folder automatically; you can confirm this by entering `pwd` and pressing Enter, which should return `/Users/[NAME]`. If you are elsewhere, keep entering `cd ..` and hitting Enter until you see `/ %` at the end, then type `cd /Users/[NAME]` and press Enter. Verify you are in the right place again by using `pwd` as described before.
13. In Terminal, type `node convert.mjs` and press Enter. It may not look like anything is happening, but if you use Finder to navigate to `posts` in your home folder, it should now be populated with .md files.
14. In Finder, copy and paste the .md files to wherever the posts go within your Hugo website.
15. You may need to edit the .md files slightly, due to possible formatting issues. You can do this within your text editor.
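For reference, each generated file should look roughly like this (the timestamp, URL, and toot text here are made up; the structure comes straight from the template in `convert.mjs`):

+++
title = "Note on 2023-02-01T18:30:00Z"
slug = "1675276200"
publishdate = "2023-02-01T18:30:00.000Z"
draft = false
syndicated = [ "https://example.social/@jessica/109789012345678901" ]
+++
Hello fediverse, this is what an archived toot ends up looking like as a Hugo post.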
Optional Tweaks
- By default, the script will generate .md files titled `Note on [TIME]` within the Hugo front matter (the `title` line in the template). You may wish to rename this.
- You may also wish to modify the slug (the last part of the URL the post will have) on the `slug` line just below it.
Limitations
- Only Public posts are converted. For example, replies, which are usually sent as Unlisted, will not appear, but if you sent them as Public, they will.
- Any `#hashtags` will appear in the .md file, but won’t be functional.
- Images didn’t import or load. This isn’t an issue for me as I rarely use them, but if they’re a significant part of your Mastodon experience, you may want to look into how images can also be imported into Hugo posts.
- Custom emojis didn’t appear either, with only their shortcode visible in the text, for example: `:infinity_rainbow:`
- Content warnings vanished too.
- The archive can only be requested every seven days, meaning your Hugo posts may often be out of date, depending on how regularly you post.
- You will also need to manually request the archive every seven days, extract it, and move the `outbox.json` file to the appropriate location.
- Running the script will create posts from the entirety of the `outbox.json` file. If you are doing this weekly, make a note of the most recent .md file created, and then, when you run the script the following week, delete the .md files older than it so you don’t end up with duplicates. Alternatively, you could copy and paste all the contents of `posts` to wherever the posts go within your Hugo website, being careful not to overwrite existing files and instead only adding the newly created ones.
- As I alluded to in step 15: “you may need to edit the .md files slightly, due to possible formatting issues”. This is because, while paragraphs appear normally within Mastodon, they do not convert well.
For example, a daily waffle game comes out looking like this: #waffle442 2/5🟩🟩🟩🟩🟩🟩⭐🟩⬜🟩🟩🟩🟩🟩🟩🟩⬜🟩⭐🟩🟩🟩🟩🟩🟩
Instead of this:
#waffle442
2/5
🟩🟩🟩🟩🟩
🟩⭐🟩⬜🟩
🟩🟩🟩🟩🟩
🟩⬜🟩⭐🟩
🟩🟩🟩🟩🟩
As such, you may need to spend time reformatting if you use paragraphs extensively, like me! (One possible tweak to the script is sketched below.)
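If that reformatting becomes tedious, one possible tweak - just a sketch, and not something I have actually tested - is to turn Mastodon’s `<p>` and `<br>` tags into plain newlines before the remaining HTML is stripped. The `tootToMarkdownText` name below is purely illustrative:

// Untested sketch: convert paragraph and line-break HTML into newlines
// before stripping the rest, so multi-paragraph toots keep their shape.
import striptags from 'striptags';

function tootToMarkdownText(html) {
  const withBreaks = html
    .replace(/<br\s*\/?>/gi, '\n')     // <br> becomes a newline
    .replace(/<\/p>\s*<p>/gi, '\n\n')  // a paragraph boundary becomes a blank line
    .replace(/<\/?p>/gi, '');          // drop the remaining <p> wrappers
  // Same allowed-tags call as convert.mjs already uses.
  return striptags(unescape(withBreaks), { allowedTags: new Set(['a', 'strong', 'em']) });
}

In `convert.mjs`, you would then call `tootToMarkdownText(item.object.content)` in place of the existing `striptags(...)` expression inside the template.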
Other Options
If you just want to archive toots, and not necessarily republish them, you may instead be more interested in mastodon-data-viewer.py or mastodon-archive, with the former using the export function again, and the latter accessing the data via the API.
Not really wanting to put a strain on the instance I’m on by requesting all my data via the API for no real reason, I have only tested mastodon-data-viewer.py. It creates a local “website” that only your device can access, where you can browse your past toots (and replies!) easily by navigating between months, and via a search bar. As such, it may be useful to spin up every now and then just for the search function if your instance (like mine) does not have Elasticsearch and you want to find something in your past toots. Content warnings and images are also supported here, if that’s important to you, and there are buttons to reply to toots and to go to them directly.
Lastly, there is also Meow, which appears to have a nice interface, but I wasn’t too sure about trusting it as it isn’t open source and I’d rather be able to run something like that offline.
Conclusion
Despite the limitations, and although there are other methods out there which may be better suited to your situation (I recall stumbling across one using the API, but cannot find it now), I am personally okay with this approach. Although it isn’t fully automated and needs manual intervention, I know my toots have many paragraphs in them (as well as waffle game scores that do not need archiving!), so I don’t mind doing a bit of light reformatting once a week to make sure the content all works fine.
However, if there was a way to make sure the converted posts carried the same formatting the toots had, I would be interested in alternatives - even if I still have to export the archive, run the script, and re-upload weekly. It’s never a bad idea to do a weekly backup of your Mastodon data anyway!
Tags: Setup, Social Media, Website