Mastodon to Blog Archive script

Tantek Çelik blogged about Mastodon’s account migration and its post export, which is based on ActivityStreams. No other apps really import this format yet, not even Mastodon itself. He also mentions the Blog Archive Format and how useful it would be to convert between Mastodon and this format:

Such a library would make an excellent drop-in addition to any #ActivityPub implementation, allowing both export of posts, and also a browsable archive format, so you could visually double check when importing to another service that these were the old posts you were looking for.

I’ve taken a first pass at writing a Ruby script to convert Mastodon’s export to Blog Archive Format. It’s available as a GitHub Gist. It isn’t packaged as a general-purpose library, but it could certainly be adapted into one.
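The Gist isn’t reproduced here, and what follows is not that Ruby script. It is a minimal Python sketch of the core conversion step, built on several assumptions. It assumes the export’s outbox.json is an ActivityStreams collection whose `orderedItems` are `Create` activities wrapping `Note` objects, with boosts appearing as `Announce` activities. It also assumes the JSON half of the archive is a JSON Feed. The function names `outbox_to_feed_items` and `load_feed` are hypothetical.

```python
import json
from typing import Any


def outbox_to_feed_items(outbox: dict[str, Any]) -> list[dict[str, Any]]:
    """Convert Create/Note activities in a Mastodon outbox into JSON Feed items.

    Assumes the ActivityStreams shape described above; not an official mapping.
    """
    items = []
    for activity in outbox.get("orderedItems", []):
        # Boosts appear as "Announce" activities whose object is only a URL; skip them.
        if activity.get("type") != "Create":
            continue
        note = activity.get("object")
        if not isinstance(note, dict) or note.get("type") != "Note":
            continue
        item = {
            "id": note.get("id"),
            "url": note.get("url") or note.get("id"),
            "content_html": note.get("content") or "",
            "date_published": note.get("published") or activity.get("published"),
        }
        # Keep image attachments; their URLs are left exactly as they appear in the export.
        images = [
            a["url"]
            for a in note.get("attachment") or []
            if a.get("url") and (a.get("mediaType") or "").startswith("image/")
        ]
        if images:
            item["image"] = images[0]  # JSON Feed's "image" holds a single main image
        items.append(item)
    return items


def load_feed(path: str, title: str) -> dict[str, Any]:
    """Read outbox.json from an unzipped Mastodon export and wrap it as a JSON Feed."""
    with open(path, encoding="utf-8") as f:
        outbox = json.load(f)
    return {
        "version": "https://jsonfeed.org/version/1.1",
        "title": title,
        "items": outbox_to_feed_items(outbox),
    }
```

Skipping boosts is a judgment call. An `Announce` only carries a link to someone else’s post, so there is no content of your own to archive.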

Direct import of Mastodon posts will be built into Micro.blog soon. We already support importing several formats (WordPress, Medium, Tumblr, Ghost), and I learned a lot about how best to process large archives while building the new Twitter import.

Dave Winer

– what is “blog archive format” – haven’t heard of it.

Manton Reece

@dave Updated the post to link to the IndieWeb wiki page about it. It’s a format I proposed a few years ago, basically just a ZIP file of posts (HTML, JSON, and images). The goal is to make it easier to package up posts and move them between systems or back them up.
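To make “a ZIP file of posts (HTML, JSON, and images)” concrete, here is a minimal standard-library Python sketch. The layout is an illustrative guess and not the format’s definition: feed.json and index.html at the root, images under uploads/, with the JSON as a JSON Feed. The IndieWeb wiki page is the actual reference. The names `write_blog_archive` and `archive_bytes` are hypothetical.

```python
import html
import io
import json
import zipfile


def write_blog_archive(feed, out, media=None):
    """Write feed.json, index.html, and image files into a ZIP.

    Hypothetical layout: feed.json and index.html at the root, images under
    uploads/. `out` may be a file path or a writable binary file object.
    """
    media = media or {}
    title = html.escape(feed.get("title") or "Archive")
    parts = [
        "<!DOCTYPE html>",
        f'<html><head><meta charset="utf-8"><title>{title}</title></head><body>',
    ]
    for item in feed.get("items", []):
        url = html.escape(item.get("url") or "", quote=True)
        date = html.escape(item.get("date_published") or "")
        body = item.get("content_html") or ""  # already HTML, so embedded as-is
        parts.append(f'<article><p><a href="{url}">{date}</a></p>{body}</article>')
    parts.append("</body></html>")

    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("feed.json", json.dumps(feed, indent=2, ensure_ascii=False))
        zf.writestr("index.html", "\n".join(parts))
        for name, data in media.items():
            zf.writestr(f"uploads/{name}", data)


def archive_bytes(feed, media=None):
    """Build the archive in memory and return the ZIP as bytes."""
    buf = io.BytesIO()
    write_blog_archive(feed, buf, media)
    return buf.getvalue()
```

Writing both files from the same feed dictionary keeps the machine-readable and human-browsable views in sync. That matters for the “visually double check” use case in Tantek’s quote.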

Dave Winer

– maybe we should have a conversation about that, because anything with that kind of name should be some kind of a consensus, imho.

Colin Devroe

@dave I would like to be part of this conversation.

Manton Reece

@cdevroe @dave I welcome input. I wanted to pick a name that was generic because right now there are too many formats that are tied to specific platforms. Tumblr, Ghost, Mastodon, Substack all have completely different archive formats that (I think) are unsuitable for broad support.

Dave Winer

– i came up with something like this in 2002 called RSS 2.0, and this is specifically provided for in the source namespace. i think it’s important if you want lots of support in something like this to not “welcome input” rather “steal from the best.” ;-)

Colin Devroe

I was going to say what @dave said. I’d use RSS and HTML. RSS to be digestible by computers, HTML to be viewable by humans.
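For comparison, the RSS side of that suggestion can also be sketched in a few lines of standard-library Python. This sketch writes a plain RSS 2.0 feed only. It does not attempt the source namespace Dave mentions. The post dictionary shape (`url`, `published`, `content_html`) and the name `posts_to_rss` are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime


def posts_to_rss(posts, title, link):
    """Serialize posts (dicts with url, published, content_html) as RSS 2.0 XML."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Archive of {title}"
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "link").text = post["url"]
        ET.SubElement(item, "guid", isPermaLink="true").text = post["url"]
        # RSS 2.0 uses RFC 822 dates; convert from ISO 8601, treating "Z" as UTC.
        published = datetime.fromisoformat(post["published"].replace("Z", "+00:00"))
        ET.SubElement(item, "pubDate").text = format_datetime(published)
        # ElementTree entity-escapes the HTML, which is how RSS carries HTML in <description>.
        ET.SubElement(item, "description").text = post["content_html"]
    return ET.tostring(rss, encoding="unicode")
```

The HTML for humans would be a separate file rendered from the same posts.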

Manton Reece @manton