The Silent Library on IPFS

This blog post explains what Gaki is, what IPFS is, and how I’ve combined them, based on the excellent work of others.

Gaki No Tsukai

I love Gaki No Tsukai. This Japanese variety show, featuring some of Japan’s best comedians as both regular members and guests, is amazing. They are probably best known in the west for their Batsu (punishment) Games. In most of the popular Batsu Games, the members are put into a situation for a day (newspapermen, nurses, cops, students, etc.) where they aren’t allowed to laugh. If they laugh, they get slapped on the butt (normally). It’s hilarious.

Gaki has various people and groups that also love the show and put in an AMAZING amount of effort doing subtitles for Gaki and Gaki-related stuff (other shows, commercials, etc.) so non-Japanese-speakers can enjoy it. This is all content we’d never be able to experience without these volunteers, who have been working for DECADES to get stuff translated. It’s a truly amazing feat, with different people shifting in and out of the scene over the years. There are various places that try to offer some or all of the translated work. Here are some of the big ones:

Unmentioned are various current and past timers, translators, QA people, organisers, web sites, etc. People have volunteered so much time, effort, and resources to get something out for everyone to enjoy. It’s beautiful.

IPFS

I hadn’t used IPFS before, but it seemed cool. It’s backed by a bittorrent-like p2p network that breaks files into chunks and distributes them among peers. That sounds like bittorrent so far, but each file acts like an individual torrent, so you don’t need a torrent file to hold a group together. IPFS also lets you specify directories, so you can link to an ipfs directory to get a list of ipfs files under that directory, then download those files, and so on.
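As an example of what that looks like from the ipfs CLI (which gets set up later in this post), you can list a directory by its id. This is the “getting started” directory that every node knows about:

# ipfs ls QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG

That prints the id, size, and name of each file under the directory, and each of those can then be fetched individually.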

One of the benefits of ipfs is the gateway system, which lets people access IPFS files over HTTP in their normal browser. People can run their own gateway servers, but there are also some big public ones out there, like Cloudflare’s.
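All gateways use the same URL pattern, so the same file id works on any of them. For example, on the two big public gateways I know of (Cloudflare’s host here is as it existed when I wrote this):

https://ipfs.io/ipfs/<cid>
https://cloudflare-ipfs.com/ipfs/<cid>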

It’s worth mentioning that IPFS was around before “web3” existed, but they seem to have somewhat dived into all that BS. They’re using it for cryptocurrency or something, I don’t know. It seems to somehow be linked to Ethereum. I don’t know and I don’t care. I’m ignoring all that shit. I am focusing on using it for what it’s meant to be: a p2p distributed filesystem.

Gaki For All

As I see it, there are three types of people who want to interact with Gaki, and with what the various providers offer. Here’s a table of my thoughts:

I see IPFS as a possible alternative to the Mega portion of the TSL, giving us a way to offer individual downloads to people. This can be useful if for some reason we can’t or don’t want to use Mega (I believe right now a donator is helping fund the Mega account for the library! This community is so good!).

This comes with some risks. People need to use IPFS natively if they want to contribute back to “seeding” the filesystem. The way IPFS works, some people have “pinned” files, which are always provided from their nodes. When someone accesses a file natively (using IPFS Desktop or a similar app), they then serve that file for a while too (there are settings for how much gets locally cached and shared). If people access files via an HTTP gateway instead, that gateway serves the file, but the individual downloader doesn’t contribute back. This means that if not enough people volunteer to offer their TSL libraries over IPFS, downloads could be slow.
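For anyone wanting to help seed from a native node, pinning is a single command. A sketch, using the cid of the readme file that comes up later in this post:

# ipfs pin add QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB

Pinned files are never garbage collected, so your node keeps serving them.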

That said, if it’s not obvious yet, I’ve set up a TSL repo on IPFS and am serving it from my node. I also set up a gateway that lets normal users browse and download files from a current version of the library. You can see that here: https://tsl.joyrex.net. Notice that all the links refer to ipfs.joyrex.net, which is my gateway server. Files on IPFS are identified by a unique id called a “cid” (content identifier). This cid is the hash string you find in the URLs on that page.

This blog post (after all this intro stuff) will cover setting up an IPFS node in Docker, sharing the files on IPFS, and some extra notes. This won’t apply perfectly for everyone, but it should at least help anyone interested in starting down the path.

Docker

I am using Docker on my Synology NAS, but any docker instance works. You don’t even need docker, you just need ipfs running somewhere, including the ipfs cli program. IPFS is a protocol, so there are multiple implementations of it. I am using kubo (formerly called go-ipfs), which seems to be the biggest and most popular implementation. Standard docs for setting up on docker are here.
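For reference, here’s a minimal sketch of the standard docker invocation from the kubo docs. The container name and host paths are placeholders (the /tsl mount for the library is my assumption), so adjust them to your setup:

# docker run -d --name ipfs_host \
    -v /volume1/docker/ipfs/staging:/export \
    -v /volume1/docker/ipfs/data:/data/ipfs \
    -v /volume1/tsl:/tsl \
    -p 4001:4001 -p 4001:4001/udp \
    -p 127.0.0.1:8080:8080 \
    -p 127.0.0.1:5001:5001 \
    ipfs/kubo:latest

Port 4001 is the p2p swarm port, 8080 is the HTTP gateway, and 5001 is the RPC API (keep that last one off the public internet).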

On Synology, it looks like this:

Some notes:

After starting the instance, it generates a default config and starts the daemon. Now we need to open a shell session on the docker container to make some config changes. You should end up with a root ‘sh’ session.
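From any shell with docker access, that looks like this (container name assumed from the sketch above):

# docker exec -it ipfs_host sh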

IPFS Config

In the shell session, run the following:

# ipfs config --json Experimental.FilestoreEnabled true

This enables the filestore feature, which lets ipfs serve files directly from their location on disk instead of copying them into its internal blockstore as chunks (duplicating the data). More information is here.

Restart the node.
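With docker, the easiest way is to restart the container (again assuming the name from the sketch above):

# docker restart ipfs_host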

add-tsl.sh

Put the file add-tsl.sh into the mapped /data/ipfs directory (put it wherever you’re mapping that directory from on the host, and it’ll appear in the container). Check the file source to be sure it’s set up like you want (you may want to alter the filename or the mounted TSL directory).

This script goes through the TSL directory and adds the files to the local ipfs store. IPFS has the flat filestore, but it also allows building a directory structure in something it calls the Mutable File System (MFS). The script creates directories and adds the files into them, so the layout matches the original TSL. It also references the actual files on disk for the add, instead of breaking the files up and storing their chunks separately (duplicating the data). This is why FilestoreEnabled had to be set earlier.
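To give an idea of its shape, here’s a minimal sketch of that kind of loop. The real script is in my repo linked at the end of this post; the /tsl mount point, hash-list path, and MFS root here are assumptions, so check the actual source:

#!/bin/sh
# Walk the mounted TSL library and add every file to ipfs.
TSL_DIR="/tsl"                    # where the library is mounted (assumption)
HASH_LIST="/data/ipfs/hash-list"  # where show:file:cid entries get recorded
MFS_ROOT="/The Silent Library"    # MFS directory mirroring the library layout

find "$TSL_DIR" -type f | while IFS= read -r f; do
  rel="${f#$TSL_DIR/}"            # path relative to the library root
  dir=$(dirname "$rel")
  [ "$dir" = "." ] && dir=""
  ipfs files mkdir -p "$MFS_ROOT/$dir" 2>/dev/null  # build the MFS structure
  # --nocopy uses the filestore (no duplicated chunks on disk);
  # --to-files also places the file at the matching MFS path.
  cid=$(ipfs add -Q --nocopy --to-files "$MFS_ROOT/$rel" "$f")
  printf '%s:%s:%s\n' "${rel%%/*}" "$(basename "$f")" "$cid" >> "$HASH_LIST"
done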

Start another shell session on the node.  Run the following to test:

# ipfs cat /ipfs/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme

This downloads a file from IPFS (a readme file).

If that’s working, start the add. On my node it takes a couple of hours to add everything to ipfs.

# cd /data/ipfs/; ./add-tsl.sh

This will run and add the tsl libraries. Watch the hash-list file. You should see entries like:

Ashita ga Aru sa:Ashita ga Aru sa E01.avi:QmWRwP4pdCrHrtysDj65ANT5kNupsddBBJrH8j3vZhKU8w
Ashita ga Aru sa:Ashita ga Aru sa E02.avi:QmRWhbcjqKbwPyiPiMbHcP8v5BYK8jMRwGm8N6cCp2G7AL
Ashita ga Aru sa:Ashita ga Aru sa E03.avi:QmWiRu95VBKmPd4BM7Fut4hcA1opJ9ymwhNZXm3UNbBo1R
Ashita ga Aru sa:Ashita ga Aru sa E04.avi:QmdMLtW7Nk8sCviTKRbyUcdtfZDcH7RGfsS6SuWE8hd2Pz
Ashita ga Aru sa:Ashita ga Aru sa E05.avi:QmXhHKM5s2xFFSRYgK9cB62qqCTAF7GntmW7oosKMw6zoW
Ashita ga Aru sa:Ashita ga Aru sa E06.avi:Qmd4K51kbaX7B2gzt8RKpsnZGRzxdtg6hyAKEfKgQ9ogk4
Ashita ga Aru sa:Ashita ga Aru sa E07.avi:QmVte4fZ5rhVdNFC25ZicNhzZWV5Hd58apyMP6DpwmMrbJ
Ashita ga Aru sa:Ashita ga Aru sa E08.mp4:QmbFq7w4kLTfLdTPZkzDxa7YEvEi7HaetKZEnGVf9YtuRY
Ashita ga Aru sa:Ashita ga Aru sa E09.mkv:QmSYx5kYPtvHc6buBmyEbkjgwgJrJtJ7WzB7Nvd8Kq7xRP
Ashita ga Aru sa:Ashita ga Aru sa E10.mp4:QmbwrxFK7v1kSqzLvaCpESdAEXJqBy8b3Wyp5JqW6wk18T

This shows it’s working correctly. If you’re getting blank entries for the third field (fields are split by :), cancel the script, remove the hash-list file, and restart the script.

Errors like this will possibly pop up in the window where the shell script is running:

Error: to-files: cannot put node in path "/The Silent Library/Documental/Documentary of Documental/Season 1/Translator Notes/Documentary of Documental S01E02 TN.txt": directory already has entry by that name
Error: to-files: cannot put node in path "/The Silent Library/Documental/Documentary of Documental/Season 2/Translator Notes/Documentary of documental S02E04 TN.txt": directory already has entry by that name

This means it’s identified that the file has already been added to the MFS on a previous run. This is fine.

The script can be run multiple times. If it has already added a file, it’ll just report that file’s hash.

When this finishes, you’re part of the swarm serving these files!

(Optional) setting up a gateway

Gateways can be used from anywhere (unless you put limits on them) to access any files on IPFS. Because of this, existing gateways can be used to access the TSL files. If you would like to set up your own, though, it’s pretty easy.

In a normal browser, access port 8080 (or, if in docker, the equivalent mapped port) and go to /ipfs/QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB. This is the cid of the readme file retrieved earlier. Going to /ipfs/QmW7FFR7kJ6TraVU3G9MS12N6iUgr1gmyESYEYraArjogA will show you the root directory of the TSL collection. From there users can access any of the current TSL files and download them, all over HTTP.
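From the docker host itself you can test the same thing with curl (assuming the port mapping from the run command earlier):

# curl http://localhost:8080/ipfs/QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB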

My gateway (and a link to the TSL) is: https://ipfs.joyrex.net/ipfs/QmW7FFR7kJ6TraVU3G9MS12N6iUgr1gmyESYEYraArjogA

This can also be accessed from https://tsl.joyrex.net/

(Optional) playing with the RPC web interface

The 5001 port (which you should NEVER make publicly available) has a web interface available at /webui. For example, connect in your web browser to localhost:5001/webui to access it (or, if you’re in docker, whatever you mapped the port to). If you’ve run add-tsl.sh, all your files should show up in the Files area.

Random Thoughts

This method, while a bit convoluted, allows us to share without having to re-upload to each other. Syncthing is used to distribute files once. Then we run the add-tsl.sh script (or some form of it) every so often to add any changes to our shared files (probably a cron job with a find -ctime to pick up the latest changes).
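As a sketch, the cron side could be as simple as this nightly entry on the docker host (container name and script path assumed from earlier; a smarter version would use find -ctime to only re-add recent changes):

0 3 * * * docker exec ipfs_host sh -c 'cd /data/ipfs && ./add-tsl.sh'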

I don’t know how it handles removing files yet. Well, I do know: it doesn’t. I’ll still have to implement something that removes files that exist in the MFS store but no longer exist in the library.
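The building blocks for that are there, though: ipfs files ls can walk the MFS side, and a stale entry can be dropped with ipfs files rm. The show and filename here are hypothetical:

# ipfs files ls "/The Silent Library/Some Show"
# ipfs files rm "/The Silent Library/Some Show/removed episode.avi"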

Before I realised that the gateway instance had a great MFS browser built in, I wrote (copied) some shitty code from an example site to capture and display the links my own way. I also import the files and their associated cids into a DB table so I can refer to them in the future if needed. All that code lives in a repo with my scripts: ejstacey/ipfs-tsl-tools (github.com)

Q&A:

Q: What about IPFS clusters (IPFS Cluster – Pinset orchestration for IPFS)?

A: This could still be an option; however, I think it might conflict with each of us only using syncthing to get a copy of the new stuff once. It’s more built for one node updating a file and that update going out to the others over the IPFS network, which we avoid.

Q: Why doesn’t this replace syncthing or the torrent or something else?

A: Because everything used right now fits its purpose. Syncthing is great because bipedal controls everything. We want him to be the curator. The torrent file is great because it allows people to get everything (up to a certain date) in one large collection using a popular format. IPFS isn’t good for either of those, but it IS handy for one-off grabs.

Q: Is speed going to be an issue?

A: Possibly. As it’s peer-to-peer, people who are using an IPFS client will share some or all of the files they’ve grabbed (IPFS has garbage collection that decides when to stop sharing a file; a person can pin it to their local instance to make garbage collection ignore the file). People who are using a gateway server don’t share back. If there’s a large group of “constant seeders” like on the torrent file, it should go quick. I know quite a few people who use syncthing also started seeding the torrent file with their existing data, getting more seeders more quickly. Maybe we could do something like that here if people are willing. There’s also the possibility that if we use a gateway with a huge upload capacity (like Cloudflare’s), it’ll cache the file locally on cloudflare’s instance, so for future people it’d be quick. That last part is just a theory though.

Q: Why?

A: It seemed like a good project for me to play with, and it could help the community. Mega is great, but it has limitations and costs. I thought it could be good to have a backup, just in case. Also, bipedal has started thinking more about the TSL Raws wing, which is larger than the standard TSL (~3.3TB and growing vs 1.5TB), and it may turn out that those files can’t live on Mega with the current plan(s), so we’d need another way for timers/subbers/etc to get the files without jumping on the entire syncthing (or whatever).

If you want to know more, there’s a ton of doco out there on ipfs. If you have questions, find me at @ejstacey on discord, @ejstacey.joyrex.net on bluesky, or @ejstacey@kolektiva.social on mastodon/activitypub.