The Silent Library on IPFS Part 2

IPFS for TSL 2

This is a continuation from the first blog post.

I’ve spent a lot of time writing some better code to handle keeping the on-disk version of The Silent Library (or anything) up to date with what IPFS sees/knows about.  This means it scans both libraries and adds/removes files to/from IPFS as needed.

Through this experience, I wrote a script in python, and a script in go.  The go one is unfinished, but the python one is working.  In both cases I did this in a “dumb” way, where I wrote all the code to scan and compare libraries, got a clean list of differences/changes that need to be made, then looked at uploading files.  This is where I got stuck.  This is for various reasons: I am new to writing serious stuff in both languages, the IPFS doco is poor, and IPFS itself seems to be in constant flux.  This experience has shown me that IPFS is not a good solution for sharing The Silent Library, or possibly any large project.  Below I will list what I’ve done since the original blog post, as well as further details of the problems I hit trying to explore the IPFS ecosystem.
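To make the "scan and compare" step concrete, here is a minimal sketch of the idea in Python. It is not the actual script: the function names are made up for illustration, and comparing files by size alone is an assumption (a real tool might compare hashes or timestamps instead).

```python
from __future__ import annotations

import os


def scan_disk(root: str) -> dict[str, int]:
    """Walk a directory and return {relative_path: size_in_bytes}."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Normalise to forward slashes so Windows and MFS paths line up.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            files[rel] = os.path.getsize(full)
    return files


def plan_sync(on_disk: dict[str, int], in_ipfs: dict[str, int]) -> tuple[list[str], list[str]]:
    """Return (to_add, to_remove) so that IPFS ends up matching the disk.

    A file whose size differs appears in both lists: the stale copy is
    removed and the current one is added again.
    """
    to_add = sorted(p for p, size in on_disk.items() if in_ipfs.get(p) != size)
    to_remove = sorted(p for p, size in in_ipfs.items() if on_disk.get(p) != size)
    return to_add, to_remove
```

The `in_ipfs` map would have to be built by listing the MFS tree through the IPFS API, which is the part that turned out to be painful.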

The Setup was documented soon after the first blog post. The IPFS / Summary sections were documented a couple of months later.

 

The Setup

These steps could be useful to someone who is knowledgeable and wants to

At some point, docker (I think) was becoming too slow to access the files.  They were pinned locally but they still would take forever to download through the gateway, which was on the same docker instance.  I don’t get it.  Anyway, I decided to put it on my Windows server. In addition, the Windows server has a 10gig fibre connection to the NAS, so that’s fun.

To do this I downloaded kubo. I tried to run it as a service, but it doesn’t fork in daemon mode (I think), so I decided to try this the other way, using the IPFS Desktop software.  This runs as a desktop app, so you must be logged in.  It sits in your systray and can be set to start when you log in.

After that, I installed Python 3.11 with winget and installed requirements for the script with pip.

I checked out the code, set up my config file, and kicked it off.  It took days to set up the IPFS version of what was on disk, but it did finally get there.  I set up the appropriate port forward (TCP/UDP 4001) through my router to this machine and off we go.  I also set up the reverse proxy to sit in front of the HTTP gateway (8080).

Steps

These could be adapted to other OSs, and I bet systemd could actually run the server as a real service (update: since writing this, I’ve moved to kubo on a Debian 11 server with a custom service file I wrote to run ipfs daemon as a regular user).

  1. Install and run IPFS Desktop

  2. Set up port forwarding so 4001 TCP/UDP are available publicly.

  3. Using nginx or apache or IIS or something, set up a reverse proxy in front of 8080 TCP. (IPFS Gateway docs – I use Path type).

  4. Do NOT make port 5001 publicly available.

  5. Install Python 3

  6. Clone the repo holding the code.

  7. Using the installed Python, use pip to install the packages listed in requirements.txt:
    python3 -m pip install -r requirements.txt

  8. Set up your config file.

    1. Copy settings.cfg.example to settings.cfg (or tsl.cfg, whatever).

    2. Make changes to the file you just copied to.  Some notes:

      1. IPFS has limitations, so your `[remote] tslDirectory` has to be a subdirectory under where IPFS stores its configuration. This means to make it work you should have a writable directory IPFS can access, and TSL has to be under it somewhere.  To save disk space you could symlink or bind mount it into that area (depending on what’s hosting your copy of TSL).

      2. `[options] refresh` should always be True unless you’re debugging or know what you’re doing.

      3. `[remote] ipnsKeyName` is a unique name you set. It’s tied to your ipfs instance.

  9. On the IPFS Desktop icon in the systray, right click on it, go to Advanced, and choose Move Repository Location.  Choose the IPFS directory you configured in the previous step.  It will quickly move over.

  10. In the IPFS Desktop app, go to Settings on the left.  At the bottom is the config file text.  Go to ‘Experimental’ and look for ‘FilestoreEnabled’.  Set that value to ‘true’.  Save the file and restart the service (right click the systray icon to do the restart).

  11. If everything has worked, IPFS is ready to be populated.  Go to where the code was checked out and run: python3 .\sync-tsl-to-ipfs.py --config tsl.cfg

    1. Use whatever config file you named

Output should look like this:

Then you’ll see it creating directories on the MFS filesystem and adding files.

It can take days due to IPFS limitations.
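As a quick sanity check before kicking off a multi-day run, the config keys from step 8 can be read with Python's standard `configparser`. This is only a sketch: the example values, the `load_settings` and `is_under` helpers, and the `/data/ipfs` repo path are all hypothetical, and the real script may read its settings differently.

```python
import configparser
from pathlib import Path

# Hypothetical example values; only the section and key names come from the steps above.
EXAMPLE = """
[remote]
tslDirectory = /data/ipfs/tsl
ipnsKeyName = tsl-example-key

[options]
refresh = True
"""


def load_settings(text: str) -> dict:
    """Parse the settings text and return the three values discussed above."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # configparser matches option names case-insensitively by default.
    return {
        "tsl_directory": Path(parser.get("remote", "tslDirectory")),
        "ipns_key_name": parser.get("remote", "ipnsKeyName"),
        "refresh": parser.getboolean("options", "refresh", fallback=True),
    }


def is_under(child: Path, parent: Path) -> bool:
    """True if child is parent or sits somewhere below it.

    Paths are deliberately not resolved, so a symlinked or bind-mounted
    TSL directory inside the IPFS directory still passes.
    """
    return child == parent or parent in child.parents
```

For example, `is_under(load_settings(EXAMPLE)["tsl_directory"], Path("/data/ipfs"))` would confirm that the library lives under the directory IPFS was pointed at.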

The final thing it does when done is “publishes” the current root directory of /The Silent Library (or whatever you specified in your config file) to a permanent id (using IPFS’s IPNS system).  This means that when future updates are done, the hash of the root directory can change, but people can always look at the IPNS location to find the current version of the root directory.  You can use this URL in the gateway systems.

An example of this working is at https://tsl.joyrex.net/
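The publish step can be sketched against the kubo RPC API roughly as follows. This is an illustration under assumptions, not the script's actual code: it assumes the API is listening on the default `127.0.0.1:5001`, and the helper names (`rpc_url`, `rpc_call`, `publish_root`, `gateway_url`) are invented here. It looks up the current hash of the MFS root with `files/stat`, then publishes it under the named key with `name/publish`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed default RPC address; never expose this port publicly.
API = "http://127.0.0.1:5001/api/v0"


def rpc_url(command: str, **params: str) -> str:
    """Build the URL for an RPC command with its query parameters."""
    return f"{API}/{command}?{urlencode(params)}"


def rpc_call(command: str, **params: str) -> dict:
    """POST to the RPC API (it only accepts POST) and decode the JSON reply."""
    req = Request(rpc_url(command, **params), method="POST")
    with urlopen(req) as resp:
        return json.load(resp)


def publish_root(mfs_path: str, key_name: str) -> str:
    """Publish the current hash of an MFS directory and return its IPNS name."""
    root_cid = rpc_call("files/stat", arg=mfs_path)["Hash"]
    published = rpc_call("name/publish", arg=f"/ipfs/{root_cid}", key=key_name)
    return published["Name"]


def gateway_url(gateway: str, ipns_name: str) -> str:
    """Path-style gateway URL that always points at the latest published root."""
    return f"{gateway.rstrip('/')}/ipns/{ipns_name}/"
```

Calling `publish_root("/The Silent Library", "my-key-name")` on a running node should return the IPNS name, which stays the same across updates even though the root hash changes.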

Anyone who does this will end up contributing to the “seeders”.  Their IPNS won’t be used, but all the stuff under it will be shared, so when someone grabs a file, some comes from me, and some comes from whoever is seeding.

 

IPFS Issues and Why It Isn’t Suitable

IPFS relies heavily on having its own copy of whatever you are sharing, split up in chunks.  The only reason I could get this far was because of an experimental feature called “filestore”, which lets IPFS back onto a real filesystem and hold only the internal metadata it needs to make the system work.

That said, the API seems to require uploading the entire file when you want to add the file to IPFS/MFS, even though it is not storing it anywhere and is backing onto a filesystem.  This makes massive adding take way longer than needed.

Speaking of the API, it seems to be in constant flux, with documentation for various versions of what you should use, but it’s not actually useful.  The API doco is good in that it seems to generally include the parameters to calls with a terse description, but the examples (when they exist) are generic to the point of useless.  In addition, the docs constantly point you to their examples, but these “examples” also appear to be the test cases for their code, and as such are written in a very abstract way that isn’t useful at all for someone just trying to explore.  I can appreciate looking at code to learn vs someone having to write a blog page for newbies, but the code examples just aren’t useful unless you’re deep into the ipfs ecosystem.

I wrote my script in Python and in Go.  The go instance I stopped working on once I saw the weird way you have to send files to the API (multipart mime with specific headers.. and again.. chunk streaming the file).
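For the curious, this is roughly what that multipart upload looks like when built by hand in Python. Treat it as a sketch based on my reading of how kubo handles filestore adds, not as a reference: the `Abspath` part header and the `nocopy=true` query parameter are assumptions about the API that may differ between versions, and the function names are made up. Note that the whole file still gets streamed in the request body, even though the daemon only keeps a reference to it.

```python
import os
import uuid
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumed default RPC address of the local daemon.
API = "http://127.0.0.1:5001/api/v0"


def multipart_parts(path, boundary, chunk_size=1 << 20):
    """Yield the multipart/form-data body for one file, chunk by chunk."""
    abs_path = os.path.abspath(path)
    name = quote(os.path.basename(abs_path))
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{name}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        # Assumption: this header tells the daemon where the file lives on disk.
        f"Abspath: {abs_path}\r\n"
        "\r\n"
    )
    yield header.encode("utf-8")
    with open(abs_path, "rb") as fh:
        # The file contents are streamed even though nocopy stores no copy.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            yield chunk
    yield f"\r\n--{boundary}--\r\n".encode("ascii")


def add_nocopy(path):
    """Send one file to the add endpoint and return the raw response text."""
    boundary = uuid.uuid4().hex  # random, so it will not collide with file data
    url = f"{API}/add?{urlencode({'nocopy': 'true'})}"
    req = Request(
        url,
        data=multipart_parts(path, boundary),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Multiply that by every file in a large library and it becomes clear why the initial population takes days.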

There are various complaints, mainly around people being expected to have a deep understanding of the internal ipfs system to interact with it, and its use, but I’m tired and done with it.  I am keeping my instance going for fun (and since I started writing this article, it’s been a couple months and I’ve since moved to running it on Debian because IPFS Desktop screwed up on an upgrade).

 

Summary

IPFS is a cool idea, and could be extremely powerful for certain things, but it’s largely limited by its own massive scope, and the large amount of breaking changes and/or documentation something like this requires.

I’m still going to keep TSL going, on IPFS, just for kicks, but it’s not a good solution for everyone.