I'm a brewing engineer by education, working these days as a Ruby developer. In this blog I post mostly technical writing.

Combining readability.js and the ‘node-runner’ gem

I’ve been working on the second iteration of my RSS client and wanted to make it possible to read articles within the reader app, in the style of Firefox’s reader view.

I also intend to add full-text search on the feed entries in the future, so I want to save the article content in my database.

To these ends, I’ll use a Node.js package, Mozilla’s Readability.js, server-side to remove website clutter from the source page. To run Node from Ruby, I’ll be using the ‘node-runner’ gem.

In the next blog post we’ll take a look at another option for parsing content for a reader view – using the ‘ruby-readability’ gem.

Readability.js and building a DOM document object

Readability.js wants to consume a DOM document object. To create one, we’ll be using the jsdom node package.

Required packages

Make sure you’ve installed Node.js first. On Ubuntu you can install it by running:

sudo apt update && sudo apt install nodejs npm

On the Rails side, we’re going to install the ‘node-runner’ gem, so, within your Rails project folder run:

bundle add node-runner

I’ll be using Faraday for my HTTP requests, but you could use another gem. To install Faraday, run:

bundle add faraday

And we’ll also be installing the node packages we want to use:

npm install jsdom
npm install @mozilla/readability

Basic usage

Get the web page with Faraday:

response = Faraday.get('https://example.com/article')
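
Before handing the body to Node, it may be worth checking that the request actually succeeded and returned HTML. Here is a minimal sketch, assuming the response object behaves like a Faraday::Response (it responds to success?, headers and body). The html_body helper and the FakeResponse stand-in are made up for illustration:

```ruby
# Hypothetical helper: returns the body only for successful HTML responses.
# Accepts any object responding to #success?, #headers and #body,
# which is the interface a Faraday::Response exposes.
def html_body(response)
  return nil unless response.success?

  # Faraday's header lookup is case-insensitive; a plain Hash stand-in is not,
  # so this sketch uses the lowercase key.
  content_type = response.headers['content-type'].to_s
  return nil unless content_type.include?('html')

  response.body
end

# Stand-in for a real response, so the sketch runs without a network call.
FakeResponse = Struct.new(:status, :headers, :body) do
  def success?
    (200..299).cover?(status)
  end
end

page = FakeResponse.new(200, { 'content-type' => 'text/html; charset=utf-8' }, '<html></html>')
puts html_body(page).inspect
```

With a real request you would pass the Faraday response straight to html_body and skip parsing when it returns nil.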

Instantiate a NodeRunner object with the JavaScript source: require the node packages and define an arrow function that parses the HTML into a DOM document object and returns the Readability.js output:

runner = NodeRunner.new(
  <<~JAVASCRIPT
    // Load Readability and jsdom's JSDOM class.
    const { Readability } = require('@mozilla/readability');
    const { JSDOM } = require('jsdom');

    // Build a DOM document from the raw HTML and return the parsed article.
    const parse = (html) => {
      const dom = new JSDOM(html);
      return new Readability(dom.window.document).parse();
    };
  JAVASCRIPT
)

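After which we can pass the GET response body to our NodeRunner instance and receive the parsed article back. Readability’s parse() returns an object (title, content, textContent and so on) and node-runner passes data between Ruby and Node as JSON, so expect a Hash on the Ruby side rather than a bare string: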

readability_output = runner.parse response.body
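
Since the plan is to store the content for a reader view and, later, full-text search, the output could be mapped to a few fields before saving. Here is a rough sketch, assuming the result arrives as a Hash with Readability’s title, content and textContent keys, or as nil when Readability cannot extract an article. The entry_attributes helper, the attribute names and the sample data are all made up for illustration:

```ruby
# Hypothetical helper: picks the fields worth storing from Readability's output.
# Returns nil when there is nothing to store.
def entry_attributes(readability_output)
  return nil if readability_output.nil?

  # Normalise keys to strings so the sketch works with either key style.
  data = readability_output.transform_keys(&:to_s)

  {
    title: data['title'],
    content_html: data['content'],               # cleaned markup for the reader view
    content_text: data['textContent'].to_s.strip # plain text, e.g. for search
  }
end

# Made-up sample shaped like Readability's result object.
sample = {
  'title' => 'An example article',
  'content' => '<div><p>Hello reader.</p></div>',
  'textContent' => "  Hello reader.\n"
}

attributes = entry_attributes(sample)
puts attributes[:title]
puts attributes[:content_text]
```

Keeping the HTML and the plain text separately means the reader view can render the markup while a search index only sees the text.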

And that’s it!