Using the 'ruby-readability' gem on Rails

July 3, 2024

In the last post we looked at a solution for parsing the text content of a webpage, such as a news article, using the mozilla/readability.js Node package on a Rails backend.

This time I’ll introduce the ‘ruby-readability’ gem, which does the same task in a more Ruby way.

I didn’t manage to get the gem to parse many websites, and I ended up going with the solution presented in the previous post. If you figure out how to get the gem to parse content from pages reliably, let us know in the comments.

Install the gem

Run the following in your Rails project folder:

bundle add ruby-readability
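One thing worth checking after this step: the gem is named ruby-readability, but as far as I know the file it loads from is called readability, so Bundler may not require it automatically. A Gemfile entry along these lines (a sketch based on my understanding of the gem's README, not something verified in this post) makes the require explicit:

```ruby
# Gemfile
# Assumption: the gem's library file is `readability`, not `ruby-readability`,
# so Bundler needs to be told which file to require.
gem "ruby-readability", require: "readability"
```

If you see an "uninitialized constant Readability" error later on, this is the first thing to look at. Adding require "readability" at the top of the file that uses it should work as well.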

I’ll be using Faraday for my HTTP requests, but you could use another gem. To install Faraday, run:

bundle add faraday

Using ‘ruby-readability’

Get the web page with Faraday:

response = Faraday.get('https://example.com/article')

Parse the text content you want from the body of the page:

content = Readability::Document.new(response.body).content
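The two steps above can be wrapped into one small object that also checks the response before parsing. This is only a minimal sketch: ArticleFetcher is a hypothetical name of my own, and the HTTP client and parser class are passed in as arguments, so the block itself does not load either gem.

```ruby
# Hypothetical wrapper around the fetch and parse steps. The HTTP client and
# the parser class are injected, so nothing here is tied to a specific gem.
class ArticleFetcher
  def initialize(http:, parser:)
    @http = http      # e.g. Faraday (must respond to .get(url))
    @parser = parser  # e.g. Readability::Document (must respond to .new(html))
  end

  # Returns the extracted content as a String, or nil when the request
  # was not successful or the parser returned nothing usable.
  def call(url)
    response = @http.get(url)
    return nil unless response.success?

    content = @parser.new(response.body).content.to_s
    content.strip.empty? ? nil : content
  end
end
```

In the Rails app this would be called as ArticleFetcher.new(http: Faraday, parser: Readability::Document).call('https://example.com/article'). Connection problems still raise an exception (Faraday::ConnectionFailed, for example), so you may want to rescue Faraday::Error around the call.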

Readability::Document instance attributes

The following methods are available on a Readability::Document instance:

.images
.author
.title
.content

In my experience, however, the parser doesn’t always manage to identify these fields in the page.
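Since any of these fields can come back empty, it helps to check which ones were actually found before using them. The sketch below is one way to do that. The names READABILITY_FIELDS, blank_value? and missing_fields are my own, not part of the gem, and the code only assumes an object that responds to the four methods listed above.

```ruby
# Hypothetical helper: reports which fields the parser failed to extract.
READABILITY_FIELDS = %i[title author content images].freeze

# Treats nil, whitespace-only strings, and empty collections as "not found".
def blank_value?(value)
  return true if value.nil?
  return value.strip.empty? if value.is_a?(String)

  value.respond_to?(:empty?) ? value.empty? : false
end

# `doc` is anything responding to the four readers above,
# e.g. Readability::Document.new(html).
def missing_fields(doc)
  READABILITY_FIELDS.select { |field| blank_value?(doc.public_send(field)) }
end
```

For example, missing_fields(Readability::Document.new(response.body)) returns an array such as [:author] when only the author could not be identified. Inside Rails you could use blank? instead of the hand-written check.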