Data-oriented, clean&hexagonal architecture software in Rust – through an example project

September 5, 2021

This post and the work behind it try to achieve multiple goals:

It's a follow-up post and code example to my previous post titled “How I structure my apps (in Rust and other languages)”.
It's a follow-up post and code example to “Growing Object-Oriented Software vs what I would do”.
It's a showcase, research, and example project on the cross section of event-based systems and using hexagonal/clean architecture in Rust.
It is a description of my general approach to building application/services, so I can share it and receive feedback, possibly improving how I do things.

First, important notes:

The code example is a work in progress, incomplete in many places, and possibly still contains a lot of logic bugs. I mostly focused on the higher-level design and on hard implementation issues that I can use as examples. I am not trying to implement production-ready, bug-free software. I might or might not keep changing things in it. I'll be using permanent links with commit hashes, so you might want to look at the latest master branch version for any refinements. I do accept PRs, constructive feedback, and collaborators if anyone is interested.

The code right now is very sloppy. I've already spent somewhere between 5 and 7 evenings (2-3 hours each) on it, and I estimate that it would take a similar amount of time to complete and polish it to a reasonable state. Coding takes time, especially when trying to figure out good design approaches to certain problems here (in particular as some of it seems to be quite novel in the case of the Rust programming language), and I still have to do some auxiliary work (like writing this article), while I have very little free personal time. I only hope it is still going to be valuable to some people (including myself!).

Introduction

The application I'm building here is called Auction Sniper and comes from the Growing Object-Oriented Software, Guided By Tests (aka GOOS) book, which I read and reviewed as a part of my OOP research & criticism.

I think the book itself is quite good, and I love the format – going through building a real-life application, but in my opinion, the OOP&TDD-driven design does not produce good results. Read the previous blog post for details.

The application itself is an online auction bidding bot:

It connects via the XMPP protocol to an “auction house” server, to which it sends messages to join an auction for a certain item and to place bids, and from which it receives messages about other bids and final auction results.
It has a visual UI that allows joining auctions and setting a maximum bid price, and that shows the status of each auction.
It places bids for the user in reaction to UI and auction house events, trying to outbid other auction participants up to a certain price.

Creating a high-level design

In my opinion, the first step of building any real-life application, before doing any coding or technical work, is coming up with a general concept/design.

First I need to know what higher-level problem I am trying to solve. In this case: we're building an auction bidding bot. But for whom? An individual user running it on a personal server/computer? Or is it a service for multiple users? What are the performance requirements? What's the ballpark number of events in the system per second? What are our availability requirements? Major big-picture things like that.

Depending on the most important high-level requirements, completely different designs might be necessary. It's a different thing entirely to build a single-user application running on a PC or mobile phone; a low-intensity web application for a small startup; or a FANG-level industry-grade scalable and geographically distributed service.

In my experience, for any non-trivial application, one can't just build one-size-fits-all solutions. One of the OOP obsessions seems to be the idea of universal reusability and interoperability. But just like it is not possible to have a mechanical engine suitable for everything (race cars, family minivans, tanks, and helicopters), it's not possible to have business application designs, or even components, that can be used in any use case. The best one can hope for is a design that is suitable for the current use case and available resources (like development time), is flexible enough to accommodate some minor future requirement changes, scales OK up to a point (let's say 100x), can reuse a lot of general-purpose infrastructure, and leaves the door open for relatively painless future rewrites and a complete overhaul of the approach.

Anyway. In this case, we can assume that we're building a single-user application, possibly being run on a PC, with minimum performance and scale requirements. Relatively easy and simple.

Data architecture

The single biggest element of the high-level design is the data architecture. By that I mean: how is the application going to accept, process, store and output all the data it needs: both in memory and in some persistent stores.

My main criticism of the project from GOOS was that it completely ignores the data persistence aspect. The application would lose its whole state after shutting down. From my perspective that's unacceptable, not just from the functional perspective, but also from a didactic one. An application that does not take care of persistence is not very real-life-like, and any software design methodology absolutely must address this aspect early. Sure – some applications don't have to persist any data, but that's generally uncommon, and that case is so easy that it's not worth discussing.

From my experience, persistence is where things go wrong with OOP fast (at least with the naive OOP usually given as an example in books and educational material). I have yet to experience a non-trivial project written with an OOP mentality where the object-relational impedance mismatch (https://en.wikipedia.org/wiki/Object%E2%80%93relational_impedance_mismatch) is not a major issue, and where people are in love with their DAOs and ORMs.

Auction Sniper has rather simple requirements: it needs to track the state of each auction in some persistent store, possibly using auction item id as a primary key, with the rest of the data regarding the state of the auction in the record/document associated with the key.

There are some complications here around the asynchronous nature of the auction house communication and the UI. In a real implementation, care must be taken to guarantee atomicity, idempotency, and reliability of handling external events. It can all probably be solved quite simply with just database transactions, coarse-grained application locks, etc., but for some reason, I decided to overcomplicate the design by using an event log as a communication and synchronization mechanism between all components. Probably because I have been studying topics related to event-based systems like Kafka, event logs, and so on.

It is what it is – overcomplicated or not – that's what I implemented. The event log here is understood very loosely and in a basic form. It is just an append-only table in the database. All application services are following events from that log in order and/or writing out new ones that they produced.

The benefits of event-log based communications include:

Built-in atomicity and ordering of events.
Built-in audit trail. The application can easily show all the bids in a given auction that led to its overall outcome.
Natural and simplified way to decompose services.

The main downsides:

It requires quite a bit of additional code for event log communication.
The necessity to consider issues unique to event-based communication, like the dual-write problem. In this case, things are somewhat simple because the event log and all other persistent data are in one database, so it's possible to just use one database transaction for everything: keeping track of the position in the log, reading the next event from the event log, updating any stateful data, and writing out new events.
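
To make the "one transaction for everything" point more concrete, here is a minimal in-memory sketch. It is not taken from the Auction Sniper code: `Db`, `Event`, `step`, and the "bid one above the current price" rule are all made up for illustration, and the "transaction" is simulated by working on a scratch copy that is committed only on success.

```rust
use std::collections::HashMap;

/// Hypothetical event type; a real system would have a richer set.
#[derive(Clone, Debug, PartialEq)]
pub enum Event {
    PriceChanged { item: String, price: u64 },
    BidPlaced { item: String, price: u64 },
}

/// Everything lives in one "database", so one transaction can cover it all.
#[derive(Clone, Default)]
pub struct Db {
    /// Append-only event log; an event's offset is its index.
    pub log: Vec<Event>,
    /// Position in the log, per follower (service) name.
    pub offsets: HashMap<String, usize>,
    /// Last bid placed, per item.
    pub last_bid: HashMap<String, u64>,
}

impl Db {
    /// Runs `f` against a scratch copy and commits only if it succeeds.
    /// This stands in for the all-or-nothing behavior of a real transaction.
    pub fn transaction<T, E>(
        &mut self,
        f: impl FnOnce(&mut Db) -> Result<T, E>,
    ) -> Result<T, E> {
        let mut scratch = self.clone();
        let out = f(&mut scratch)?;
        *self = scratch;
        Ok(out)
    }
}

/// Handles at most one event for `follower`: reads it, advances the offset,
/// updates state, and appends any reaction, all in one transaction.
/// Returns `Ok(false)` when there was nothing new to read.
pub fn step(db: &mut Db, follower: &str, max_price: u64) -> Result<bool, String> {
    db.transaction(|tx| {
        let offset = tx.offsets.get(follower).copied().unwrap_or(0);
        let Some(event) = tx.log.get(offset).cloned() else {
            return Ok(false);
        };
        tx.offsets.insert(follower.to_string(), offset + 1);
        if let Event::PriceChanged { item, price } = event {
            let bid = price + 1;
            tx.last_bid.insert(item.clone(), bid);
            tx.log.push(Event::BidPlaced { item, price: bid });
            if bid > max_price {
                // Simulated failure: every change made above is discarded.
                return Err(format!("bid {bid} exceeds the limit {max_price}"));
            }
        }
        Ok(true)
    })
}
```

If `step` fails halfway through, neither the offset, the state, nor the new event is persisted, so the same event can simply be retried later.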

There's very little for this application to store in memory. The mutable state and communication are persisted already and can be fetched and saved on every use. Some form of write-through in-memory caches could be easily added if needed for performance. Since the application will run as a single instance, with exclusive access to the persisted data, there are no problems with cache invalidation.

Services

The data architecture that I've chosen makes it simple to decompose the application into “services”/“actors”/“main threads” – parts that can run in parallel, independently, that can communicate by passing relatively little data.

It's easy to see at least the following:

Auction house handling. Sending and receiving XMPP messages can happen on separate threads. One thread follows the event log and sends corresponding XMPP messages to the auction house server when needed. Another listens for new XMPP messages from the auction house server and writes them as events in the event log.
UI. Irrespective of the form of the UI, one or more separate threads can write out any user action as an event in the event log, and another one can follow the log and update the UI.
Bidding engine. A separate thread can follow events in the log, reacting to auction house and UI events, modifying the auction state, and writing out new events.

Implementation

Repositories/ports in the hexagonal architecture

I am a fan of the hexagonal architecture. One aspect of it that I particularly like is expressing every external interaction as one or more interfaces. This puts a clear separation between the business logic and the IO logic, with a clearly defined interface in between. Ports are injected into services using them (Dependency Injection), allowing testing with mocks and interchangeable implementations.

A typical example would be something like:

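use std::sync::Arc;

// `Result`, `UserId` and `UserDetails` are assumed to be defined elsewhere
// (`Result` being a single-parameter alias, e.g. `anyhow::Result`).
trait UserStore {
  // `&self` receivers keep the trait usable as `dyn UserStore`
  fn load_user(&self, id: &UserId) -> Result<Option<UserDetails>>;
  fn store_user(&self, id: &UserId, details: &UserDetails) -> Result<()>;
}


// actual type used around the code, for a (potentially)
// shared handle to the user store
type SharedUserStore = Arc<dyn UserStore + 'static + Sync + Send>;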

A store (aka repository) is just an API to operate on a certain aspect of a database.
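
As a rough illustration of how such a port gets injected and faked, here is a self-contained sketch. The `UserId` and `UserDetails` definitions, the `String` error type, `InMemoryUserStore`, and `GreetingService` are all assumptions made up for this example. The trait methods take `&self` so that the trait can be used behind `dyn`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Simplified stand-ins, assumed for this sketch only.
pub type Result<T> = std::result::Result<T, String>;
pub type UserId = u64;

#[derive(Clone, Debug, PartialEq)]
pub struct UserDetails {
    pub name: String,
}

/// The port: business logic only ever sees this interface.
pub trait UserStore {
    fn load_user(&self, id: &UserId) -> Result<Option<UserDetails>>;
    fn store_user(&self, id: &UserId, details: &UserDetails) -> Result<()>;
}

pub type SharedUserStore = Arc<dyn UserStore + Send + Sync + 'static>;

/// A fake, in-memory adapter, handy for tests and early prototyping.
#[derive(Default)]
pub struct InMemoryUserStore {
    users: Mutex<HashMap<UserId, UserDetails>>,
}

impl UserStore for InMemoryUserStore {
    fn load_user(&self, id: &UserId) -> Result<Option<UserDetails>> {
        let users = self.users.lock().map_err(|e| e.to_string())?;
        Ok(users.get(id).cloned())
    }

    fn store_user(&self, id: &UserId, details: &UserDetails) -> Result<()> {
        let mut users = self.users.lock().map_err(|e| e.to_string())?;
        users.insert(*id, details.clone());
        Ok(())
    }
}

/// A hypothetical service that receives the port through its constructor.
pub struct GreetingService {
    users: SharedUserStore,
}

impl GreetingService {
    pub fn new(users: SharedUserStore) -> Self {
        Self { users }
    }

    pub fn greet(&self, id: &UserId) -> Result<String> {
        Ok(match self.users.load_user(id)? {
            Some(user) => format!("Hello, {}!", user.name),
            None => "Hello, stranger!".to_string(),
        })
    }
}
```

The service never learns which adapter it was given, so swapping the in-memory fake for a database-backed implementation only changes the wiring code.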

Let's look at the example from the Auction Sniper code: a BiddingStateStore.

This store is used by the BiddingEngine service to load and store data about each auction. As you might have noticed, it's somewhat more complicated than the previous example. The reason is that it has to support database transactions spanning both multiple stores and multiple operations on them. Some say “this is the most complicated part in clean architecture”, and it is not exclusive to Rust or any particular language.

My solution is to introduce additional types to express the concept of the underlying persistence type (e.g. a database), along with its connection and transaction types. The connection/transaction instance has to be passed to every method of the store interface. The additional type signatures look a little bit scary, and it requires an unstable Rust feature, but other than that the usability seems completely fine.
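
A much simplified sketch of the "pass the transaction to every store method" idea follows. It is not the project's actual API: it skips the persistence and connection layers entirely, is generic over the transaction type only (so it needs no unstable features), and all names in it (`InMemoryDb`, `InMemoryTransaction`, `EventWriter`, `set_max_bid`) are hypothetical.

```rust
use std::collections::HashMap;

pub type Result<T> = std::result::Result<T, String>;

/// In-memory stand-in for a database (hypothetical).
#[derive(Clone, Default, Debug, PartialEq)]
pub struct InMemoryDb {
    pub bidding_state: HashMap<String, u64>,
    pub events: Vec<String>,
}

/// Anything that can be committed; dropping it uncommitted rolls back.
pub trait Transaction {
    fn commit(self) -> Result<()>;
}

pub struct InMemoryTransaction<'a> {
    target: &'a mut InMemoryDb,
    scratch: InMemoryDb,
}

impl InMemoryDb {
    pub fn start_transaction(&mut self) -> InMemoryTransaction<'_> {
        let scratch = (*self).clone();
        InMemoryTransaction { target: self, scratch }
    }
}

impl Transaction for InMemoryTransaction<'_> {
    fn commit(self) -> Result<()> {
        *self.target = self.scratch;
        Ok(())
    }
}

/// Ports take the transaction explicitly, so one transaction
/// can span several stores and several operations.
pub trait BiddingStateStore<T: Transaction> {
    fn load(&self, tx: &mut T, item: &str) -> Result<Option<u64>>;
    fn store(&self, tx: &mut T, item: &str, max_bid: u64) -> Result<()>;
}

pub trait EventWriter<T: Transaction> {
    fn write(&self, tx: &mut T, event: &str) -> Result<()>;
}

pub struct InMemoryBiddingStateStore;
pub struct InMemoryEventWriter;

impl<'a> BiddingStateStore<InMemoryTransaction<'a>> for InMemoryBiddingStateStore {
    fn load(&self, tx: &mut InMemoryTransaction<'a>, item: &str) -> Result<Option<u64>> {
        Ok(tx.scratch.bidding_state.get(item).copied())
    }

    fn store(&self, tx: &mut InMemoryTransaction<'a>, item: &str, max_bid: u64) -> Result<()> {
        tx.scratch.bidding_state.insert(item.to_string(), max_bid);
        Ok(())
    }
}

impl<'a> EventWriter<InMemoryTransaction<'a>> for InMemoryEventWriter {
    fn write(&self, tx: &mut InMemoryTransaction<'a>, event: &str) -> Result<()> {
        tx.scratch.events.push(event.to_string());
        Ok(())
    }
}

/// Business-side code: generic over the transaction, unaware of the backend.
pub fn set_max_bid<T: Transaction>(
    tx: &mut T,
    store: &dyn BiddingStateStore<T>,
    events: &dyn EventWriter<T>,
    item: &str,
    max_bid: u64,
) -> Result<()> {
    store.store(tx, item, max_bid)?;
    events.write(tx, &format!("max bid for {item} set to {max_bid}"))
}
```

Both the state update and the event write become visible only after `commit`; if the transaction is dropped instead, neither does.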

Event log

An event log is abstracted with two ports: a log Reader interface and a log Writer interface. Some parts of the code need only one of the two aspects of it, so it's easier to express their interactions with the rest of the system by splitting these two aspects. As usual, the first actual implementation of each port is a fully functional, but fake, in-memory one or something functionally similar.
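
Below is a minimal sketch of what such a pair of ports and a fake in-memory implementation could look like. The trait and method names, the offset-based `read` signature, and the use of `std::sync::Condvar` for the timed wait are assumptions for illustration, not the project's actual definitions.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

pub type Offset = usize;

/// Write side of the log.
pub trait Writer<E>: Send + Sync {
    /// Appends events; returns the offset just past the last written event.
    fn write(&self, events: &[E]) -> Offset;
}

/// Read side of the log.
pub trait Reader<E>: Send + Sync {
    /// Returns up to `limit` events starting at `offset`, waiting up to
    /// `timeout` for the first one to show up. May return an empty vector.
    fn read(&self, offset: Offset, limit: usize, timeout: Duration) -> Vec<E>;
}

pub type SharedWriter<E> = Arc<dyn Writer<E>>;
pub type SharedReader<E> = Arc<dyn Reader<E>>;

/// Fake, in-memory log implementing both ports.
pub struct InMemoryLog<E> {
    events: Mutex<Vec<E>>,
    appended: Condvar,
}

impl<E> InMemoryLog<E> {
    pub fn new() -> Self {
        Self {
            events: Mutex::new(Vec::new()),
            appended: Condvar::new(),
        }
    }
}

impl<E: Clone + Send> Writer<E> for InMemoryLog<E> {
    fn write(&self, events: &[E]) -> Offset {
        let mut log = self.events.lock().unwrap();
        log.extend_from_slice(events);
        // Wake up any readers waiting for new events.
        self.appended.notify_all();
        log.len()
    }
}

impl<E: Clone + Send> Reader<E> for InMemoryLog<E> {
    fn read(&self, offset: Offset, limit: usize, timeout: Duration) -> Vec<E> {
        let log = self.events.lock().unwrap();
        // Sleep until something exists at `offset` or the timeout expires.
        let (log, _timed_out) = self
            .appended
            .wait_timeout_while(log, timeout, |log| log.len() <= offset)
            .unwrap();
        log.iter().skip(offset).take(limit).cloned().collect()
    }
}
```

A service that only emits events can then hold a `SharedWriter`, and a follower only a `SharedReader`, even though both handles point at the same log.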

Service control

As of the time of writing this post, I've only implemented the AuctionHouseSender, the AuctionHouseReceiver, and the BiddingEngine services.

I never got to writing the UI service. I envision it being an HTTP server with a handful of routes. Since most HTTP libraries in Rust are using async/await, it would require some additional infrastructure for async services and ports. No blocker really, but additional work nevertheless.

To handle all services in a controlled and uniform way, a ServiceControl is used. The job of ServiceControl is to spawn threads that will trigger service handler functions until the service or the whole system is supposed to stop.

Every service is expressed as a struct implementing a trait, expressing the type of the service. A LoopService is rather general, the LogFollowerService is triggered by any new events in the event log, and ServiceControl takes care of things like polling for new events, and handling the offset in the log.

The general control strategy is quite simple: there's one shared stop_all atomic boolean flag, and each service has its own exclusive stop flag as well.

Any service returning an unexpected top-level error will set stop_all to true, which will cause all services to stop. Dropping the JoinHandle returned when a service is started will cause that one service to stop.

Since there is no way to cancel/interrupt a thread executing a given service, the whole approach requires all services to be non-blocking. Well... at least to not block indefinitely. All blocking IO must support timeouts. This is not a problem in this application. Using async services would remove this requirement, since all IO is non-blocking with async Rust, and any async task is cancelable.
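
The control strategy described above can be sketched roughly as follows. This is a simplified, hypothetical version: the `run_iteration` method name, the `String` error type, and the decision to join the thread on drop are assumptions, and the real ServiceControl also handles log-following services.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

pub type Result<T> = std::result::Result<T, String>;

/// A service whose handler is called in a loop.
/// The handler must not block indefinitely.
pub trait LoopService: Send + 'static {
    fn run_iteration(&mut self) -> Result<()>;
}

#[derive(Clone, Default)]
pub struct ServiceControl {
    /// Shared by all services spawned from this control.
    stop_all: Arc<AtomicBool>,
}

/// Handle to a running service. Dropping it stops that one service.
pub struct JoinHandle {
    stop: Arc<AtomicBool>,
    thread: Option<thread::JoinHandle<Result<()>>>,
}

impl ServiceControl {
    pub fn stop_all(&self) {
        self.stop_all.store(true, Ordering::SeqCst);
    }

    pub fn is_stopping(&self) -> bool {
        self.stop_all.load(Ordering::SeqCst)
    }

    pub fn spawn_loop(&self, mut service: impl LoopService) -> JoinHandle {
        let stop = Arc::new(AtomicBool::new(false));
        let (stop_all, stop_this) = (self.stop_all.clone(), stop.clone());
        let thread = thread::spawn(move || {
            while !stop_all.load(Ordering::SeqCst) && !stop_this.load(Ordering::SeqCst) {
                if let Err(e) = service.run_iteration() {
                    // An unexpected top-level error stops the whole system.
                    stop_all.store(true, Ordering::SeqCst);
                    return Err(e);
                }
            }
            Ok(())
        });
        JoinHandle { stop, thread: Some(thread) }
    }
}

impl JoinHandle {
    /// Waits for the service to finish on its own and returns its result.
    pub fn join(mut self) -> Result<()> {
        let thread = self.thread.take().expect("thread is present until joined");
        thread
            .join()
            .map_err(|_| "service thread panicked".to_string())?
    }
}

impl Drop for JoinHandle {
    fn drop(&mut self) {
        // Ask just this service to stop, then wait for its current iteration.
        self.stop.store(true, Ordering::SeqCst);
        if let Some(thread) = self.thread.take() {
            let _ = thread.join();
        }
    }
}
```

Because the flags are only checked between iterations, an iteration that blocks forever would also block the drop, which is why every blocking call inside a service needs a timeout.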

In some other Rust projects, I used channels as a communication method between services. Each service would effectively be a loop handling messages from a channel. Stopping a service might be a combination of shared atomic flags, sending a certain message to it, or closing all sending ends of their channels.

The main function is in essence:

creating all the resources and instances of all the ports,
injecting them into services that need them,
spawning all the services,
joining in a loop on all service JoinHandles to check and bubble up any errors.

Gracefully handling Ctrl+C and kill signals is easy.

Functional core, imperative shell

When possible, I try to express every service's business logic as purely functional code, leaving all the side effects, like reading database state and calling external services, at the very bottom of the call stack.

In the BiddingEngine the event handling functions are pure: they take the current state of the auction, etc. and return the new state of the auction and new events to emit on the log.
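
A pure event handler of this shape might look like the sketch below. The `Event` and `AuctionState` types and the bidding rule itself are simplified assumptions for illustration and are not the project's actual types or logic.

```rust
pub type Amount = u64;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Bidder {
    Sniper,
    Other,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Event {
    /// From the auction house: current price and who holds it.
    Price { price: Amount, increment: Amount, bidder: Bidder },
    /// From the auction house: the auction has closed.
    Closed,
    /// Emitted by the bidding engine: a request to place a bid.
    Bid { price: Amount },
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct AuctionState {
    pub max_bid: Amount,
    pub winning: bool,
    pub closed: bool,
}

/// Pure function: no IO, no clocks, no shared state.
/// Takes the current state and an event; returns the new state
/// and the events to append to the log.
pub fn handle_event(state: &AuctionState, event: &Event) -> (AuctionState, Vec<Event>) {
    let mut next = state.clone();
    let mut out = Vec::new();
    match *event {
        Event::Price { bidder: Bidder::Sniper, .. } => next.winning = true,
        Event::Price { price, increment, bidder: Bidder::Other } => {
            next.winning = false;
            let bid = price + increment;
            // Outbid the other participant, but never above the limit.
            if !state.closed && bid <= state.max_bid {
                out.push(Event::Bid { price: bid });
            }
        }
        Event::Closed => next.closed = true,
        // Our own output; nothing to react to.
        Event::Bid { .. } => {}
    }
    (next, out)
}
```

The imperative shell around it then only has to load the state, call `handle_event`, and persist the returned state and events in one transaction, which keeps the logic itself testable without any fakes.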

There are, sometimes, good reasons to give up on the purity of the business logic code, usually around performance and resource usage, but generally, I'm OK with sacrificing quite a bit of these, just for the sanity and testability FP offers.

Testing

I might not be a TDD zealot, but I do agree that writing tests early forces a good modular, flexible, and thus testable design of the whole code. One of the main reasons for putting all the IO behind interfaces (ports) was to improve testability.

Having said that, in my experience the value of tests diminishes much more quickly in Rust than in other programming languages. By spending more time on modeling the domain using the type system, and using all the language support for avoiding mistakes, I get very solid reliability even before I write a single test. Because of that, in Rust projects I often focus on some sanity checks, especially of the more complex logic, and then on the corner cases. Mostly to prove to myself that the code is indeed easily testable, and then use it to test things worth testing.

In this project, since I'm short on time, it's not a real project, and no one is paying me for this stuff, I wrote only a couple of tests, just to test the APIs and have an example of what testing looks like.

It's possible to test the same thing on many levels. BiddingEngine can be tested by calling the FP business logic directly, or as a standalone instance (not even “spawned” as a thread) by calling its event handling logic directly, or by spawning it and writing and reading messages to and from the event log.

Miscellaneous notes

I do like to keep the abstraction layers segregated like in The Onion Architecture, but for a small project like this, I don't find it very important, and I didn't have time to iron it all out. With the basic hexagonal architecture principles in place (ports), it should always be relatively painless to shuffle some types around and segregate layers.

The persistence abstractions are modeled after the APIs of the r2d2 + postgres crates combo. I have added a Postgres persistence implementation, plus a Postgres implementation of one method of one port, just to make sure the whole thing compiles and possibly works. All these traits and the general abstraction may require some tweaking to work with other database types, etc. But that does not worry me much. In practice, an application like this requires supporting only one real type of port at a time. All the additional abstractions are used for modularity, composability, and things like Dependency Injection for testing, and not to be able to store auction state data in either MongoDB or Postgres depending on a configuration flag.

The fake in-memory implementation of the event log uses parking_lot::Condvar neatly to implement polling with a configurable timeout.

Summary

And that's all I've got (time for) right now. That's how I tend to design and implement things, along with some (incomplete, but still) code for you to look at to make it more concrete.

Feedback welcome, PRs welcome, see you in the next post.

Oh, BTW. If you find the Rust & hexagonal architecture combo interesting, you might be interested in another blog publishing posts about it, which has recently been showing up on r/rust.