Data vs Code (aka Objects) OOP conflation and confusion

May 30, 2021

In one of the posts on his excellent blog, Ted Kaminski talks about the Data vs Object distinction. Reading it got me excited, because I've been mulling over this exact distinction while crystallizing my problems with mainstream class-oriented OOP. I think we're both aiming at the same thing, but I have drawn the line between the two differently. Plus, I have some other things to say about this confusion.

What Ted Kaminski calls an Object on his blog, I simply think of as (and call) Code or Resource. I'll keep calling it an Object though, since that's what most people call it. Note: in many programming languages both Data and Objects have to be expressed as objects/classes, and the whole point of my writing and Ted's is to draw the distinction between them and explain the completely different approaches required when handling them.

My short definition: Data is about carrying information. Objects are about behavior. The core problem with mainstream contemporary OOP is that it conflates the two.

The typical OOP example of Dog and Cat classes extending an Animal class is so bad because it completely confuses people. In a real software system, there are no (or at least there shouldn't be) such things as Cats and Dogs as Objects. An application does not need pieces of logic that actually model their behavior. Even an application for vet offices will only handle data about cats and dogs. An application like that would maybe have Objects like PostgresConnection (for storing data about cats and dogs), Cache (for caching data about cats and dogs), MailSender (for sending alerts to owners of cats and dogs), and so on... but not Cat and Dog themselves!
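To make the vet office example concrete, here is a minimal sketch in Python. The Patient record, its fields, and the send_vaccination_reminder function are my own hypothetical names, and this MailSender only records messages instead of talking to a real mail server, so treat it as an illustration rather than a real design.

```python
from dataclasses import dataclass


# Data: plain, public information about an animal (hypothetical fields).
@dataclass(frozen=True)
class Patient:
    patient_id: int
    species: str  # "cat", "dog", ...
    name: str
    owner_email: str


# Object: a doer that is a component of the system itself.
# A real MailSender would talk to a mail server; this stand-in only
# records what it was asked to send, which keeps the sketch runnable.
class MailSender:
    def __init__(self) -> None:
        self.outbox = []

    def send(self, to: str, body: str) -> None:
        self.outbox.append((to, body))


def send_vaccination_reminder(mailer: MailSender, patient: Patient) -> None:
    # System logic operating on data; no Cat.meow() or Dog.bark() required.
    body = f"{patient.name} the {patient.species} is due for a vaccination."
    mailer.send(patient.owner_email, body)
```

Note that cats and dogs differ only in the value of a field. The things with behavior are the parts of the application.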

In a game designed in (terrible) OOP fashion, there would often be Objects like Sword, Monster, Hero, etc. OOP brainwashing leads people to use Objects to model the real-life interactions and relationships of the entities the program simulates, while Objects are only useful for expressing the interactions and relationships within the system implementing the simulation itself.

In other words: just because your program is a game about a Hero fighting monsters with a sword doesn't mean that it's composed of actual Hero, Monster and Sword objects with attacks, is_attacked_by, deals_damage_to, and similar functions! Objects are supposed to help you organize and model the behavior of your system (the game itself), not the behavior of the things you represent in your system. Your game maybe needs a Renderer, a GameStateStore, an AssetLoader, etc.
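One possible way to picture this in Python: the sword and the monster are plain data, and the combat rule is an ordinary function that belongs to the game's logic. The fields and the apply_attack function are assumptions made up for this sketch, not a recommended game architecture.

```python
from dataclasses import dataclass, replace


# Data the game processes (hypothetical fields).
@dataclass(frozen=True)
class Sword:
    damage: int


@dataclass(frozen=True)
class Monster:
    hit_points: int


def apply_attack(sword: Sword, monster: Monster) -> Monster:
    # A game rule expressed as a function from data to data.
    # Hit points are clamped at zero; the input monster is not mutated.
    remaining = max(0, monster.hit_points - sword.damage)
    return replace(monster, hit_points=remaining)
```

Nothing here needs Sword.deals_damage_to(monster). The rule lives in the system, and the sword is just a number the rule reads.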

Another example: in a credit card processing system, CardDetails, CardOwner, etc. should be data! The system processes data about these; it is not built from them.

When designing a program it's important to be focused on the distinction between what your system is (composed of code aka objects) and what your system is processing (data).

Objects are natural and obvious doers. They are, after all, just a handful of functions with some shared state, bundled together into a consistent whole. For them, encapsulation, interfaces, etc. are useful and required! You are building a software system that does something, and you want to make it by assembling well-defined, smaller pieces, often swappable along well-defined interfaces, e.g. for testing purposes.

The nature of data is very different. While there are exceptions, like enforcing invariants, you usually want your data expressed in the most concrete, public, plain form possible. Data should be copyable and serializable, and any functions coupled with it must be about the data itself: transformations, getters, helpers, etc. Data should not have any direct references to any other data or objects, unless it's purely a composition/aggregation of smaller data. Relational data should use IDs of some kind if it references other data.
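As a rough sketch of what such data can look like in Python, here are the CardOwner and CardDetails records from the credit card example. All the fields and the to_json/from_json helpers are hypothetical, picked only to show copying, serialization, and referencing by ID.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CardOwner:
    owner_id: int
    name: str


@dataclass(frozen=True)
class CardDetails:
    card_id: int
    owner_id: int  # refers to a CardOwner by ID, not by object reference
    last_four: str


def to_json(card: CardDetails) -> str:
    # Plain public fields make serialization trivial.
    return json.dumps(asdict(card), sort_keys=True)


def from_json(text: str) -> CardDetails:
    return CardDetails(**json.loads(text))
```

Because CardDetails holds only an owner_id, it can be copied, stored, or sent over the wire without dragging a CardOwner along with it.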

Collections like Stack or Array are a curious example, because they can be considered to be both. A Stack can be considered an Object that aggregates data or objects, especially when push is called on it, but it can also be considered an aggregation of data, making it just plain data. That duality is never really a problem, but it's worth mentioning. A collection is both an Assembler of data and the assembled data itself. Makes sense, I hope. One could possibly express a Stack as a StackHandler Object that contains some StackState data, but that's probably overkill.
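For the curious, the StackHandler/StackState split could look roughly like this in Python. It is only an illustration of the duality, with a made-up items field, and as said above it is probably overkill in practice.

```python
from dataclasses import dataclass, field


# The data half: just the assembled items, plain and public.
@dataclass
class StackState:
    items: list = field(default_factory=list)


# The doer half: behavior that operates on a StackState.
class StackHandler:
    def __init__(self, state: StackState) -> None:
        self._state = state

    def push(self, item) -> None:
        self._state.items.append(item)

    def pop(self):
        # Removes and returns the most recently pushed item.
        return self._state.items.pop()
```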

What makes things very confusing is that oftentimes it does seem to make sense to name Objects after the data or things they give access to or operate on. E.g. in most programming languages there would be something like File. A File is an Object. It is a doer, and not the data of the file itself! To avoid confusing people about the object vs data distinction, it would probably be better to name it FileAccessor or FileOperator, but that's longer and a bit meh. Similarly, the point of PostgresqlConnection is not to model or be the database data. It models an interface for operating on the database using its API. The data will be passed in and out of it.

While this object naming colloquialism is often harmless, it's, unfortunately, the most common gateway drug to the terrible OOP-reality-modeling which I so detest.

Let's say in a Taxi application someone created a Taxi object that is an interface for sending and receiving notifications and other data to and from a taxi car via GSM. Now the company introduces some new type of Taxi car, let's say a self-driving one, and an OOP-brainwashed developer habitually starts typing:

class SelfDrivingTaxi extends Taxi ...

Because a self-driving taxi is a taxi too, right? Haha! Had the class been named TaxiGSMAccessor or something, there would be no confusion. The developer would figure that probably some new data has to be passed to TaxiGSMAccessor in an argument somewhere, or a new interface has to be built, or something else. Anything is better than starting a messed-up class hierarchy that doesn't make any sense.
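Here is one way the "pass some new data in an argument" option might look, sketched in Python. The TaxiInfo record, the notify_pickup method, and the send_raw callable standing in for the GSM transport are all hypothetical; the original Taxi class is not shown, so this is a guess at its shape.

```python
from dataclasses import dataclass


# Data describing a taxi car; self_driving is the "new data" being passed in.
@dataclass(frozen=True)
class TaxiInfo:
    taxi_id: int
    self_driving: bool


# The Object: an interface for talking to taxi cars over GSM.
class TaxiGSMAccessor:
    def __init__(self, send_raw) -> None:
        # send_raw(taxi_id, message) is a hypothetical transport callable.
        self._send_raw = send_raw

    def notify_pickup(self, taxi: TaxiInfo, address: str) -> None:
        # The kind of car only changes who the message is addressed to.
        recipient = "autopilot" if taxi.self_driving else "driver"
        self._send_raw(taxi.taxi_id, f"{recipient}: pick up at {address}")
```

The self-driving car shows up as one extra field in the data, and no class hierarchy gets started.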

In summary: a grave problem with OOP is that by dogmatically representing everything as objects it confuses people, and makes them represent the domain model with objects that don't make any sense as components of the system itself. As a developer, keep in mind that this distinction exists and avoid conflating the two; I hope that will help you avoid many of the mistakes that an uncritical approach to OOP leads to.

#oop