ORM – My Way With Java

At first “they” invented ORM and JPA, and object databases failed. Programmers started to run into common problems, such as how to create, read, update and delete objects in server-side J2EE. So someone came up with the idea of CRUD:

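// adapted from https://dzone.com/articles/generic-crud-facade-your
// note that update and delete return void
import java.io.Serializable;
import javax.persistence.EntityExistsException;
import javax.persistence.PersistenceException;

// T is the entity type handled by the facade
public interface FootprintEntityFacade<T> {
	T create(T entity) throws EntityExistsException;
	T read(Serializable primaryKey);
	void update(T entity);
	void delete(T entity) throws PersistenceException;
}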

Soon it became clear that the solution was not that clean and general, so “they” started a discussion about Anemic vs. Rich domain objects. But it seems that after a while this discussion was forgotten, and enterprise Java programmers were happy injecting (DI) service beans (three-layer architecture). And besides, does it really make sense to save entity state with “myEntity.save()”: why would an entity save its own state? There must be a better solution…

The next step in evolution was an evolved CRUD: Rich-domain Object with Data transfer object (DTO) thinking (see code example below). It became so popular that Spring Framework decided to implement it. Currently Spring Data has JpaRepository, which extends CrudRepository. Spring Data is fast, reliable, and productive, but it has a problem with types. Generic typing means that you need a different repository type for each entity type: if, for example, you use dependency injection, it means injecting a lot of different repositories. IMO the implementation is not the “cleanest” one, although it is one of the best I have encountered. For example,

// Needed for crud repository
@Repository /* One for every entity type! */
public interface CarRepository extends JpaRepository<Car, Long> {}

// JPA repository
import java.util.List;
import java.util.Map;

// One service for all entity types: the type is supplied per call
public interface CrudService {
    public <T> T create(T t);
    public <T> T find(Class<T> type, Object id);
    public <T> T update(T t);
    public void delete(Class<?> type, Object id);
    public <T> List<T> findWithNamedQuery(String queryName);
    public <T> List<T> findWithNamedQuery(String queryName, int resultLimit);
    public <T> List<T> findWithNamedQuery(String namedQueryName, Map<String, Object> parameters);
    public <T> List<T> findWithNamedQuery(String namedQueryName, Map<String, Object> parameters, int resultLimit);
}

So what can be done? What is the next step? EntityStore is my answer! But what you see next is only a step towards the final goal, so don’t put it to use yet.

// Generic E type, implementing types carry the type information (type erasure warning)!
public interface EntityStore<E> {

   // create new entity, not saving or updating existing entity
   public <E extends DataEntity> E createOne(E e);
   // no find methods, read handles retrieving entities
   public <E extends DataEntity> E readOne  (E e);
   // update always overwrites, whereas read always retrieves state from db
   public <E extends DataEntity> E updateOne(E e);
   // delete returns the deleted entity which is removed and therefore new
   public <E extends DataEntity> E deleteOne(E e);

   // creating empty list returns all entities, see next paragraph!
   public <E extends DataEntity> List<E> createAll(List<E> e);
   // reading empty list returns all entities, see next paragraph!
   public <E extends DataEntity> List<E> readAll  (List<E> e);
   // updating empty list returns all entities, see next paragraph!
   public <E extends DataEntity> List<E> updateAll(List<E> e);
   // deleting empty list returns all entities, see next paragraph!
   public <E extends DataEntity> List<E> deleteAll(List<E> e);
}

Here it is important to realize that all entities of type E expose a public method that allows asking for their ids! So, for instance, when the calling side reads an entity of type E, it creates a “template entity” (if you will) with a certain id, the store then fetches the actual state for that id, and finally the actual entity with the correct state is returned. All of the above methods are implemented with basic JPA methods, which is why an entity manager reference is also required to actually have a working store.
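The “template entity” idea can be sketched without JPA. The following is a minimal in-memory stand-in, not the real store: the `getId()` accessor, the `Car` entity and the map-based `InMemoryEntityStore` are all assumptions made for illustration, and a JPA-backed version would delegate to an entity manager instead of a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical base type for this sketch: every entity can be asked for its id.
abstract class DataEntity {
    private final long id;
    protected DataEntity(long id) { this.id = id; }
    public long getId() { return id; }
}

// Hypothetical concrete entity, used only for illustration.
class Car extends DataEntity {
    final String model;
    Car(long id, String model) { super(id); this.model = model; }
}

// In-memory stand-in for the store (assumption: a map replaces the database).
class InMemoryEntityStore {
    private final Map<String, DataEntity> rows = new HashMap<>();

    // Entities are keyed by concrete type and id.
    private static String key(DataEntity e) {
        return e.getClass().getName() + "#" + e.getId();
    }

    // Create stores a new entity and refuses to overwrite an existing one.
    public <E extends DataEntity> E createOne(E e) {
        if (rows.putIfAbsent(key(e), e) != null) {
            throw new IllegalStateException("already exists: " + key(e));
        }
        return e;
    }

    // Read uses the argument only as a template (type + id)
    // and returns the object that carries the stored state.
    @SuppressWarnings("unchecked")
    public <E extends DataEntity> E readOne(E template) {
        DataEntity stored = rows.get(key(template));
        if (stored == null) {
            throw new NoSuchElementException("not found: " + key(template));
        }
        return (E) stored;
    }
}

class TemplateReadDemo {
    public static void main(String[] args) {
        InMemoryEntityStore store = new InMemoryEntityStore();
        store.createOne(new Car(1L, "Volvo"));
        Car loaded = store.readOne(new Car(1L, null)); // template: id only
        System.out.println(loaded.model);
    }
}
```

Note that the caller never passes a class token or a bare id: the template object carries both pieces of information.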

Final solution!

So the ultimate CRUD pattern is the one below (remember also to use the helper method “List.of” to create immutable lists). The catch is to understand the logic behind the following methods, especially the input parameter and the output result. The list passed to a method consists of entities with ids. Conceptually, every method always returns all existing entities, which is why the read method returns all existing entities even when it receives an empty list as a parameter! At first this might seem counter-intuitive, but by comparing the returned entities against the passed ones (as lists) you can always figure out the result of the method by examining their difference. Of course, it is not computationally efficient to always fetch all entities and compare them against the parameter list, but that is how you must think about it! Also, if a ‘query’ (see later) is used, it is always executed first, and only the intersection of its results and the passed entities is used in the operation! The actual implementation takes a shortcut (the usual use cases are CRUDing ‘one’, ‘some’, and ‘all’ entities, plus the special case ‘none’):

Creating a list of entities returns the list of created (now existing) entities, or an empty list if none were created.
Reading returns entities which are populated from the database, but an empty list as the parameter returns (counter-intuitively) all existing entities! The read operation replaces the traditional findX methods…
Updating selected entities returns a list of the entities which were updated. The sizes of the input and output lists might in fact differ, for example due to a processing error.
Destroying (or deleting) entities returns the deleted (now non-existing) entities! Getting the references can be required in some scenarios.

// E.g. ConcreteEntity extends DataEntity implements Serializable

public interface UltimateTypeFreeGenericCrudEntityStoreByJarirajari {
   // For single entities
   public <E extends DataEntity> E create  (DataEntity entity);
   public <E extends DataEntity> E read    (DataEntity entity);
   public <E extends DataEntity> E update  (DataEntity entity);
   public <E extends DataEntity> E destroy (DataEntity entity);

   // For multiple entities
   public <E extends DataEntity> List<E> create  (List<DataEntity> entities);
   public <E extends DataEntity> List<E> read    (List<DataEntity> entities);
   public <E extends DataEntity> List<E> update  (List<DataEntity> entities);
   public <E extends DataEntity> List<E> destroy (List<DataEntity> entities);

   // With 'query' you can run anything with SQL
   public <E extends DataEntity> List<E> create  (List<DataEntity> entities, Optional<String> query);
   public <E extends DataEntity> List<E> read    (List<DataEntity> entities, Optional<String> query);
   public <E extends DataEntity> List<E> update  (List<DataEntity> entities, Optional<String> query);
   public <E extends DataEntity> List<E> destroy (List<DataEntity> entities, Optional<String> query);
}

public class Demo {
   // One store for all types - inject only one store!
   @Inject private UltimateTypeFreeGenericCrudEntityStoreByJarirajari store;

   // Call this after injection: a field initializer would run before 'store' is set
   public List<ConcreteEntity> readAllEntities() {
      return this.store.read(Collections.emptyList());
   }
}
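To make the list semantics concrete, here is a minimal in-memory sketch of the “shortcut” behaviour described above. It is an assumption-laden illustration, not the actual implementation: `ListStore` handles a single hypothetical `DataEntity` class with public fields, and a map stands in for the database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal entity for this sketch.
class DataEntity {
    final long id;
    final String name;
    DataEntity(long id, String name) { this.id = id; this.name = name; }
}

// In-memory stand-in for the list-based store (assumption: one entity type only).
class ListStore {
    private final Map<Long, DataEntity> rows = new LinkedHashMap<>();

    // Returns only the entities that were actually created.
    List<DataEntity> create(List<DataEntity> entities) {
        List<DataEntity> created = new ArrayList<>();
        for (DataEntity e : entities) {
            if (rows.putIfAbsent(e.id, e) == null) created.add(e);
        }
        return created;
    }

    // An empty template list means "all existing entities".
    List<DataEntity> read(List<DataEntity> templates) {
        if (templates.isEmpty()) return new ArrayList<>(rows.values());
        List<DataEntity> found = new ArrayList<>();
        for (DataEntity t : templates) {
            DataEntity stored = rows.get(t.id);
            if (stored != null) found.add(stored);
        }
        return found;
    }

    // Returns only the entities that existed and were overwritten.
    List<DataEntity> update(List<DataEntity> entities) {
        List<DataEntity> updated = new ArrayList<>();
        for (DataEntity e : entities) {
            if (rows.replace(e.id, e) != null) updated.add(e);
        }
        return updated;
    }

    // Returns the deleted (now non-existing) entities.
    List<DataEntity> destroy(List<DataEntity> templates) {
        List<DataEntity> deleted = new ArrayList<>();
        for (DataEntity t : templates) {
            DataEntity removed = rows.remove(t.id);
            if (removed != null) deleted.add(removed);
        }
        return deleted;
    }
}
```

Comparing the size of the returned list with the size of the input list is enough to detect entities that were skipped, for example an update of an id that does not exist.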

Everything looks good, but one more thing is required: what if you want to find an entity either with a secondary/alternative key or with some sophisticated query, without using the internal entity ‘id’? Let’s say that you define Person entities with id, firstname, and lastname. Using the previous store approach it is easy to control the entities using their ids, but there is no direct way of adding more advanced queries to the store, because you cannot pass a query or a map of parameter values… Instead, let’s borrow an idea from REST design. Remember my post about REST? Basically, an HTTP resource endpoint has both “path” and “query” parts. We can take the resource path part as the type of the subject the caller is requesting and the resource query part as additional information. Just like in my post, the resource URL then transforms into resource filter + additional query endpoint. For example, “/apis/api/users/all?age>40&smokes=false” could return all users who are more than 40 years old and don’t smoke. In this example, ‘all’ is a filter that filters out no users and ‘age>40&smokes=false’ is the additional info. So why not embed this query part idea in the new and shiny CRUD store too! This is achievable in Java with jOOQ.

It is not straightforward to come up with a suitable solution for passing the query part in each scenario. But, for example, one of the simplest ways is extending the store class. You can also pass the query as an entity metadata field/property – see Data Has and Must Have Metadata – Plain Type Is Not Enough!. For example, “new ConcreteDataEntity().myField1().query("value < 42");” actually reminds me a bit of GraphQL… Consider also using annotations and ThreadLocal in Java for passing the relevant query information. An auxiliary method that takes in a complex SQL query, executes it, gets the relevant entity ids, and passes them back to the calling method can be used to avoid making the store more complex! That is, 1) use a specialized method to execute the SQL query to get the entity ids, and then 2) use this generic store to retrieve the entities with the help of the fetched ids. Or maybe the simplest way is just to pass a query string or a Java lambda that returns a string… In summary:

The equivalent of the HTTP verb is the method name, i.e. create, read, update, destroy
The equivalent of the HTTP (body) data is the list of entity IDs
The equivalent of the HTTP resource, i.e. the path, is the type of entity
The equivalent of the HTTP query part is the query part which was discussed in this section
There is really no equivalent in HTTP for the “filter” that I have written about previously in my API design post
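The two-step approach (first fetch ids with a specialized query, then read the entities through the generic store) can be sketched as follows. This is only an illustration under stated assumptions: `PersonStore`, `findIds` and the `Predicate` that stands in for a real SQL query are hypothetical names, and a map replaces the database.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical Person entity with id, first name and last name.
class Person {
    final long id;
    final String firstName;
    final String lastName;
    Person(long id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

class PersonStore {
    private final Map<Long, Person> rows = new LinkedHashMap<>();

    void create(Person p) { rows.put(p.id, p); }

    // Step 1: a specialized lookup that returns ids only.
    // The Predicate is an assumption standing in for an SQL query.
    List<Long> findIds(Predicate<Person> query) {
        return rows.values().stream()
                .filter(query)
                .map(p -> p.id)
                .collect(Collectors.toList());
    }

    // Step 2: the generic read, driven purely by ids; unknown ids are skipped.
    List<Person> read(List<Long> ids) {
        return ids.stream()
                .map(rows::get)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }
}

class TwoStepQueryDemo {
    public static void main(String[] args) {
        PersonStore store = new PersonStore();
        store.create(new Person(1L, "Anna", "Smith"));
        store.create(new Person(2L, "Bert", "Jones"));
        store.create(new Person(3L, "Carl", "Smith"));

        List<Long> ids = store.findIds(p -> p.lastName.equals("Smith"));
        List<Person> smiths = store.read(ids);
        smiths.forEach(p -> System.out.println(p.id + " " + p.firstName));
    }
}
```

The store itself stays id-based; only the lookup helper knows anything about secondary keys.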

And one more thing. Let’s assume some of the entities are transient, meaning you don’t need to persist them. I call such objects “items” instead of entities. You can add one boolean field to each entity to declare whether the object is transient or not. This way you can actually use the same store for both types of entities! And yes, transient entities are handy! For example, RESTful resources can be transient (your data model is not an API).
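A minimal sketch of the transient flag, assuming a hypothetical `Item` class and a `MixedStore` in which two maps stand in for the database and for plain memory:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical item/entity: the boolean decides whether it is ever persisted.
class Item {
    final long id;
    final String payload;
    final boolean transientItem;
    Item(long id, String payload, boolean transientItem) {
        this.id = id;
        this.payload = payload;
        this.transientItem = transientItem;
    }
}

// One store for both kinds of objects (assumption: maps replace real storage).
class MixedStore {
    private final Map<Long, Item> database = new HashMap<>(); // stand-in for persistent storage
    private final Map<Long, Item> memory = new HashMap<>();   // transient items live here only

    private Map<Long, Item> target(Item item) {
        return item.transientItem ? memory : database;
    }

    Item create(Item item) {
        target(item).put(item.id, item);
        return item;
    }

    // The template carries the id and the transient flag; returns null when nothing matches.
    Item read(Item template) {
        return target(template).get(template.id);
    }

    int persistedCount() {
        return database.size();
    }
}
```

The caller uses the same create and read calls for both kinds of objects; only the flag decides where the state ends up.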

Using RESTful resources is quite similar (CRUD / CRWD i.e. Create-Read-Write-Destroy):
Construct/Create---HTTP POST--------Create object, start of the life-cycle
Retrieve/Read------HTTP GET---------Find AND read object (discard changes)
Update/Write-------HTTP PUT/PATCH---Create a new version and keep the old version
Demolish/Destroy---HTTP DELETE------Delete object, end of the life-cycle
Often you hear fellow developers saying that from the permission/rights point of view Update (Write) and Destroy (Delete) are equivalent. And in many implementations they are in fact equivalent: you can update any object/resource so that it loses its data/meaning. The key idea is to add a missing piece, and the missing piece is versioning: when you keep the full version history of an object/resource, you can always return to a previous state after any (even malicious) update/write! And whether your destroy/delete is soft or hard doesn’t matter anymore: destruction is simply the symmetric counterpart of creation. And at least for Java there is a ready-made implementation for this: Envers (see e.g. audit log using Hibernate Envers).
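The versioning idea can be illustrated with a small append-only store. This is a sketch under assumptions, not the Envers API: `VersionedStore`, its method names and the use of plain strings as object state are all made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Append-only history per object id (assumption: state is a plain String).
class VersionedStore {
    private final Map<Long, List<String>> history = new HashMap<>();

    // Update/Write: appends a new version and keeps all old ones.
    // Returns the 1-based number of the new version.
    int write(long id, String state) {
        List<String> versions = history.computeIfAbsent(id, k -> new ArrayList<>());
        versions.add(state);
        return versions.size();
    }

    String readVersion(long id, int version) {
        List<String> versions = history.get(id);
        if (versions == null || version < 1 || version > versions.size()) {
            throw new NoSuchElementException("no version " + version + " for id " + id);
        }
        return versions.get(version - 1);
    }

    String readLatest(long id) {
        List<String> versions = history.get(id);
        if (versions == null) {
            throw new NoSuchElementException("no object with id " + id);
        }
        return versions.get(versions.size() - 1);
    }

    // Undo a bad write by writing an old state again as the newest version.
    int restore(long id, int version) {
        return write(id, readVersion(id, version));
    }
}

class VersioningDemo {
    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.write(7L, "good");
        store.write(7L, "vandalised");
        store.restore(7L, 1);
        System.out.println(store.readLatest(7L));
    }
}
```

Because a restore is itself a new version, even the malicious write stays in the history and remains auditable.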

In a way CRUD is actually about the object/resource life-cycle, and therefore a better acronym would be CUD. Reading is not only about reading the state of an object/resource/entity. From this day on, it gets a promotion: to Read an object means first finding it and then populating its state. You DON’T need those nasty entityManager.find* methods because Read already includes that part, at least if you follow my implementation of CRUD! But as mentioned above, the find* methods might be suitable for fetching entity ids which are then used to retrieve the real entities… Finally, Create, Update, and Delete ops are write operations, but Read is usually considered to be a read op. I would like to challenge this idea: I consider even Read to be a write operation! This is because Read overwrites any unpersisted changes in an entity, which simplifies things because there are no more read CRUD operations: only writes are left!

P.S. Modern object storages like AWS S3 have an interesting link between how they work and how my CRUD storage works. You can store and retrieve object content and/or metadata to/from the cloud, and S3 can be used from many programming languages (including Java) via an SDK or the HTTP API. The actual link is that S3 objects support versioning, which means that you can have multiple (versioned) instances of the (logically) same object: every object change is stored as a new version. You can retrieve the newest object without a version id, or an older one by using the id. What is even more interesting is that you can query objects using SQL! Additionally, it supports WORM (write once, read many) for auditing, object soft delete (and also hard delete) by using a deletion marker, and archiving via the object life-cycle by using storage classes. You can also host an on-premise, S3-compatible, open-source object storage yourself if the cloud is not an option; for example, see min.io

Domain objects are high-level business objects. Programmers often use the “domain object” concept at the wrong level: the wrong usage can often be described as giving a “domain object” the same name as a database table. There is nothing wrong with ORM or with using it, per se, but there is one caveat: ORM is about mapping relational tables into objects, which is a flawed concept in the object-oriented world (a related book: Succeeding with Object Databases: A Practical Look at Today’s Implementations with Java and XML by Chaudhri and Zicari).

“An object database (also object-oriented database management system) is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases which are table-oriented; object-relational databases are a hybrid of both approaches.” Object databases are better than ORM for objects in OO programming, but there is a better alternative. Even in JPA2 there is a thing called “Entity Graph”, which leads us to “Graph Databases”: but do not confuse entity graphs with graph databases!

A real-world modern graph database could be, for example, OrientDB or Neo4j. Both of these are graph databases, and at least Neo4j is a better match for modeling and using objects than relational databases with ORM. For instance, take a look at JCypher, which provides a query language that is similar to SQL but more readable. Object-oriented data naturally forms graphs, so there is no need for an extra mapping layer.

For example, you can have a database table called “Insurance” that is mapped with ORM to an “Insurance” object. However, the Insurance object is named wrong because “Insurance” is a domain object and hence a grouping object: an “Insurance” object holds every piece of information that is needed to use that object in business methods. An “Insurance” persisted in a relational database is really not an insurance but rather an insurance model that will get richer/smarter when it is loaded into a program: it is a plain container for relational data.

ORM maps data between database tables and software objects. Is this mapping really necessary? Could it be that it causes pain to programmers and should be removed? I will start by asking a question: “What if there was no need for databases?” This implies that RDBMSs are efficient but force programmers to use an anemic domain model where data is represented as entities without business logic. In a rich domain model, business methods can be constructed in smart ways where all CRUD operations are hidden (by encapsulation) from other domain objects, relations are independent, separate domain objects, and domain objects expose their (business) behavior!

Next I am going to lay out some rules I have found to be useful in rich domain models. First, every domain object (DO) must have a business ID that is different from the row ID in relational databases. Second, every DO must have a default implementation, so we also need a default ID. The default ID is zero (for empty objects); every other object should have an ID that is neither zero nor null (note that NULL is never allowed!). Third, collections should be represented with independent objects (as separate classes): for example, use GroupMembership between many Groups and many GroupMembers (M-to-N relation) instead of a mapped Collection.
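These three rules can be sketched in a few classes. The class and method names below (`Group.DEFAULT`, `Memberships`, `firstGroupOf`) are assumptions made for illustration; only Group, GroupMember and GroupMembership come from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class Group {
    // Rule 2: the default object has ID 0 and replaces null.
    static final Group DEFAULT = new Group(0L, "");
    final long id;      // Rule 1: business ID, independent of any database row ID
    final String name;
    Group(long id, String name) {
        this.id = id;
        this.name = Objects.requireNonNull(name);
    }
}

final class GroupMember {
    static final GroupMember DEFAULT = new GroupMember(0L, "");
    final long id;
    final String name;
    GroupMember(long id, String name) {
        this.id = id;
        this.name = Objects.requireNonNull(name);
    }
}

// Rule 3: the M-to-N relation is its own domain object, not a mapped collection.
final class GroupMembership {
    final long id;
    final Group group;
    final GroupMember member;
    GroupMembership(long id, Group group, GroupMember member) {
        this.id = id;
        this.group = Objects.requireNonNull(group);
        this.member = Objects.requireNonNull(member);
    }
}

// Hypothetical helper that answers questions about memberships without ever returning null.
final class Memberships {
    private final List<GroupMembership> all = new ArrayList<>();

    void add(GroupMembership membership) {
        all.add(Objects.requireNonNull(membership));
    }

    List<Group> groupsOf(GroupMember member) {
        List<Group> result = new ArrayList<>();
        for (GroupMembership m : all) {
            if (m.member.id == member.id) result.add(m.group);
        }
        return result;
    }

    // Falls back to the default object instead of null.
    Group firstGroupOf(GroupMember member) {
        List<Group> groups = groupsOf(member);
        return groups.isEmpty() ? Group.DEFAULT : groups.get(0);
    }
}
```

Neither Group nor GroupMember holds a collection of the other; the relation lives entirely in GroupMembership objects.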

There could be two types of domain objects: non-persisted DOs, which get their IDs from a sequence (which in turn should be persisted if needed), and persisted DOs, which get their IDs from the persistence provider. Monotonically growing IDs that are never reused (immutability) would make it possible to override “equals” and “hashCode” without any major problems, at least at the business level. Caching would also be relatively easy to implement. A concept of grouping must also be supported because of the association: see Appendix below.
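A minimal sketch of sequence-assigned IDs with ID-based equality follows. The `IdSequence` and `Customer` classes are hypothetical; the sequence starts at 1 on the assumption that 0 stays reserved for the default object.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hands out growing IDs that are never reused (assumption: 0 is reserved for the default object).
final class IdSequence {
    private final AtomicLong next = new AtomicLong(1L);

    long nextId() {
        return next.getAndIncrement();
    }
}

// Hypothetical domain object whose identity is its immutable ID.
final class Customer {
    private final long id;
    private String name; // mutable state does not take part in equality

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }

    long getId() { return id; }
    String getName() { return name; }
    void rename(String newName) { this.name = newName; }

    @Override
    public boolean equals(Object other) {
        return other instanceof Customer && ((Customer) other).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}
```

Because the ID never changes, an object can safely be mutated while it sits in a HashSet or is used as a cache key.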

What else?

Every domain object must have a unique numeric ID (primary key) but also other candidate primary keys, because sometimes the first numeric primary key is not what the rich domain model requires!
As said, “associations” should be mapped as separate objects, e.g. Club, ClubMember, and ClubMembership (Club, ClubMember). Associations include unidirectional and bidirectional directions with one-to-one, one-to-many, etc. cardinalities.
APIs take in and put out only data in String format! ALWAYS! This is what should be different in JSONs: {issue: “this should be like {‘issue’: ‘no complaints’}”}…

I think that rich domain models allow a more efficient and natural expression of business logic in source code, but there is likely a trade-off: maintainability and readability vs. performance. So it is not a magical solution for every problem the software industry suffers from, but I think that it would be better suited in many cases, or at least better than traditional solutions that use ORM.

APPENDIX. Relationship between collection IDs and REST APIs:

Collections.

/api/v1/groups => (return metadata for this category)
/api/v1/groups/default => (a reference object, replaces NULL)
/api/v1/groups/all => (i.e. every collection)
/api/v1/groups/ => (i.e. return filtered collections)
/api/v1/groups/none => (i.e. no collections)
/api/v1/groups/group => (return metadata for the group objects)

Objects in a collection.

Zero ID (0L) is the default object => /api/v1/groups/group/default
One ID (1L) is a real group => /api/v1/groups/group/id=1
Two+ ID (2L+) is a real group => /api/v1/groups/group/id=2
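The mapping from object paths to IDs can be sketched with a small resolver. The `GroupPathResolver` class and its method are hypothetical; only the path shapes come from the list above.

```java
// Resolves the last path segment of a group URL to a business ID.
final class GroupPathResolver {
    static final String PREFIX = "/api/v1/groups/group/";

    // "default" maps to the zero ID, "id=N" maps to N.
    static long resolveId(String path) {
        if (!path.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a group path: " + path);
        }
        String last = path.substring(PREFIX.length());
        if (last.equals("default")) {
            return 0L;
        }
        if (last.startsWith("id=")) {
            return Long.parseLong(last.substring("id=".length()));
        }
        throw new IllegalArgumentException("unknown group selector: " + last);
    }
}

class GroupPathDemo {
    public static void main(String[] args) {
        System.out.println(GroupPathResolver.resolveId("/api/v1/groups/group/default"));
        System.out.println(GroupPathResolver.resolveId("/api/v1/groups/group/id=2"));
    }
}
```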
