Friday, December 26, 2008

JPA implementations comparison: Hibernate, Toplink Essentials, Openjpa, Eclipselink


Summary

This article is a response to the lack of information on the net about the performance differences among the four best-known Java Persistence API (JPA) implementations: Toplink Essentials, EclipseLink, Hibernate and OpenJPA.
Besides results and conclusions, the full test code is also available in case you want to repeat the test yourself.
I wrote a relatively simple program which executes some queries and inserts in a MySQL database through JPA. Four fixed-time tests were done with exactly the same code, just changing the JPA implementation library and the persistence.xml. I monitored the resources used by the JVM and counted the inserts and queries executed during the tests. Finally, I show here my conclusions and also the results of these tests, so that you can draw your own. I consider the differences found among the different implementations truly relevant.
For the tests performed for this article, nothing except JPA was used. No web pages, no web or application server. Just java threads, JPA and MySQL. I give more details in the next sections.
Note: In case you are using JPA with Axis and/or the Google Web Toolkit (GWT), this other article focused on working with JPA, Axis and GWT could be of interest for you.

Description of hardware and software
The tests were run on an Acer Extensa 5620G laptop with an Intel Core 2 Duo T5250 processor and 2 GB of DDR2 RAM, monitored from a standard PC.
For the tests I have used the following software:
  • Ubuntu 8.10 Intrepid Ibex
  • MySQL database, version 5.0 (installed from the official Ubuntu repositories).
  • Java Virtual Machine 1.6
  • JDBC driver for MySQL, version 5.1.
  • Eclipse Ganymede
  • The employees database example for MySQL, courtesy of Patrick Crews and Giuseppe Maxia (url below in the references section)
  • JConsole for resources monitoring
  • GIMP 2 to capture screens
The database and the JVM were running on the Acer machine, but both JConsole and GIMP were executed on a PC (also equipped with Ubuntu 8.10) connected via TCP/IP to the test machine. I did this so as not to overload the machine running the tests.
Versions of the JPA implementations tested:
  • Hibernate EntityManager and Annotations 3.4.0
  • Toplink Essentials version 2 build 41
  • Openjpa 1.2.0
  • Eclipselink 1.0.2
Description of code and tests
The code developed for the tests is available to download here. All you have to do is import the zip file in Eclipse. You will need at least one of the JPA implementation libraries. You can download them from the urls in the references section below.
The code is made up of two types of threads, one for inserting and one for querying, each of them containing a loop.
The inserting thread's loop gets an arbitrary employee and makes a copy of him/her, letting MySQL generate a new emp_no. This was the only modification I made to the employees database: the emp_no is auto-generated.
The querying thread's loop executes these queries in sequence:
  • A query returning the number of female employees.
  • A query returning the number of male employees.
  • A query returning all employees hired since an arbitrary date.
  • A query returning all employees born after an arbitrary date.
  • A query returning all women who have earned more than an arbitrary salary.
I have also created an independent class, JPAManager, which is in charge of creating the static EntityManagerFactory and the EntityManager for each of the threads. You have the details of that class in this other article focused on the problems derived from sharing an EntityManager among different objects.
This is the starting sequence:
  1. When the program starts, it waits 2 minutes for the monitoring infrastructure to be ready (connecting JConsole to the JVM, basically).
  2. It then starts 2 of the so-called inserting threads. I start the inserting threads before the querying threads so that the queries do not always return the same results (which will eventually happen anyway).
  3. After starting the inserting threads, the program starts 18 of the querying threads, pausing 10 seconds before starting the next one. This is so that they do not execute the same query at the same time.
  4. The program runs the threads for 30 minutes. After that time, it sends a stop signal to the threads, which makes them stop safely after their next inserting or querying round. The main program then waits 15 minutes for the threads to stop and the JVM memory to stabilize.
  5. Before stopping, the threads provide information about the number of inserts/queries they have executed.
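The starting sequence above can be sketched with plain Java threads. This is only an illustration under stated assumptions, not the code used for the tests: the class and method names (FixedTimeHarness, Worker, runTest) are hypothetical, the JPA work is replaced by a placeholder Runnable, and the waits are scaled down from minutes to milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class FixedTimeHarness {

    /** Repeats one round of work until the shared stop flag is set, counting finished rounds. */
    static final class Worker extends Thread {
        private final Runnable round;      // placeholder for one insert or one block of queries
        private final AtomicBoolean stop;  // stop signal, checked only between rounds
        final AtomicLong completed = new AtomicLong();

        Worker(String name, Runnable round, AtomicBoolean stop) {
            super(name);
            this.round = round;
            this.stop = stop;
        }

        @Override
        public void run() {
            while (!stop.get()) {
                round.run();               // a round is never cut short by the stop signal
                completed.incrementAndGet();
            }
        }
    }

    /** Sleeps without forcing callers to handle the checked exception. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    /** Starts the given number of workers, pausing after each start. */
    static List<Worker> startWorkers(String prefix, int count, long staggerMillis,
                                     Runnable round, AtomicBoolean stop) {
        List<Worker> workers = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Worker w = new Worker(prefix + "-" + i, round, stop);
            workers.add(w);
            w.start();
            pause(staggerMillis);
        }
        return workers;
    }

    /** Waits for every worker to finish and adds up their round counters. */
    static long joinAndCount(List<Worker> workers) {
        long total = 0;
        for (Worker w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
            total += w.completed.get();
        }
        return total;
    }

    /** Runs inserters first, then staggered queriers, for a fixed time. Returns {insert rounds, query rounds}. */
    static long[] runTest(int inserters, int queriers, long staggerMillis, long runMillis,
                          Runnable insertRound, Runnable queryRound) {
        AtomicBoolean stop = new AtomicBoolean(false);
        List<Worker> insertWorkers = startWorkers("insert", inserters, 0, insertRound, stop);
        List<Worker> queryWorkers = startWorkers("query", queriers, staggerMillis, queryRound, stop);
        pause(runMillis);   // the fixed test time
        stop.set(true);     // workers finish their current round and exit
        return new long[] { joinAndCount(insertWorkers), joinAndCount(queryWorkers) };
    }

    public static void main(String[] args) {
        Runnable fakeWork = () -> pause(5); // stands in for the real JPA calls
        long[] counts = runTest(2, 18, 10, 200, fakeWork, fakeWork);
        System.out.println("insert rounds: " + counts[0] + ", query rounds: " + counts[1]);
    }
}
```

Because the flag is checked only between rounds, a thread never abandons an insert or a query halfway, which matches the "stop after the next round" behaviour described in step 4.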
The only change from test to test was the JPA implementation library and the persistence.xml. It is important to note that the persistence.xml was left at its defaults for each of the implementations, omitting on purpose any kind of optimization that the implementation could accept.

Before every test, the inserted records were deleted. In this way, every implementation started with the database exactly in the same situation.

Results
These were the results of the tests per JPA implementation library. Notice that the time was fixed: 30 minutes running.

Implementation | Number of queries+inserts executed | Number of queries executed | Number of inserts executed | Max mem occupied during the test (MB) | Mem occupied after the test (MB)
OpenJPA | 3928 | 3530 | 398 | 96 | 61
Hibernate | 12687 | 3080 | 9607 | 130 | 79
Toplink Essentials | 5720 | 3740 | 1980 | 55 | 25
Eclipselink | 5874 | 3735 | 2139 | 57 | 25

The maximum memory occupied is the maximum amount that the JVM reserved during the test.
The memory occupied after the test is the amount of memory that remained reserved after finishing the test.
I have emphasized the highest and lowest values for each of the columns.
You can see this graphically in the following images showing the data monitored during the different tests.

OpenJPA monitoring data

Hibernate monitoring data

Toplink Essentials monitoring data

Eclipselink monitoring data
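JConsole reads its memory graphs from the JVM's standard management beans, so similar figures can also be sampled from inside a test program. The sketch below is an illustration, not part of the original test code: the class name HeapSampler is hypothetical, and it assumes that the "reserved" memory reported in the results corresponds to the committed heap size.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapSampler {

    /** Returns {used, committed} heap size in megabytes at the moment of the call. */
    static long[] sampleHeapMb() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        return new long[] { heap.getUsed() / mb, heap.getCommitted() / mb };
    }

    public static void main(String[] args) throws InterruptedException {
        long maxCommittedMb = 0;
        // Take a few samples; a real run would sample for the whole test duration.
        for (int i = 0; i < 5; i++) {
            long[] sample = sampleHeapMb();
            maxCommittedMb = Math.max(maxCommittedMb, sample[1]);
            System.out.println("used=" + sample[0] + " MB, committed=" + sample[1] + " MB");
            Thread.sleep(200);
        }
        System.out.println("max committed heap seen: " + maxCommittedMb + " MB");
    }
}
```

Sampling from inside the JVM adds a little load to the machine under test, which is the reason the monitoring for this article was done from a separate PC.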

Conclusions

My intention is that anyone can draw their own conclusions looking at the results or using the code to do a test of their own.
Nevertheless, I consider that there are a number of conclusions that one can draw watching the monitored data:
  1. There is not an implementation that clearly has the best performance. Some had a very good CPU or memory performance and some did it very well when inserting or querying. But none of them was outstanding as a whole.
  2. The number of records inserted by Hibernate was far higher than for any other implementation (more than 4 times as many as Eclipselink and about 24 times as many as OpenJPA). However, Hibernate was also the JPA implementation that executed the lowest number of queries, although the differences in this value (3080 for Hibernate vs 3740 for Toplink Essentials) are not as extreme as for the number of inserts.
  3. Hibernate was also the implementation that consumed the most memory. But taking into account that it inserted many more records than the others, that sounds reasonable.
  4. OpenJPA had the lowest value of inserts+queries.
  5. The number of inserts executed by OpenJPA was extremely low, compared to the others.
  6. The usage of CPU in the case of Toplink Essentials and Eclipselink was extremely low.
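The ratios quoted in the conclusions can be checked with a few lines of arithmetic. The following sketch uses only the insert and query counts from the Results section; the class and method names are hypothetical.

```java
public class ResultRatios {

    /** Ratio a/b rounded to one decimal place. */
    static double ratio(long a, long b) {
        return Math.round(10.0 * a / b) / 10.0;
    }

    public static void main(String[] args) {
        // Counts taken from the Results section (30-minute runs).
        long hibernateInserts = 9607;
        long eclipselinkInserts = 2139;
        long openjpaInserts = 398;
        long hibernateQueries = 3080;
        long toplinkQueries = 3740;

        System.out.println("Hibernate vs Eclipselink inserts: "
                + ratio(hibernateInserts, eclipselinkInserts));
        System.out.println("Hibernate vs OpenJPA inserts: "
                + ratio(hibernateInserts, openjpaInserts));
        System.out.println("Toplink Essentials vs Hibernate queries: "
                + ratio(toplinkQueries, hibernateQueries));
    }
}
```

The gap in inserts is a factor of roughly 4.5 to 24, while the gap in queries between the best and worst implementation is only about 1.2.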
Note for the maintainers/developers of the JPA implementations: I am aware that some optimization can be obtained by changing the persistence.xml and/or changing the code somehow. If you give me some advice on how to improve the performance of any of the implementations, I will be glad to update this post with that information.

References
Ubuntu: http://www.ubuntu.com/
Employees database: http://dev.mysql.com/doc/employee/en/employee.html, https://launchpad.net/test-db/
Openjpa: http://openjpa.apache.org/
Toplink Essentials: http://www.oracle.com/technology/products/ias/toplink/jpa/download.html
Hibernate JPA: http://www.hibernate.org/397.html
Eclipselink: http://www.eclipse.org/eclipselink/
MySQL: http://www.mysql.com/
Eclipse: http://www.eclipse.org/

47 comments:

Anonymous said...

There is a mistake. OpenJPA had the lowest count of queries+inserts, not toplink.

Unknown said...

Uuuups, you are right. I have just corrected it.

Anonymous said...

It would be nice to know some numbers without going into the code:
size of tables, how many joins were necessary for each select, how many entities were retrieved in each query.

Unknown said...

Well, the number of entities retrieved changed from select to select because all of them had an arbitrary parameter.
The largest number of records in a table was for the salaries table, which had almost 3 million records. The employees table had about 300,000.
And about the joins, I think that someone interested in that level of detail really should go into the code.
Anyway, I did not want to make the post any longer. In fact, it is already a bit longer than I had planned at the beginning.

Manuel Dominguez Sarmiento said...

Hibernate's insert performance might have been affected by configuration. Batch updates/inserts make a world of difference.

All these tools are very configurable (especially Hibernate) so these benchmarks don't mean much as there are so many variables involved. Perhaps they are valid "out-of-the-box" comparisons, however, if there is something that we know about enterprise systems is that out-of-the-box rarely means anything significant. Any sizable app will need significant tuning.

So, perhaps performance is not really a concern for choosing JPA vendors. I think it's more important to focus on stability, community, support and developer productivity above all. Hibernate has an edge over all other tools because of its huge community. That's hard to beat.

But that doesn't mean that things can't change. If we look back, Struts was the de facto standard for MVC not too long ago. Who's using Struts for new projects? I would dare to say that very few teams even evaluate Struts as a valid option at this point. The same could happen to Hibernate. But so far it seems unlikely, at least for the near future.

Anonymous said...

Is it the case that hibernate had the lowest number of queries because it had the biggest data to query from?

Unknown said...

Manuel, you are right, the tests were done with the products "out-of-the-box".
The thing is that if you start changing the configuration, anyone could say that "your" configuration is not optimum. And I think that the fairest (the only, actually) way of comparing is without changes in the "by default" configuration.

Anonymous, no. After every test the database was returned to the initial state by deleting the inserted records. So for every test the number of records to query from was exactly the same.

Anonymous said...

Hi! I found this test very interesting. It could be updated or improved by testing a fixed number n of inserts, comparing which implementation consumes more memory/CPU and how long the test took.

Well, this is just an idea.

palheta said...

Can you add jpox/datanucleus to the test? http://www.datanucleus.org/products/accessplatform/

I'd like to see how well does it fare out-of-the-box

Unknown said...

palheta, I'll try to find the time to add jpox/datanucleus to the comparison and I'll update the post with the results.

Anonymous said...

Any chance you'd be willing to redo the tests using DataNucleus? DataNucleus implements JPA in addition to JDO 2. I'd like to see the comparison to see if compile-time bytecode enhancement makes any difference in performance. Thank you!!

Anonymous said...

Very good comparison.

Could you also compare it with iBatis?

Regards

Anonymous said...

Again on the DB: most DBs will vary in performance after a lot of inserts/updates/deletes of tuples.
This is because the state of the internal data structures changes, even if the actual data is reverted to the original by deleting all the new rows.
However, it's hard to tell if and how much this could affect performance.

Unknown said...

I think that you have to take into account the initial number of records for the employees and salaries tables (the ones used for the tests), which had more than 300,000 and almost 3 million respectively. With those numbers, I would say that 2000 inserts and subsequent deletes should not make too much difference to the database performance.

Unknown said...

Please correct me if I am wrong, but I think iBatis is not a JPA implementation, but an independent solution. That means that to do a test I would have to change the code, not only the library. And the results of such a test could not be easily compared to those here.

Anonymous said...

Well,

Another big advantage of using Hibernate in your JPA code is that you don't have to configure a weaver on your JDK. I know that's a one-time effort, but it still comes with hassle...

Greetz
Leo

Unknown said...

How about cayenne. JPA is beta but this framework does rock.

Unknown said...

I am surprised by OpenJPA's performance. In my private test it 'kicked ass' on Hibernate. I would bet you did not use the agent for bytecode weaving or the Ant post-compilation task. That makes OpenJPA slower on inserts.

andy said...

> Another big advantage [...] is that you don't have to configure a weaver on your JDK

Do please tell us where is the "big" advantage in that? There are Ant tasks, Maven plugins, Eclipse plugins, Netbeans build.xml tasks to do that as a post-compile step automatically after compilation, even the lazy developer can cope with that. Bytecode enhancement is an *advantage* over reflection in many areas, particularly speed in detection of changes to fields/properties; this is amply demonstrated by many other benchmarks.

Unknown said...

Jan, as stated in the article, I have not changed the default configuration for any of the implementations. You can check in the code the persistence.xml used for each of them. There's nothing but the JPA provider and the database connection data.
But you have to be aware that this is valid for all of them. Hibernate, for example, is known for being quite conservative in its default configuration.

Anonymous said...

> I have not changed the by default configuration for any of the implementations

So you don't enhance the model classes before running the test with OpenJPA? Consequently the test results are worthless for that implementation then
http://openjpa.apache.org/builds/1.2.0/apache-openjpa-1.2.0/docs/manual/manual.html#ref_guide_pc_enhance

Edwin Biemond said...

Hi,

Maybe you can use the JRockit JVM next time. It is a bit slower, but it is better for comparing the different JPA implementations.

Anonymous said...

Maybe you can repeat this test with enhanced classes? It would be interesting to see. It is not hard, just use -javaagent:jpaimpl.jar VM argument.

Unknown said...

Ok, Jan, I'll try to repeat the test this evening with enhancement enabled.

César said...

Very interesting for Java professionals. You could publish something about Oracle for Oracle professionals, couldn't you?

Anonymous said...

Here are the DataNucleus results in comparison to Hibernate:

DataNucleus:

Insert 7119
Queries 14245

Hibernate:

Insert 7637
Queries 3100


Tests run on derby. DataNucleus configuration is the default. Using runtime enhancement via javaagent


Hibernate config:

property name="hibernate.connection.driver_class" value="org.apache.derby.jdbc.ClientDriver"
property name="hibernate.connection.url" value="jdbc:derby://localhost:1527/TestDBHibernate;create=true;create=true"
property name="hibernate.hbm2ddl.auto" value="create"


DataNucleus config :

property name="javax.jdo.option.ConnectionDriverName" value="org.apache.derby.jdbc.ClientDriver"
property name="javax.jdo.option.ConnectionURL" value="jdbc:derby://localhost:1527/TestDB;create=true;create=true"
property name="datanucleus.autoCreateSchema" value="true"

Andi said...

I wrote about your comparison here http://www.ithighlight.com/2009/01/comparison-of-hibernate-toplink-openjpa-and-eclipselink/

Loïc said...

"I am surprised by OpenJPA performance. In my private test it 'kicked ass' on Hibernate. I would bet you did not use agent for bytecode weaving or Ant postcompilation task. It makes OpenJPA slower on inserts.."

I'm surprised too. Do you plan to test OpenJPA with enhanced code?

Unknown said...

Yes. I will try to test OpenJPA with enhanced runtime as soon as I find the time. Datanucleus is also on the list.

Anonymous said...

Since you didn't include DataNucleus in your comparison, here it is (using out-of-box configuration, derby as database):

DataNucleus run #1:
Inserts: 31600
Queries: 26635

DataNucleus run #2:
Inserts: 49150
Queries: 15705

Hibernate run #1:
Inserts: 15990
Queries: 15875

Hibernate run #2:
Inserts: 16050
Queries: 17320

total number of operations
DataNucleus: 123.090
Hibernate : 65.235

DataNucleus is using javaagent for runtime enhancement, which drastically reduces DataNucleus performance. Still, DataNucleus proves to be twice as fast as Hibernate.

Unknown said...

Anonymous, you do not provide details about the database or whether you used my code.
Because you used Derby, I take it for granted that you did not use the Employees database, which had 1,800,000 records in the salaries table.
What kind of database did you use? Number of records? Code?

John Yeary said...

Thanks for doing this comparison. I was contemplating trying a couple of the JPA implementations to see if there were any performance differences. This has saved me a lot of work.

Unknown said...

Thanks to you, John, for leaving your message. I just checked your blog, which I found rather interesting. I've already added it to my google reader.

James said...

Looking at your code, none of your queries actually return any objects. They are all count queries, returning a single number. That is hardly a typical usage of object-relational mapping, which makes the comparison not very useful.

Anonymous said...

What JDK were you using in these tests?

Unknown said...

1.6. It is written in the description of the test.

Anonymous said...

What vendor? I assume Sun, but that wasn't stated.

Unknown said...

Yes, Sun. Regards.

Anonymous said...

I did tests last year on different databases with a modified version of the PolePosition benchmark.
I compared PostgreSQL 8.3, MySQL 5.1, Derby, H2 and HSQLDB with both pure JDBC and Hibernate 2 (not 3). I tried to configure the settings for "real world" use (i.e. sufficient RAM for all DBMSs, use of logs and no in-memory databases). Derby was by far the slowest database and scaled very badly, which, in my opinion, disqualifies it for benchmarking. The fastest databases were consistently HSQLDB and H2 in embedded mode (only one client). They could outperform the other RDBMSs quite significantly. The performance of Oracle XE, MySQL and PostgreSQL was tied, with an edge for Oracle in updates, an edge for MySQL in queries, and overall good performance from PostgreSQL.
As for Hibernate 2 (not 3), it didn't perform very well when compared to pure JDBC, even with a cache. Even though I haven't tested it yet, I expect iBatis to be much faster, because it is quite close to JDBC. In this regard, iBatis is in my opinion a very interesting alternative to pure JDBC.

Nicolas J.

Anonymous said...

Anyone reading this should read the very interesting comment by John Stecher here before jumping to conclusions:
http://www.theserverside.com/news/thread.tss?thread_id=53142

Unknown said...

Sure. I also recommend reading my answer to John's post.

Henrique Sousa said...

Could it be that Hibernate had a lower amount of queries due to the much higher number of rows inserted (more information to retrieve at querying)?

Unknown said...

Well, that could be a potential cause. But given the number of already existing records, I would not say that that could be the reason.

Vinicius Gatto said...

Nice job!!
BUT... these results relate to a MySQL database.

What about other ones? Maybe some implementation is good with one database and bad with others.

Don't you think?

Regards,
Vinicius Gatto

Unknown said...

Well, it could be. But repeating the tests with different databases would make the study exponentially larger. And I am not really sure that changing the database would make any difference anyway.
Thanks in any case for reading and for your proposal.

Marco said...

Fantastic work!

Can you include DataNucleus in a future test?