Hibernate Search & JPA Dev Blog

Wednesday, July 1, 2015

Introduction to Hibernate Search with any JPA implementor (in this example: EclipseLink)

Hibernate Search is currently only associated with Hibernate ORM/OGM. For many people this comes as no surprise, as it already has “Hibernate” in its name. However, during this year’s Google Summer of Code I am working on an integration of Hibernate Search that aims to be usable with any JPA implementor (so far Hibernate, EclipseLink and OpenJPA are supported).

In the following article I will show you how you can leverage the power of Hibernate Search, even if you chose not to use a Hibernate-based persistence provider in your application, by using the new Hibernate Search GenericJPA project.

The example project

Imagine a book store (OK, a really simple one). Information about its books is stored in a MySQL database and accessed via EclipseLink. A sample persistence.xml and Maven pom.xml for that would look like this:

With the Book entity looking like this:

Now, the manager of the store wants to be able to search for a specific book in the database to help the customers find what they want. This is where a normal database like MySQL gets into trouble. Yes, you can do queries like:

SELECT b FROM BOOK b WHERE b.name like '%Hobbit%';

But these kinds of queries don’t have all the power a fully fledged search engine has. In such an engine you decide how the name is indexed. For example, if you want fuzzy queries that still return the right book even if you entered “Wobbit”, a normal RDBMS is not sufficient. This is where the power of Hibernate Search comes in. Under the hood it uses Lucene, a powerful search engine, and improves on it by adding features like clustering and support for mapping Java Beans.
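To make the “Wobbit” example a bit more concrete: fuzzy matching of this kind is usually based on an edit distance between the search term and the indexed term. The following is a minimal plain-Java sketch of the classic Levenshtein distance. It is only an illustration of the idea and not the code Lucene or Hibernate Search actually runs; the class and method names are made up for this post.

```java
// Illustrative sketch only: classic Levenshtein edit distance.
// Class and method names are hypothetical, not part of any library.
class EditDistance {

    // Returns the minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // turning the empty prefix of a into b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // turning a[0..i) into the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int substitution = previous[j - 1] + cost;
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    public static void main(String[] args) {
        // "Wobbit" and "Hobbit" differ in exactly one character.
        System.out.println(levenshtein("Wobbit", "Hobbit")); // prints 1
    }
}
```

A search engine that accepts terms within a small edit distance can therefore still find “Hobbit” for the query “Wobbit”, which a plain LIKE pattern cannot do.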

Adding Search to our Book store

Now, let’s take a look at how this is done for our Book store. First, we need to annotate our JPA entity with some extra information.

What did we do?

Class-Level:

added @Indexed annotation

This is needed for Hibernate Search to recognize this entity-class as a Top-Level index class.

added @InIndex annotation

This is a special annotation needed by Hibernate Search GenericJPA on every entity that somehow is found in any index. Just put it there and you’ll be fine.

Field-Level:

added @DocumentId/@Field on name

Hibernate Search needs to know which field is used to identify this entity in the index. This produces a field with the name “id” in the index. We also want to store the name in a field called “name”.

added @Field on author

Apart from searching for the name, we also want to be able to search for the author of the book. This is stored in the field called “author” in the index.

Starting up the engine

As Hibernate Search GenericJPA is not integrated into Hibernate ORM, we have to start everything up manually. But this is not hard at all:

You may have noticed loading a properties file in this snippet. This is the configuration needed for Hibernate Search. Let’s take a look at it next.

Now we could start using Hibernate Search for queries.

But wait, why didn’t we receive any Books even though our query was right? Well, that’s because we forgot to add some additional classes to our persistence.xml. In order to keep the index up-to-date we have to specify special entities that can hold the information about changes in the database. These will be queried by our Hibernate Search GenericJPA engine and then the index is updated accordingly. For our example, we need this special entity:

It just has to be in our persistence.xml and the engine will automatically recognize it:

(Note that every table that is related to any entity in the indexed graph has to be mapped like this. But for the sake of keeping it simple, we didn’t include any mapping tables in this example.)

That’s it. Now we should be able to query our index properly and leverage Hibernate Search’s capabilities.

What’s next?

This example is quite simple as it doesn’t make use of Hibernate Search’s possibilities to index a complete hierarchy with many different entities in the graph. Examples on how to do that can be found in the Hibernate Search documentation (http://docs.jboss.org/hibernate/search/5.3/reference/en-US/html_single/). The only thing to keep in mind is that the *Updates entities will have to be created even for the mapping tables.

Friday, June 26, 2015

Easy startup of Hibernate Search GenericJPA

I've been talking about it in the last posts and it is finally done. I streamlined the whole startup process so that it no longer requires extending a specific class. Before I go into detail on how this is done, let's take a look at which dependencies are needed to use Hibernate Search GenericJPA:

The "hibernate-search-jpa" module contains all the code needed to enable Hibernate Search in your JPA based application. The "hibernate-search-ejb" module contains the additional startup logic for Java EE containers (currently this module only supports the default named EMF and properties located at /META-INF/hsearch.properties. If you need different behaviour, you can just start up your Search module like in the Java SE case).

Starting up the Hibernate Search GenericJPA

Now, let's take a look into how we start this whole thing up in a Java SE environment:

If you are using the EJB module in a Java EE context, the startup is already done for you and you can just inject it like this:

Configuration

No matter where your properties come from (manually specified or loaded from the EJB module), you will have to define some settings:

For *.useJTATransactions=true the JTA transaction manager (by default) is looked up via JNDI and you have to specify some extra parameters:

This list might change in the future and I might have forgotten some properties in this posting. For a complete overview, you can always take a look at the Constants class.

Thursday, June 4, 2015

Integration Tests with Arquillian & Case Sensitive MySQL

Google Summer of Code has officially started and I've been pretty busy working on some important things.

Integration Tests

The goal of my project is to support as many JPA providers as possible and that can only be done by constantly testing the current version against every JPA provider that we want to support. That's the reason why I spent the last few weeks implementing integration tests for Hibernate, EclipseLink and OpenJPA using Arquillian.

By doing so, I found several small bugs in the code that I would have missed otherwise, as I had only been testing against EclipseLink in the older unit tests. Since unwrapping a java.sql.Connection works differently for each JPA provider, there is a new method in the SQLJPASearchFactory (JPASearchFactory was split into two interfaces, one that requires a SQL database and one that doesn't) that you have to override for now:

This connection is used to set up the triggers for update processing. I updated the gists from earlier in this blog to reflect these changes.

Case Sensitive MySQL

I must confess, I have only tested the API on two similar setups until early this week. Both are Windows machines; one with MariaDB, one with MySQL Community Edition.

Then a build on a Linux box was tested and it couldn't find the tables needed in the unit tests. Why? Because I hadn't added @Table to every entity and the @Updates annotation requires you to write the table name. So the JPA providers ended up generating a different table name (concerning upper and lower case) than the one in the @Updates annotation.

I hadn't noticed that on my Windows machines, since by default MySQL on Windows is case insensitive.

These kinds of problems in the build can't happen anymore from now on, since the project is now being periodically built by CI @ ci.hibernate.org.
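The effect can be reproduced without a database. The following plain-Java sketch compares a table name as it might appear in the annotation with the name a JPA provider might have generated, once case-insensitively (like the Windows default) and once case-sensitively (like the Linux box). The table names and the helper are hypothetical and only serve as an illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: why a table lookup that works with
// case-insensitive names can fail once names are case-sensitive.
class TableNameLookup {

    // Returns true if 'wanted' names one of the existing tables
    // under the given comparison rule.
    static boolean tableExists(List<String> existingTables, String wanted, boolean caseSensitive) {
        for (String table : existingTables) {
            boolean match = caseSensitive ? table.equals(wanted) : table.equalsIgnoreCase(wanted);
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical: the JPA provider generated an upper-case table name ...
        List<String> generatedTables = Arrays.asList("GAME");
        // ... while the annotation was written with the entity's spelling.
        String nameInAnnotation = "Game";

        System.out.println(tableExists(generatedTables, nameInAnnotation, false)); // prints true
        System.out.println(tableExists(generatedTables, nameInAnnotation, true));  // prints false
    }
}
```

Stating the table name explicitly via @Table on every entity removes the guesswork, because the name in the mapping and the name in the annotation are then written by the same person.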

Wednesday, May 13, 2015

Basic Architecture & Goals

In this post I want to explain the basic structure of the project and what my plans are in the near future:

Important (non-self-explanatory parts) from left to right:

Hibernate Search Engine: The base of any Hibernate Search implementation. This contains all the classes necessary to control a Lucene Index the same way as Hibernate Search ORM does.

Hibernate Search Standalone: A standalone implementation of a SearchIntegrator that makes Hibernate Search's engine usable without any restrictions on where entities come from.

Hibernate Search Database Utilities: These utilities are used to create the necessary triggers on the database to keep track of Entity changes.

The "main" module - Hibernate Search GenericJPA: This module combines the Database utilities and the standalone Hibernate Search implementation to work together with a JPA provider. It accumulates all the information needed to create the event triggers in the database and sets them up. It then starts a polling service that reads the information about changes in the database and instructs the underlying standalone implementation to update the indexes accordingly.
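The trigger-plus-polling idea of the main module can be sketched in a few lines of plain Java. Everything here is a stand-in: a Map plays the entity table, a Queue plays the updates table that the triggers would fill, and another Map plays the Lucene index. All names are hypothetical and the real module is of course more involved.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch only: one polling pass over recorded changes.
class PollingIndexUpdater {

    enum EventType { INSERT, UPDATE, DELETE }

    // Stand-in for one row of an updates table.
    static final class UpdateEvent {
        final int entityId;
        final EventType type;

        UpdateEvent(int entityId, EventType type) {
            this.entityId = entityId;
            this.type = type;
        }
    }

    private final Map<Integer, String> database;                        // stand-in for the entity table
    private final Queue<UpdateEvent> updatesTable = new ArrayDeque<>(); // stand-in for rows written by triggers
    private final Map<Integer, String> index = new HashMap<>();         // stand-in for the search index

    PollingIndexUpdater(Map<Integer, String> database) {
        this.database = database;
    }

    // What a database trigger would do: remember that a row changed.
    void recordChange(int entityId, EventType type) {
        updatesTable.add(new UpdateEvent(entityId, type));
    }

    // One polling pass: drain all pending events and apply them to the index.
    // Returns the number of events processed.
    int poll() {
        int processed = 0;
        UpdateEvent event;
        while ((event = updatesTable.poll()) != null) {
            if (event.type == EventType.DELETE) {
                index.remove(event.entityId);
            } else {
                // INSERT and UPDATE both re-read the current state of the entity.
                index.put(event.entityId, database.get(event.entityId));
            }
            processed++;
        }
        return processed;
    }

    String indexed(int entityId) {
        return index.get(entityId);
    }

    public static void main(String[] args) {
        Map<Integer, String> db = new HashMap<>();
        PollingIndexUpdater updater = new PollingIndexUpdater(db);

        db.put(1, "The Hobbit");
        updater.recordChange(1, EventType.INSERT);
        updater.poll();
        System.out.println(updater.indexed(1)); // prints The Hobbit
    }
}
```

In a real setup the polling pass would run periodically in the background instead of being called by hand.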

In order to make the usage of this module as easy as possible, the Hibernate Search GenericJPA module has implementations for nearly all the interfaces in org.hibernate.search.jpa. Users shouldn't have to worry about the implementation details or what version of Hibernate Search they are using. Switching Hibernate Search ORM for its generic counterpart (my project) should be as easy as switching your JPA provider.

Plans for the project in the near future

1. improve the Update system

As described in my earlier post, currently the user has to create specific Entities that map to the tables that store the update information.

These serve two purposes:

To create the actual table and
give the Index-Updating service access to the database via JPA.

This is unnecessary boilerplate code that the user has to provide in order to get the whole system to work properly. A new system that serves the same purposes has to be designed without forcing the user to provide boilerplate code.

2. implement more TriggerSources

Since triggers are not part of the SQL specification, every RDBMS handles them a bit differently. That's the reason the Database utilities contain an interface called TriggerSQLStringSource that hosts several methods that generate the SQL code needed to create the triggers and set up the database for the Update system. As of now, only a MySQLTriggerSQLStringSource exists. This has to change.

3. easier startup

As of now the user has to start the SearchFactory on his own after he has provided it with the configuration data. While this might be practical for small projects, it could be a problem for bigger systems in an enterprise environment. That's why we have to think of a way to hook into the JPA startup process.
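To illustrate point 2 above, here is a rough sketch of what hiding the RDBMS-specific trigger SQL behind an interface can look like. This is not the project's TriggerSQLStringSource: the interface, its single method, and the columns entity_id and event_type of the updates table are all hypothetical and chosen only for this example. The generated statement uses MySQL's CREATE TRIGGER syntax.

```java
// Illustrative sketch only: a hypothetical, much smaller cousin of a
// trigger SQL source. Each RDBMS would get its own implementation.
interface SketchTriggerSource {

    // Returns the SQL that creates a trigger which records every insert
    // into 'table' as a row in 'updatesTable'.
    String createInsertTrigger(String table, String updatesTable, String idColumn);
}

// MySQL flavour; the updates table columns are assumptions of this sketch.
class SketchMySqlTriggerSource implements SketchTriggerSource {

    @Override
    public String createInsertTrigger(String table, String updatesTable, String idColumn) {
        return "CREATE TRIGGER " + table + "_after_insert"
                + " AFTER INSERT ON " + table
                + " FOR EACH ROW"
                + " INSERT INTO " + updatesTable + " (entity_id, event_type)"
                + " VALUES (NEW." + idColumn + ", 'INSERT')";
    }
}

class TriggerSqlDemo {

    public static void main(String[] args) {
        SketchTriggerSource source = new SketchMySqlTriggerSource();
        // Prints the CREATE TRIGGER statement for a hypothetical GAME table.
        System.out.println(source.createInsertTrigger("GAME", "GAME_UPDATES", "ID"));
    }
}
```

Supporting another database then means adding another implementation of the interface, while the code that sets up the triggers stays the same.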

[Current state] How to use the current Version of Hibernate-Search Generic JPA

As promised I am following up with the current state of development in this blog entry.

Firstly, let's take a look at what standard Hibernate Search annotated entities would look like:
First our Game Entity:

And the Vendor Entity:

To get this to work properly in the current state of my Generic JPA version an additional @InIndex is needed at the class-level:

This is a requirement when working with the general JPA specification as some implementors create subclasses for each Entity and Hibernate-Search needs to know which class in the hierarchy was the original one that was supposed to be in the Index.
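The lookup that such a marker makes possible can be sketched as follows: starting from whatever (possibly generated) class the JPA provider hands out, walk up the class hierarchy until a class that declares the marker itself is found. The annotation below is a stand-in defined only for this sketch, not the project's @InIndex, and the "generated" subclass is simulated by hand.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Illustrative sketch only: find the originally mapped class for a
// provider-generated subclass. All names are hypothetical.
class IndexedTypeResolver {

    // Stand-in marker for this sketch; NOT the project's @InIndex.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface InIndexMarker {
    }

    @InIndexMarker
    static class Game {
    }

    // Simulates a subclass that a JPA provider might generate at runtime.
    static class GameProxy extends Game {
    }

    // Walks up the hierarchy and returns the first class that declares the
    // marker directly, or null if no class in the hierarchy does.
    static Class<?> resolve(Class<?> type) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            if (c.getDeclaredAnnotation(InIndexMarker.class) != null) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve(GameProxy.class) == Game.class); // prints true
    }
}
```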

At the moment we configure our Hibernate Search instance by subclassing JPASearchFactory. While this might be a bit inconvenient for production use, it gets the job done easily during development. Later on it will be possible to configure this via a properties/xml file.

Note that this is an example for a SearchFactory in a Java EE application:

The only things still needed to get this whole thing to work are the Updates classes. As of now the user has to create these on his own. This will hopefully no longer be a requirement in the future.

Note that the user not only has to create a table for each Entity, but also one for each mapping table. For an existing database the user will have to index it manually for now (however, all data that is persisted AFTER the setup was done will be updated automatically):

Now that we have set all this up correctly, we can search just like with normal Hibernate Search ORM:

This is it for now. Stay tuned for further updates!

Happy Coding,
Martin

Wednesday, April 29, 2015

This is me / Project introduction

Hi,

my name is Martin Braun. I am currently in the last semester of my Bachelor's degree in Computer Science at the University of Bayreuth.

I've been around the Hibernate Search project for a while now. Mainly asking questions on how to solve my problems (I had several :D) and now I want to give something back during this year's Google Summer of Code.

The Project

Hibernate Search is an awesome library when you have a JPA based application and you want to add fully fledged fulltext search capabilities to your domain model. You simply add annotations to the fields you want to index and then you can generate a working Index from the JPA objects. When the database changes, the index is updated accordingly. This works just fine (TM).

Here is an example from the Hibernate Search getting started page:

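import java.util.Date;
import java.util.HashSet;
import java.util.Set;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

import org.hibernate.search.annotations.Analyze;
import org.hibernate.search.annotations.DateBridge;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;
import org.hibernate.search.annotations.Resolution;
import org.hibernate.search.annotations.Store;

@Entity
@Indexed
public class Book {

  @Id
  @GeneratedValue
  private Integer id;

  @Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
  private String title;

  @Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
  private String subtitle;

  @Field(index=Index.YES, analyze=Analyze.NO, store=Store.YES)
  @DateBridge(resolution=Resolution.DAY)
  private Date publicationDate;

  @IndexedEmbedded
  @ManyToMany
  private Set<Author> authors = new HashSet<Author>();

  public Book() {
  }

  // standard getters/setters follow here
  // ...
}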

One of the few problems it has is that once you decide to use Hibernate Search you have to stick with Hibernate ORM and lose the possibility to swap the JPA provider for something along the lines of EclipseLink (switching the JPA provider - in my eyes - is one of the big benefits of using JPA), e.g. because your (new) Java EE container ships with it and you don't want to change it. This is due to Hibernate Search relying on Hibernate ORM specific events to update its index. These are far more sophisticated than the ones plain JPA provides, and while other JPA providers might have similar features, there is no clear specification for these.

The current problem

The goal of my Google Summer of Code project is to fix this and provide an integration of Hibernate Search's Engine that works with (most) JPA providers (and for now only SQL databases). It can definitely be done (I have already proven that in some unit tests in my project repository) and I hope that after this year's Google Summer of Code is finished my integration is mostly complete for people to use.

What I aim for

This is it for now. I will be following up this blog post with more details on how this integration is going to work in the near future.

Happy Coding!

Martin

Tuesday, April 28, 2015

Welcome to my Development Blog...

...about making Hibernate Search work with any JPA implementor. This will be the space where I write about new developments and my participation in the Google Summer of Code program.