openMDM® 5 Full-Text Search – Technology Proposal

An important function of openMDM® 5 is the full-text search. It allows test engineers to find measurements using arbitrary search terms, without knowing their exact location.

When choosing a suitable platform, Peak Solution examined two search servers, SolR and ElasticSearch, in detail. This article gives a brief overview of how their features match the requirements of the openMDM® 5 Eclipse Working Group and which platform was recommended.

The requirements of the openMDM® 5 Eclipse Working Group:

  • Full-Text Search: It must be possible to send a single search string to the platform and receive a list of results. This includes that the platform implements relevance scoring and supports paging.
  • Query Suggestions (“Did you mean…”) and spell checking: The platform must support query suggestions based on the search term and it must correct spelling errors.
  • (Near) Real-Time Indexing: Newly indexed items must be available in the search almost immediately (in under 10 seconds).
  • RESTful API: The platform must provide a RESTful API to start the full-text search.
  • Plugins for indexing, notification and security: It must be possible to add plugins for indexing and notification to adapt the search to openMDM®. In addition, it should be possible to take user rights into account. Both can be achieved either through a REST API or through a plugin mechanism with documented interfaces.
  • Distribution of search servers: It must be possible to run multiple indexer and search servers in parallel.
  • Monitoring: It must be possible to monitor the health of the search server.
  • Allowed Eclipse license: The platform must have a license that Eclipse allows for external libraries.
  • Platform available at the Eclipse Orbit: The platform should be available in the Eclipse Orbit for easier integration.
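To make the first requirements more concrete: with ElasticSearch, a single search string, relevance-scored results, paging and "Did you mean…" suggestions can all be requested in one call to its REST `_search` endpoint. The Python function below is a minimal sketch that only builds the JSON request body. The index name `measurements`, the field name `name` and the 1-based page numbering are assumptions for illustration, not part of openMDM®.

```python
import json


def build_es_search_body(search_text, page, page_size=10, suggest_field="name"):
    """Build the JSON body for an ElasticSearch _search request (sketch).

    page is 1-based (an assumption of this sketch); suggest_field "name"
    is a hypothetical field of the indexed openMDM documents.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        # Free-text search string; hits are ranked by Lucene relevance score.
        "query": {"query_string": {"query": search_text}},
        # Paging: skip the hits of the previous pages, return one page.
        "from": (page - 1) * page_size,
        "size": page_size,
        # "Did you mean..." suggestions for each term of the search string.
        "suggest": {
            "spelling": {
                "text": search_text,
                "term": {"field": suggest_field},
            }
        },
    }


if __name__ == "__main__":
    body = build_es_search_body("engine temperatur", page=3, page_size=20)
    # Would be sent as JSON, e.g. POST http://localhost:9200/measurements/_search
    print(json.dumps(body, indent=2))
```

Note that `query_string` interprets Lucene query syntax, so raw user input containing special characters may be rejected; `simple_query_string` is the more forgiving alternative. ElasticSearch also limits `from + size` to 10,000 by default (`index.max_result_window`), so very deep paging needs a different mechanism.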

Comparison:

In order to compare SolR and ElasticSearch, all the requirements above as well as the following two points were taken into account:

  • Intended use case: This describes where the project comes from and what it is intended for.
  • Documentation: This describes whether the API is well documented.

Performance was not compared, because all benchmarks show similar results and the search server will not be the bottleneck. The bottleneck will rather be the implementation of indexing and querying the data elements through the openMDM® API, and therefore the ODS Server. For a performance benchmark, see Tom Mortimer, "Elasticsearch and SolrCloud – a performance comparison" [http://de.slideshare.net/charliejuggler/lucene-solrlondonug-meetup28nov2014-solr-es-performance].

For easier understanding, a “Lucene” column has been added to the rating table; it shows what is not possible with plain Lucene. The official documentation of ElasticSearch, SolR and Lucene was used as the source of information. Where it did not cover a point, http://solr-vs-elasticsearch.com/ was used.

| Criterion | ElasticSearch | SolR | Lucene |
|---|---|---|---|
| Full-Text Search | Yes | Yes | Yes |
| Suggestions and spell checking | Yes | Yes | Yes |
| (Near) Real-Time Indexing | Yes | Yes | Yes |
| RESTful API | Yes | Yes | No |
| Plugin-able | Yes (jars and REST) | Yes (jars in lib folder) | Yes (Lucene is a library) |
| Distributable | Yes | Yes (with SolrCloud) | No |
| Monitoring | Yes (JMX and Stats API) | Yes (JMX) | No |
| Allowed license | Yes (Apache) | Yes (Apache) | Yes (Apache) |
| Available in Orbit | No | Yes (3.5 to latest 6.0) | Yes (3.5 to latest 6.0) |
| Intended use case | Distributed search platform with high availability and an easy RESTful API; the focus is on analytics and search | Fast, open-source enterprise search platform; a predefined platform built on Lucene | Library for developing custom searches |
| Documentation | Neat architecture and easy to start with, but not as well documented | Consistent and well documented | Consistent and well documented |
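SolR exposes the same features through URL parameters of its `/select` request handler; paging uses `start` and `rows` where ElasticSearch uses `from` and `size`. The sketch below builds the equivalent request URL. The core name `measurements` is hypothetical, and the sketch assumes the spell check component has been configured for the `/select` handler, which is not the case in every SolR configuration.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, search_text, page, page_size=10):
    """Build a SolR /select URL for a paged search with spell checking (sketch).

    page is 1-based (an assumption of this sketch). spellcheck=true only has
    an effect if the spell check component is configured for /select.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    params = {
        "q": search_text,                 # free-text search string
        "start": (page - 1) * page_size,  # offset, like "from" in ElasticSearch
        "rows": page_size,                # page size, like "size" in ElasticSearch
        "spellcheck": "true",             # request "Did you mean..." suggestions
        "wt": "json",                     # response format
    }
    # urlencode keeps the dict's insertion order and encodes spaces as "+".
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_solr_select_url("http://localhost:8983/solr", "measurements",
                                "engine temperatur", page=3, page_size=20))
```

The paging arithmetic is identical on both platforms; only the parameter names and the transport (URL parameters versus a JSON body) differ.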

Recommendation:

First, all the base features are well covered by both ElasticSearch and SolR, and the use cases handled today can be implemented with either platform. This makes the decision less critical. Furthermore, both platforms are based on Apache Lucene (lucene.apache.org), which makes a migration easy. The decision is therefore not final and can be revised if it turns out to be wrong. That said, ElasticSearch currently seems to be the better choice for the future: it was developed with scalability and high availability in mind. This is important, as Big Data (in terms of searching within measurement data) will soon become a use case.
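Whether a distributed ElasticSearch deployment is actually highly available can be checked through its cluster health API (`GET /_cluster/health`), which also serves the monitoring requirement. Its `status` field is `green` (all shards allocated), `yellow` (all primary shards allocated, at least one replica not) or `red` (at least one primary shard unallocated). The sketch below only interprets such a response; fetching it over HTTP is left out, and the minimum node count of 2 is a hypothetical threshold, not an openMDM® requirement.

```python
import json


def assess_cluster_health(response_text, min_nodes=2):
    """Interpret an ElasticSearch GET /_cluster/health response (sketch).

    Returns (healthy, problems). min_nodes is a hypothetical threshold
    for a high-availability setup.
    """
    health = json.loads(response_text)
    status = health["status"]
    nodes = health.get("number_of_nodes", 0)
    problems = []
    if status == "red":
        # At least one primary shard is unallocated.
        problems.append("primary shard(s) unassigned: some data is not searchable")
    elif status == "yellow":
        # All primaries are allocated, but at least one replica is not.
        problems.append("replica shard(s) unassigned: reduced redundancy")
    if nodes < min_nodes:
        problems.append(f"only {nodes} node(s) in cluster")
    return (not problems), problems


if __name__ == "__main__":
    # Abbreviated, made-up response for illustration only.
    sample = '{"cluster_name": "openmdm-search", "status": "yellow", "number_of_nodes": 1}'
    print(assess_cluster_health(sample))
    # prints (False, ['replica shard(s) unassigned: reduced redundancy', 'only 1 node(s) in cluster'])
```

A single-node cluster with replicas configured is always `yellow`, because a replica is never allocated on the same node as its primary; this is one reason why high availability requires several search servers.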

Author: Christian Weyermann, Peak Solution GmbH
