Monthly Archives: April 2014

Elastic kart results

From OLAP to elastic

I found elasticsearch a couple of months ago and decided to give it a try. I plan to use it as a schemaless NoSQL document database with full-text search for my old kart racing result data. Elasticsearch uses a REST API to index and store JSON documents and will run as a service on my Linux box.
The racing result data comes from my old project for kart race administration and result presentation, which I have run for about 10 years. The data has been presented on the web for the last five years and contains about 60,000 result records. It is stored in a star-shaped OLAP database because I wanted to do some statistics on it. Well, the statistics were simple, but searching was a nightmare. I also tried to use Lucene for free-text searching, but at that time I didn’t manage to get satisfying results, so now I will try to do it with elasticsearch.

The star-shaped OLAP database contains a central fact table with the actual results and several dimension tables with data like driver name, event name, result date etc.

OlapStar

Elasticsearch Installation

Linux packages and server setup are still a bit of a mystery to me, so this is for those of us who grew up in Windows land.
This didn’t work for me:

wget -qO - http://deb.opera.com/archive.key | sudo apt-key add -

So I have to download the key file separately and install elasticsearch:

 sudo wget -O es.key http://packages.elasticsearch.org/GPG-KEY-elasticsearch
 sudo apt-key add es.key
 # Download ES from https://gist.github.com/wingdspur/2026107
 wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.0.1.deb
 sudo dpkg -i elasticsearch-1.0.1.deb
 sudo update-rc.d elasticsearch defaults 95 10
 # Start the Elasticsearch server
 sudo /etc/init.d/elasticsearch start

You should also download and install a tool like elasticsearch-HQ or Marvel (installed on the server), because you need something to test indexing and searching with. I use the Sense plugin for Chrome, but it’s now part of Marvel.

Now it was time for take II

After setting up the elasticsearch server on my Linux box I started to experiment with indexing. There are several pages about setting up and trying elasticsearch, like Joel Abrahamsson’s blog post Elasticsearch 101. After trying the ‘hello world’ examples I needed to get some real data into the index. I started with a nice SQL join where the star-shaped data was converted to ‘flat’ data without relations. The generated data was inserted into the index by a simple Perl script (well, simple is a relative concept when it comes to Perl) that read from an exported CSV file, converted each record to JSON, and sent it to the elasticsearch service via the REST API. I had some problems with the Perl JSON converter because it insisted on adding quotes around numbers, so elasticsearch interpreted them as strings and term filters didn’t work. After that I decided to generate the JSON by hand. So far so good, but the problem was that I had no idea what a document is!
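To illustrate the quoting problem, here is a minimal sketch in Java (not the Perl script I actually used) of generating JSON by hand from one semicolon-separated line. Values that look like numbers are written bare and everything else becomes a JSON string. The class name, the method names and the column names are made up for the example, and it assumes that no text column contains a purely numeric value that should stay a string.

```java
import java.util.ArrayList;
import java.util.List;

class CsvToJson {

    // Split one semicolon-separated line; semicolons inside double quotes are kept.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;            // quotes delimit a field, they are not data
            } else if (c == ';' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Numbers are emitted without quotes, everything else as an escaped JSON string.
    static String jsonValue(String raw) {
        if (raw.matches("-?(0|[1-9]\\d*)(\\.\\d+)?")) {
            return raw;
        }
        return "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Pair the (hypothetical) column names with the values of one line.
    static String toJson(String[] names, String line) {
        List<String> values = splitLine(line);
        StringBuilder json = new StringBuilder("{");
        for (int i = 0; i < names.length && i < values.size(); i++) {
            if (i > 0) {
                json.append(",");
            }
            json.append("\"").append(names[i]).append("\":").append(jsonValue(values.get(i)));
        }
        return json.append("}").toString();
    }

    public static void main(String[] args) {
        String[] names = {"DriverName", "StartNumber", "Position", "BestLapTime"};
        // prints {"DriverName":"Will Smith","StartNumber":2,"Position":1,"BestLapTime":66.216}
        System.out.println(toJson(names, "\"Will Smith\";2;1;66.216"));
    }
}
```

With the numbers unquoted, elasticsearch can map those fields as numeric types instead of strings.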

What is a document anyway?

My first assumption, after 20 years in the SQL swamp, was that a record is a document:

 "SKCC 2";"Malmö AK";"2010-05-08 00:00:00";"Träning 1 JUNIOR 60";Training;10;1;"JUNIOR 60";"Mark Hansson";"Jönköping KC";22;998

So after inserting some 2000 records into the index I started to think about how to show the search results. After a while I realized that a record was not a document after all! Trying to relate the documents to each other in a NoSQL database was wrong! Using sub-documents also seemed unnatural, so a document must be something else. After a long midday walk a new structure came up. By thinking in bigger aggregates, a document could be all the results from a race event, like:

{
  "EventName":"SKCC 2",
  "EventClub":"Malmö AK",
  "EventDate":"2010-05-08 00:00:00",
  "RaceName":"Slutresultat final",
  "ResultType":"Final sum",
  "ClassName":"JUNIOR 60",
  "sortorder":51,
  "result":[
    {"StartNumber":2,
     "DriverName":"Will Smith",
     "ClubName":"Göteborgs KRC",
     "Position":1,
     "BestLapTime":66.216},
    {"StartNumber":3,
     "DriverName":"Mad Max",
     "ClubName":"Göteborgs KRC",
     "Position":2,
     "BestLapTime":66.431}
  ]
  ….
}

Now the problem was that all the flat data had to be merged into documents, in Perl, with arrays of hashes of arrays of hashes… Well, you get the picture. After some late-night coding in Perl all documents were ready to be inserted into the index server and it was possible to search the documents again.
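The merge step itself is just a group-by. Here is a minimal sketch of it in Java rather than Perl, so it is not the script I actually ran. The `FlatRow` and `RaceDocument` classes model only a few of the columns, and grouping on event name, race name and class name is my assumption about what identifies one document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EventGrouper {

    // One flat row from the SQL join; only a few of the columns are modelled.
    static class FlatRow {
        final String eventName;
        final String raceName;
        final String className;
        final String driverName;
        final int position;

        FlatRow(String eventName, String raceName, String className, String driverName, int position) {
            this.eventName = eventName;
            this.raceName = raceName;
            this.className = className;
            this.driverName = driverName;
            this.position = position;
        }
    }

    // One document: the shared header fields plus every result row that belongs to them.
    static class RaceDocument {
        final String eventName;
        final String raceName;
        final String className;
        final List<FlatRow> result = new ArrayList<>();

        RaceDocument(FlatRow first) {
            this.eventName = first.eventName;
            this.raceName = first.raceName;
            this.className = first.className;
        }
    }

    static List<RaceDocument> group(List<FlatRow> rows) {
        // LinkedHashMap keeps the documents in the order their first row appeared.
        Map<String, RaceDocument> byKey = new LinkedHashMap<>();
        for (FlatRow row : rows) {
            String key = row.eventName + "|" + row.raceName + "|" + row.className;
            RaceDocument doc = byKey.get(key);
            if (doc == null) {
                doc = new RaceDocument(row);
                byKey.put(key, doc);
            }
            doc.result.add(row);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<FlatRow> rows = new ArrayList<>();
        rows.add(new FlatRow("SKCC 2", "Slutresultat final", "JUNIOR 60", "Will Smith", 1));
        rows.add(new FlatRow("SKCC 2", "Slutresultat final", "JUNIOR 60", "Mad Max", 2));
        rows.add(new FlatRow("SKCC 2", "Training 1", "JUNIOR 60", "Mark Hansson", 1));
        for (RaceDocument doc : group(rows)) {
            System.out.println(doc.raceName + ": " + doc.result.size() + " results");
        }
    }
}
```

Each `RaceDocument` then corresponds to one JSON document with its `result` array, instead of one document per result row.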

Using the Sense plugin for testing, a query could look like this:

POST /kartresultat/raceresult/_search
{
    "query": {
        "query_string": {
            "query": "andersson"
        }
    },
    "highlight": {
        "fields": {
            "driverName": {},
            "className": {}
        }
    }
}

Resulting in:

"hits": {
  "total": 202,
  "max_score": 0.5282536,
  "hits": [
    {
      "_index": "kartresultat",
      "_type": "raceresult",
      "_id": "lGc3E2H-R0ayK6_VMtwWoA",
      "_score": 0.5282536,
      "_source": {
        "eventName": "SKCC Deltävling 4",
        "eventClub": "Uddevalla KK",
        "eventDate": "2009-06-07",
        "className": "KZ2",
        "races": [
          {
            "raceName": "Slutresultat final",
            "resultType": "Final sum",
            "sortOrder": 51,
            "className": "KZ2",
            "results": [
              {
                "startNumber": 196,
                "driverName": "Viktor Öberg",
                "driverId": "SWE_MTk5MzAyMTUwNTc5",
                "clubName": "Borås MK",
                "position": 1,
                "bestLapTime": 47.952
              },
.....

The web client

The first attempt applies a driver perspective, where you search on a driver name plus class, club, event etc. and get a hit list. With the help of the highlight function in the elasticsearch API, the full driver name is added to the result data and can be displayed together with the search result.

I will use a Java web application running on JBoss AS with JSF and RichFaces to build the web client. After trying to do the mapping from the JSON search result into a POJO by hand (why do I always start down that road? It’s 2014 now) I found Google Gson and it works like a charm. Just add the dependency to the pom file:

<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>1.7.1</version>
</dependency>

And in code:

Gson gson = new Gson();
EventResult raceResult = gson.fromJson(hit.getSourceAsString(), EventResult.class);
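For Gson to fill the object, the POJO has to mirror the JSON. The following is only a guess at what such an `EventResult` class could look like, with field names taken from the `_source` part of the search result shown earlier. By default Gson matches JSON keys to Java field names and skips `transient` fields, so the id and the highlight hits (which are not in `_source`) are marked transient. The nested class names `Race` and `Result` are made up for the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EventResult {

    // Filled from _source; the field names must match the JSON keys.
    private String eventName;
    private String eventClub;
    private String eventDate;
    private String className;
    private List<Race> races = new ArrayList<>();

    // Not part of _source: set by the search code after deserialization.
    private transient String id;
    private final transient Map<String, List<String>> hits = new LinkedHashMap<>();

    static class Race {
        private String raceName;
        private String resultType;
        private int sortOrder;
        private String className;
        private List<Result> results = new ArrayList<>();

        public String getRaceName() { return raceName; }
        public List<Result> getResults() { return results; }
    }

    static class Result {
        private int startNumber;
        private String driverName;
        private String driverId;
        private String clubName;
        private int position;
        private double bestLapTime;

        public String getDriverName() { return driverName; }
        public int getPosition() { return position; }
        public double getBestLapTime() { return bestLapTime; }
    }

    public String getEventName() { return eventName; }
    public List<Race> getRaces() { return races; }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    // Collect the highlight fragments per field name, e.g. "driverName".
    public void addHit(String fieldName, String fragment) {
        List<String> fragments = hits.get(fieldName);
        if (fragments == null) {
            fragments = new ArrayList<>();
            hits.put(fieldName, fragments);
        }
        fragments.add(fragment);
    }

    public Map<String, List<String>> getHits() { return hits; }
}
```

The `setId` and `addHit` methods are the ones the search code calls; the remaining getters are only there for the presentation layer.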

Java client API

The elasticsearch project provides a Java API; add the dependency to the pom file:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>1.0.1</version>
</dependency>

The connection from the application to the elasticsearch server is implemented as an application-scoped bean. The bean is injected into the search beans and handles client creation and closing. Note that the Java client connects to port 9300 and not to port 9200, which the web-based client uses.

Client provider bean

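import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import javax.enterprise.context.ApplicationScoped;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

@ApplicationScoped
public class ElasticSearchClient {

   private Client client;

   public Client getClient() {
      return client;
   }

   // Create one shared client when the bean is first used.
   @PostConstruct
   public void init() {
      Settings s = ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch").build();
      // Pass the settings to the client; otherwise the cluster name above is never used.
      TransportClient tmp = new TransportClient(s);
      // The Java transport client talks to port 9300, not the REST port 9200.
      tmp.addTransportAddress(new InetSocketTransportAddress("ubuntu-01", 9300));
      client = tmp;
   }

   // Release the connection when the application shuts down.
   @PreDestroy
   public void destroy() {
      if (client != null) {
         client.close();
         client = null;
      }
   }
}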

Searching

The QueryBuilder is used to set up the search query.

   // Runs a query_string search over all fields and maps each hit to an EventResult.
   public List<EventResult> findRaceResults(String searchString) {
      Gson gson = new Gson();
      List<EventResult> raceResults = new ArrayList<EventResult>();
      try {
         QueryBuilder queryBuilder = QueryBuilders.queryString(searchString).field("_all");

         SearchRequestBuilder searchRequestBuilder = clientElasticSearchClient.getClient().prepareSearch(INDEX_NAME);
         searchRequestBuilder.setTypes(RACE_TYPE_NAME);
         searchRequestBuilder.setSearchType(SearchType.DEFAULT);
         searchRequestBuilder.setQuery(queryBuilder);
         // First 20 hits, best score first.
         searchRequestBuilder.setFrom(0).setSize(20).setExplain(true);
         searchRequestBuilder.addSort("_score", SortOrder.DESC);
         searchRequestBuilder.addHighlightedField("driverName").addHighlightedField("className").addHighlightedField("clubName");

         SearchResponse response = searchRequestBuilder.execute().actionGet();

         if (response != null) {
            int documentCount = 0;
            for (SearchHit hit : response.getHits()) {
               // Map the JSON source of the hit to a POJO.
               EventResult raceResult = gson.fromJson(hit.getSourceAsString(), EventResult.class);
               raceResult.setId(hit.getId());
               raceResults.add(raceResult);

               documentCount++;

               // Attach every highlighted fragment to the result, keyed by field name.
               for (String fieldName : hit.getHighlightFields().keySet()) {
                  HighlightField highlightField = hit.getHighlightFields().get(fieldName);

                  for (Text hitText : highlightField.getFragments()) {
                     raceResult.addHit(highlightField.getName(), hitText.string());
                  }
               }
            }
            log.info("Hits: " + documentCount);
            return raceResults;
         }

      } catch (IndexMissingException ex) {
         log.severe(ex.getMessage());
      }
      // No response or missing index.
      return null;
   }

Presentation

Search page

The search result page shows the hits with the highest score together with the highlight information.

Search

 

Document page

Result page

 

 

Wrap up

It has been very interesting to work with the elasticsearch server, and it was far easier than when I used Lucene only. The next project will be an attempt to make an ASP.NET web client.