ElasticSearch in a simple way

Recently at Tivix I had to implement full-text search functionality. There are many solutions that I could have used, but there were a few requirements:

  • use ElasticSearch as main server
  • it should be designed for ElasticSearch
  • it should allow access to more advanced functions in ElasticSearch

Because of these criteria I couldn’t use a library such as Haystack – a common universal library for general purpose full-text search – because it was created to support multiple different engines, so it doesn’t fully leverage all the aspects of ElasticSearch.

But I got lucky. The developers of ElasticSearch recently started a promising new project that solved all my problems. It is called ElasticSearch-DSL. This library is simple, without any unnecessary elements. The only problems are that it is currently under heavy development and that the library by itself doesn’t support Django. In theory this should rule the library out, but with a few simple tricks that are already built into Django, we can get everything running.

To get ElasticSearch running, you need to do four things:

  • Have a connection to an ElasticSearch server
  • Register your model structure in the ElasticSearch server
  • Add new objects to the server
  • Keep all information updated

To make a proper connection to ElasticSearch servers we can use a built-in mechanism of the DSL library. It can even remember multiple servers by itself. We just need to add a new line to our settings.py module with information about where the server is. Here is some sample code:

from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=['localhost'], timeout=20)

Because the ElasticSearch engine can detect the structure of objects by itself, for now we can ignore the second point from the list. An explicit structure is only needed when we are using nested structures, or types that look similar to the primary ones but behave differently.
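If the second point does become necessary, the structure can be described as a plain dictionary. The sketch below is only an illustration: the field names (title, comments, author, text) are hypothetical, and the field types available (for example text versus the older string) depend on the ElasticSearch version in use. The nested type marks a list of sub-objects whose fields should be kept together rather than flattened into the parent document.

```python
import json

# Hypothetical mapping for a blog post with a list of comments.
# "nested" keeps each comment's fields together instead of flattening
# them into the parent document.
post_mapping = {
    "properties": {
        "title": {"type": "text"},
        "comments": {
            "type": "nested",
            "properties": {
                "author": {"type": "keyword"},
                "text": {"type": "text"},
            },
        },
    }
}

# The mapping is ordinary JSON, so it can be inspected or logged
# before it is sent to the server.
mapping_json = json.dumps(post_mapping, indent=2, sort_keys=True)
print(mapping_json)
```

A dictionary like this would be sent to the server when the index is set up, before any objects are added.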

Let’s take care of the third and fourth points from the list. In Django, every model emits a signal whenever an object has been created or modified. It is called the post_save signal. The only problem is that we need to distinguish between creation and modification of records, but this information is provided by an argument called created.

Here is a simple structure of what we want to achieve:

from django.db.models import signals

def save_object_to_elastic_search(instance, created, **kwargs):
    if created:
        pass  # Create new record in ElasticSearch
    else:
        pass  # Update old record in ElasticSearch

# ModelToSave stands for whichever model should be indexed
signals.post_save.connect(save_object_to_elastic_search, sender=ModelToSave)

Now we need to prepare the data that will be exported to ElasticSearch. In this case we are using the easiest approach by just serializing the data using the built-in Django function called model_to_dict. Its main purpose is to change object structures that can’t be parsed by ElasticSearch to a more easily-serialized format. We also need to prepare some other information, like doc_type. In our case the doc_type is the application name joined with the model name, and the document id is the object id.

from django.forms.models import model_to_dict

def save_object_to_elastic_search(instance, created, **kwargs):
    doc_type = str(instance._meta)  # 'app_label.model_name'
    id = instance.id
    data = model_to_dict(instance)

    if created:
        pass  # Create new record in ElasticSearch
    else:
        pass  # Update old record in ElasticSearch

The last step is to provide all the arguments for a proper save. The index and update methods take very similar sets of arguments, with one small difference in the body argument. So the final code looks like this:

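from django.forms.models import model_to_dict
from elasticsearch_dsl.connections import connections

# Assumed index name; replace it with the index used by your project
INDEX_NAME = 'my_index'

def save_object_to_elastic_search(instance, created, **kwargs):
    doc_type = str(instance._meta)  # 'app_label.model_name'
    id = instance.id
    data = model_to_dict(instance)

    if created:
        # Create new record in ElasticSearch
        connections.get_connection().index(
            index=INDEX_NAME,
            doc_type=doc_type,
            id=id,
            body=data,
        )
    else:
        # Update old record; changed fields go under the "doc" key
        connections.get_connection().update(
            index=INDEX_NAME,
            doc_type=doc_type,
            id=id,
            body=dict(
                doc=data,
            ),
        )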
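One way to check the create/update branching without a running server is to pass the client in explicitly and substitute a stand-in that only records calls. In the sketch below, RecordingClient, sync_document and the index name 'my_index' are hypothetical names introduced for illustration. The keyword arguments follow the index and update calls above, plus an explicit index name, which is an assumption.

```python
class RecordingClient:
    """Stand-in for an ElasticSearch client; it only records calls."""

    def __init__(self):
        self.calls = []

    def index(self, **kwargs):
        self.calls.append(("index", kwargs))

    def update(self, **kwargs):
        self.calls.append(("update", kwargs))


def sync_document(client, doc_type, doc_id, data, created, index="my_index"):
    """Index a new document, or send a partial update for an existing one."""
    if created:
        client.index(index=index, doc_type=doc_type, id=doc_id, body=data)
    else:
        # update() expects the changed fields wrapped in a "doc" key.
        client.update(index=index, doc_type=doc_type, id=doc_id,
                      body={"doc": data})


client = RecordingClient()
sync_document(client, "blog.post", 1, {"title": "Hello"}, created=True)
sync_document(client, "blog.post", 1, {"title": "Hello again"}, created=False)

for name, kwargs in client.calls:
    print(name, kwargs["body"])
```

Passing the client as an argument is a design choice made only for this sketch. The signal handler itself can keep calling connections.get_connection() as shown.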

Of course, this is a very simple example of how to register Django models in ElasticSearch. There are still a lot of things missing here, for example:

  • removing objects
  • nested structures
  • bulk indexing and updating of objects that already exist in the database
  • and many more…
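As a rough starting point for the removal and bulk items, each operation can be described as an action dictionary of the shape accepted by the bulk helper in elasticsearch.helpers. The sketch below works under assumptions: build_bulk_actions, build_delete_action, the index name and the sample rows are hypothetical, and plain dictionaries stand in for model_to_dict results so that it runs without Django or a server. The _type key follows the doc_type usage in this article and may not apply to newer ElasticSearch versions.

```python
def build_bulk_actions(rows, doc_type, index="my_index"):
    """Yield one 'index' action per row, keyed by the row's id."""
    for row in rows:
        yield {
            "_op_type": "index",
            "_index": index,
            "_type": doc_type,
            "_id": row["id"],
            "_source": row,
        }


def build_delete_action(doc_id, doc_type, index="my_index"):
    """Describe the removal of a single document."""
    return {
        "_op_type": "delete",
        "_index": index,
        "_type": doc_type,
        "_id": doc_id,
    }


# Sample data standing in for model_to_dict(instance) results.
rows = [{"id": 1, "title": "First"}, {"id": 2, "title": "Second"}]
actions = list(build_bulk_actions(rows, "blog.post"))
actions.append(build_delete_action(1, "blog.post"))

for action in actions:
    print(action["_op_type"], action["_id"])
```

With a real client, a list like this could be handed to elasticsearch.helpers.bulk(client, actions).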

But this short article should help you with pairing ElasticSearch-DSL with a Django project.