Adding full-text search to a static site (= no backend needed)

May 24, 2022 19:45 · 760 words · 4 minute read

A couple of days ago, I decided to add a search feature to this blog. Why? Because I wanted to find a post about PHP, but wasn’t at my computer to search through the website files manually. I could have used Google and added site:markusdosch.com to the query, but I think it is nicer when a website provides its own search.

The challenge: This blog is just a bunch of HTML files, generated with the static site generator Hugo on every Git push, hosted for free on Netlify. So, there is no backend server part or API that could process a search query and return the results to the client.

Netlify does offer “Functions” for this kind of use case. “Functions” is code that runs on the backend, but Netlify abstracts away the server, so you don’t have to manage one yourself¹. But I did not want to depend on Netlify too much - I like that my blog is just static files, and I could easily self-host it, or use a similar service like GitHub Pages or Vercel. Hosting static files is very easy and basically free - hosting a backend is not.

There are also hosted search services like Algolia that index your web page and offer an API that your frontend can call to query for results. But, same problem: You make yourself dependent on a third party.

Solution (where we don’t depend on a third-party?)

We can do the search on the user’s browser! 🤩💻

How does this work? On every release, we generate a search index, a file that contains the full text of every post. Then, when a user wants to use the search, the user’s browser downloads this file and searches through the index via client-side JavaScript. Finally, the browser replaces the existing page content with the search results.
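To make the idea concrete, here is a minimal sketch of the two steps (build an index, then query it) in plain JavaScript. It is not what Lunr.js does internally, which is far more capable (stemming, scoring, and so on); the sample posts and all function names are made up for illustration.

```javascript
// Hypothetical posts; on a real site this data would come from the generated index file.
const posts = [
  { url: "/posts/php-tips/", title: "PHP tips", content: "Some notes about PHP arrays and strings." },
  { url: "/posts/hugo-search/", title: "Search with Hugo", content: "Adding full-text search to a static site." },
];

// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build step (would run at release time): map each word to the set of post URLs containing it.
function buildIndex(posts) {
  const index = new Map();
  for (const post of posts) {
    for (const token of tokenize(post.title + " " + post.content)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(post.url);
    }
  }
  return index;
}

// Query step (would run in the browser): return the URLs of posts that contain every query word.
function search(index, query) {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  let result = null;
  for (const token of tokens) {
    const urls = index.get(token) || new Set();
    result = result === null
      ? new Set(urls)
      : new Set([...result].filter((url) => urls.has(url)));
  }
  return [...result];
}

const index = buildIndex(posts);
console.log(search(index, "static search"));
```

In a real setup, the index would be serialized to a file at build time and downloaded by the browser, instead of being built in the same script that queries it.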

Things to consider

Downside: When users interact with the search, they have to download the full search index. For large sites, this can be several megabytes. For small sites like this one, I think this is okay. And, as the search index is just a static file, your web server will typically instruct the user’s browser to cache it - so at least the user has to download this file just once (until the next release gets published).

Advantage: For me, the website owner, the advantage is obvious - I can stick to static file hosting, and “offload” the search to the user’s browser. But there is an advantage for the user, too. The search becomes much faster, as there are no additional client-server roundtrips after the search index has been downloaded. Everything happens client-side, so we can easily, for example, update the search results as the user is typing, without having to worry about server response times.
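As a rough sketch of what updating results while the user types could look like: the helper below re-runs a search function on every keystroke. This is not the code running on this blog; `attachLiveSearch`, `searchFn` and `render` are hypothetical names, and the two-character minimum is an arbitrary choice.

```javascript
// Re-run the search on every keystroke.
// `input` is anything with addEventListener (for example an <input> element);
// `searchFn` and `render` are hypothetical callbacks supplied by the page.
function attachLiveSearch(input, searchFn, render) {
  input.addEventListener("input", (event) => {
    const query = event.target.value.trim();
    // Skip very short queries so a single letter does not match nearly every post.
    render(query.length < 2 ? [] : searchFn(query));
  });
}
```

Because the search runs locally, there is no network request per keystroke, so a debounce is optional here rather than necessary.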

Implementation

For the actual implementation, I followed the solution outlined by Wladimir Palant in his post “The easier way to use lunr search with Hugo”. We generate a JSON search index file with Hugo’s templating functionality, and use the Lunr.js library for client-side searching. The integration was quite straightforward; the only difficulty for me was configuring Hugo to output the search index in JSON format.
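The client-side part could look roughly like the sketch below. It is not copied from Wladimir Palant’s post or from this blog: the field names (`uri`, `title`, `content`), the `/index.json` path and the function names are assumptions, and `lunr` and `fetchFn` are passed in as parameters so that the snippet does not rely on any globals.

```javascript
// Build a Lunr index from the documents in the JSON file.
// `lunr` is the function exported by the Lunr.js library; the field names are assumptions.
function buildLunrIndex(lunr, documents) {
  return lunr(function () {
    this.ref("uri");       // unique identifier returned with every search hit
    this.field("title");   // fields whose text should be searchable
    this.field("content");
    documents.forEach((doc) => this.add(doc));
  });
}

// Download the index file (hypothetical path) and return a search function.
async function createSearch(lunr, fetchFn, indexUrl = "/index.json") {
  const response = await fetchFn(indexUrl);
  const documents = await response.json();
  const index = buildLunrIndex(lunr, documents);
  const byUri = new Map(documents.map((doc) => [doc.uri, doc]));
  // Lunr returns hits as objects with a `ref` property; map each ref back to its document.
  return (query) => index.search(query).map((hit) => byUri.get(hit.ref));
}
```

In the browser, this would be called as `createSearch(lunr, fetch)` once the Lunr.js script has been loaded.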

Future improvements

I got the search live quite fast, but there are two things I’d like to improve sometime. First, I’d like to have search results as you type. As I said, this should not be too hard, as the search already happens entirely in the browser. Second, I’d like to load the Lunr.js JavaScript library not on every page load, but only when a user starts interacting with the search input. Lunr is just a couple of KBs, and the file is cached, too, but still, as most visitors won’t use the search, I’d rather they not have to download this dependency.
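One possible way to do the second improvement is to inject the script tag only when the search field receives focus, and to make sure this happens at most once. This is just a sketch of the idea, not something I have deployed: the script path `/js/lunr.min.js` and the element id `search-input` are hypothetical.

```javascript
// Wrap a loader so that it runs at most once; later calls reuse the same promise.
function once(load) {
  let promise = null;
  return () => {
    if (promise === null) promise = load();
    return promise;
  };
}

// Browser wiring: `doc` is the page's `document`.
// The script path and the element id below are hypothetical.
function setUpLazySearch(doc) {
  const loadLunr = once(() => new Promise((resolve, reject) => {
    const script = doc.createElement("script");
    script.src = "/js/lunr.min.js";
    script.onload = () => resolve();
    script.onerror = () => reject(new Error("Could not load the search library"));
    doc.head.appendChild(script);
  }));
  // "focus" fires when the visitor clicks or tabs into the search field.
  doc.getElementById("search-input").addEventListener("focus", () => {
    loadLunr().catch((err) => console.error(err));
  });
  return loadLunr;
}
```

Code that runs the actual search can then wait for `loadLunr()` before touching the library, so visitors who never use the search never download it.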

I hope this post has given a useful overview of how full-text search can be implemented on smallish static sites. Happy searching! 🔎

UPDATE 2022-08-03

Pagefind, which I found via Hacker News, is an interesting alternative to the approach outlined here. The crucial difference: Pagefind does not generate one single large index file, but splits the index into multiple smaller “fragment” files. Depending on the user’s search query, different fragments will be accessed. So, a user needs to download only the parts of the index that are actually relevant for their query. Pretty awesome concept, and especially useful for large sites where one single index file would be too big!
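To illustrate the general idea of a split index (and only the idea: this is not Pagefind’s actual file format or algorithm), here is a toy sketch that shards a word-to-URLs index by the first character of each word and loads only the shards a query needs. All names are hypothetical.

```javascript
// Build step: split one { word: [urls] } index into fragments keyed by first character.
function splitIntoFragments(index) {
  const fragments = {};
  for (const [word, urls] of Object.entries(index)) {
    const key = word[0];
    if (!fragments[key]) fragments[key] = {};
    fragments[key][word] = urls;
  }
  return fragments;
}

// Query step: load only the fragments that the query words need.
// `loadFragment` is a hypothetical function, for example a fetch of /fragments/<key>.json.
// Results are the union of all matches (any query word may match).
async function searchFragments(query, loadFragment) {
  const words = query.toLowerCase().match(/[a-z0-9]+/g) || [];
  const hits = new Set();
  for (const word of words) {
    const fragment = await loadFragment(word[0]);
    const found = fragment && Object.prototype.hasOwnProperty.call(fragment, word);
    for (const url of found ? fragment[word] : []) hits.add(url);
  }
  return [...hits];
}
```

A query for a single word then triggers the download of exactly one fragment, no matter how large the complete index is.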

¹ The general term for this concept is “serverless computing”. Other popular serverless function providers are AWS Lambda, Google Cloud Functions, and Vercel Serverless Functions.
