A couple of days ago, I decided to add a search feature to this blog. Why? Because I wanted to find a post about PHP, but wasn’t at my computer to search through the website files manually. I could have used Google and added site:markusdosch.com to the query, but I think it is nicer when a website provides search on its own.
The challenge: This blog is just a bunch of HTML files, generated with the static site generator Hugo on every Git push, hosted for free on Netlify. So, there is no backend server part or API that could process a search query and return the results to the client.
Netlify does offer “Functions” for this kind of use case: code that runs on the backend, with Netlify abstracting away the server so you don’t have to manage one yourself. But I did not want to depend on Netlify too much - I like that my blog is just static files, and I could easily self-host it, or use a similar service like GitHub Pages or Vercel. Hosting static files is very easy and basically free - hosting a backend is not.
There are also hosted search services like Algolia that index your web page and offer an API your frontend can call to query for results. But, same problem: you make yourself dependent on a third party.
Solution (where we don’t depend on a third party)
We can do the search on the user’s browser! 🤩💻
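To make the idea concrete, here is a minimal sketch of the concept (not this blog’s actual implementation): the browser fetches a prebuilt JSON index of all posts once and then filters it locally. The index shape and the `search` helper are hypothetical; a real setup would use a library like Lunr.js for ranked full-text search instead of naive substring matching.

```javascript
// Minimal client-side search sketch (hypothetical index shape).
// A static file like /search-index.json would contain one entry per post:
const index = [
  { title: "Tailwind CSS: A Primer", url: "/tailwind-css-a-primer/", body: "..." },
  { title: "jq: Analyzing JSON data", url: "/jq-analyzing-json/", body: "jq is a command-line JSON processor ..." },
];

// Naive substring search over title and body - no ranking, just filtering.
function search(indexEntries, query) {
  const q = query.toLowerCase();
  return indexEntries.filter(
    (entry) =>
      entry.title.toLowerCase().includes(q) ||
      entry.body.toLowerCase().includes(q)
  );
}

// In the browser, the index would be fetched once and queried locally:
//   const index = await (await fetch("/search-index.json")).json();
//   const results = search(index, "json");
console.log(search(index, "json").map((e) => e.title));
```

The important property is that after the single `fetch`, every keystroke is answered from memory - no further requests to the server.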
Things to consider
Downside: When users interact with the search, they first have to download the full search index. For large sites, this can be several megabytes. For a small site like this one, I think that is okay. And, as the search index is just a static file, your web server will instruct the user’s browser to cache it - so the user downloads this file just once (until the next release gets published).
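If the default caching behavior doesn’t fit, Netlify lets you tune it with a `_headers` file in your publish directory. This is an illustrative sketch - the filename `search-index.json` is an assumption, not necessarily what your generator produces:

```
# _headers (Netlify) - hypothetical cache policy for the search index.
# Lets browsers reuse the index for an hour without revalidating.
/search-index.json
  Cache-Control: public, max-age=3600
```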
Advantage: For me, the website owner, the advantage is obvious - I can stick to static file hosting and “offload” the search to the user’s browser. But there is an advantage for the user, too: the search becomes much faster, as there are no additional client-server roundtrips after the search index has been downloaded. Everything happens client-side, so we can easily, e.g., update the search results as the user is typing, without having to worry about server response times.
For the actual implementation, I followed the solution outlined by Wladimir Palant in his post “The easier way to use lunr search with Hugo”. We generate a JSON search index with Hugo’s templating functionality and use the Lunr.js library for client-side searching. The integration was quite straightforward; the only difficulty for me was configuring Hugo to output the search index in JSON format.
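For reference, getting Hugo to emit JSON generally comes down to registering a JSON output format for the home page in the site config; Hugo then renders it through a matching template (by convention something like `layouts/index.json`). A minimal sketch, assuming a TOML config file:

```toml
# config.toml - ask Hugo to render the home page as JSON
# in addition to the usual HTML and RSS outputs.
[outputs]
  home = ["HTML", "RSS", "JSON"]
```

On the client, the generated JSON file is then fetched and fed into Lunr.js to build the search index, as described in Palant’s post.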
I hope this post has given a useful overview of how full-text search can be implemented on smallish static sites. Happy searching! 🔎
Pagefind is an interesting alternative to the approach outlined here, found via Hacker News. The crucial difference: Pagefind does not generate one single large index file, but splits the index into multiple smaller “fragment” files. Depending on the user’s search query, different fragments are accessed - so a user only downloads the parts of the index that are actually relevant to their query. A pretty awesome concept, and especially useful for large sites where a single index file would be too big!