This post is a follow-up to the previous post, which includes instructions for setting up the cluster if you want to try this yourself.
Great search results don’t just rank the right documents—they also show users why a result matches. Vespa can both:
Highlight matches (“bolding”): wrap matching query terms in <hi> tags.
Create dynamic snippets: extract fragments around matches to show relevant context, also with <hi> tags for highlights.
In Vespa, dynamic snippets are generated from the source field by extracting the most relevant fragments around matching terms.…
In this post we will debug a couple of queries to understand the performance and effects of Vespa’s match-phase. This is a follow-up to the previous post. We assume you have the same setup if you want to run the queries yourself.
If we use a previous query as an example:
vespa query \
  'yql=select title, description from podcast where title contains "Vespa AI Search" or description contains "Vespa AI search"' \
  'hits=10' \
  'ranking=podcast-search' \
  'input.…
This blog post covers how I prepared for my interviews at Spotify and ultimately received an offer for an ML engineer role. For some people, the interview process at FAANG-style companies is as easy as a walk in the park, but for me, it required extensive preparation. The time you’ll need is probably highly individual, but I really, really wanted this job, so I went all in! Note that I won’t be sharing any specific interview questions.…
This blog post is a follow-up to the previous post where we focus more on the ranking function. We will assume you have a Vespa cluster up and running (when writing this, we are using the Docker setup from the previous blog post).
The ranking profile in the schema is the following:
rank-profile podcast-search {
    inputs {
        query(q) string
    }
    function freshness() {
        expression: exp(-1 * age(newest_item_pubdate)/(3600*24*7)) + attribute(popularity_score)/9
    }
    # https://docs.…
After playing around with the Podcast index and the amazing SQLite dump, I got stuck and annoyed that the search experience using SQLite and FTS5 is not that nice. I never found a good way to do a fuzzy search for podcasts. After being stuck, I decided that I want to use Vespa instead, which I had played around with before and liked.
I will not go over Vespa but instead focus on how to:…
Long time no sea …
Modal offers great cloud services, and one of the best things is the fast startup (and availability) of GPUs. I have historically used some of the hyperscalers out there, but it is just so slow and involves so many steps. Also, Modal offers $30 a month for free.
However, I have been looking for a way to get an interactive terminal up on a machine with a GPU, but have not had a good way to do it.…
List of good blogs
https://decodingml.substack.com/
https://research.atspotify.com/
https://www.philschmid.de/
https://huggingface.co/blog
https://news.ycombinator.com/
…
TLDR: code can be found here.
In this blog post we will dive into how to build a small Go CLI for sharing files. We will set it up so that a shareable link is created with a set expiration time, and the object is cleaned up after twice that time. For a file we will do the following:…
In order to profile Triton on the GPU we will use NVIDIA Nsight Systems. Installation instructions can be found here. Older versions of Nsight Systems can be found here. The first step is to generate an nsys report. This can be done using the following command (you need to continue reading if you want to get all the traces and build the NVIDIA Triton image):
nsys profile --output /MY_OUTPUT_FOLDER/tmp.nsys-rep tritonserver --model-repository
In this blog post we will assume that the GPU is connected to a remote machine that you can run Docker or Kubernetes on.…
Consistent hashing
In this blog post we will dive into consistent hashing and implement it in Go. Let's start with a problem that consistent hashing helps solve. Imagine we have a distributed system with 3 databases. Business is booming and we realize we need to scale out to more shards. Let's assume we select which node to send the data to using a hash function f(x) and get the node:…
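To make the setup concrete, here is a minimal Go sketch of a consistent-hash ring (the type and function names are illustrative, not from the post, and real implementations add virtual nodes for better balance): each node is hashed onto a ring of positions, and a key is owned by the first node clockwise from its hash.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring: node names are hashed to
// positions, and a key maps to the first position at or after its hash.
type Ring struct {
	points []uint32          // sorted hash positions of the nodes
	nodes  map[uint32]string // position -> node name
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func NewRing(names ...string) *Ring {
	r := &Ring{nodes: map[uint32]string{}}
	for _, n := range names {
		p := hashKey(n)
		r.points = append(r.points, p)
		r.nodes[p] = n
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Get returns the node owning key: binary-search for the first ring
// position >= hash(key), wrapping around to the start if none exists.
func (r *Ring) Get(key string) string {
	h := hashKey(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.nodes[r.points[i]]
}

func main() {
	ring := NewRing("db1", "db2", "db3")
	fmt.Println(ring.Get("user:42"))
}
```

The payoff over plain `f(x) mod N` is that adding a fourth node only moves the keys that land between the new node and its predecessor on the ring, instead of reshuffling nearly everything.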