In this example, we will go over how to build a text classifier using a text embedding API. This article is inspired by "Fine-tune classifier with ModernBERT in 2025".
Large Language Models (LLMs) have become ubiquitous in 2024. However, smaller, specialized models - particularly for classification tasks - remain critical for building efficient and cost-effective AI systems. One key use case is routing user prompts to the most appropriate LLM or selecting optimal few-shot examples, where fast, accurate classification is essential.…
This post is a follow-up to the previous post, which includes instructions for setting up the cluster if you want to try this yourself.
As often happens, you first make it work, and then you need to make it fast, or good, or whatever. In our case the prefix search is slow, as in really slow.
The following is the prefix search discussed here:
vespa query \
  'yql=select id, title from podcast where prefix contains ({maxEditDistance: 4, prefix: true} "All-in")' \
  'hits=5' \
  'timeout=10s' \
  'presentation.…
This post is a follow-up to the previous post, which includes instructions for setting up the cluster if you want to try this yourself.
Great search results don’t just rank the right documents—they also show users why a result matches. Vespa can both:
Highlight matches (“bolding”): wrap matching query terms in <hi> tags.
Create dynamic snippets: extract fragments around matches to show relevant context, also with <hi> tags for highlights.
In Vespa, dynamic snippets are generated from the source field by extracting the most relevant fragments around matching terms.…
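To make the two ideas concrete, here is a minimal Python sketch of highlighting and snippet extraction. It is not how Vespa implements either feature; the function names `highlight` and `snippet`, the `context_words` parameter, and the whole-word, case-insensitive matching rule are assumptions made for illustration only.

```python
import re


def highlight(text: str, terms: list[str]) -> str:
    """Wrap whole-word, case-insensitive matches of any term in <hi> tags."""
    if not terms:
        return text
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(r"<hi>\1</hi>", text)


def snippet(text: str, terms: list[str], context_words: int = 3) -> str:
    """Return a highlighted fragment around the first matching single-word term."""
    words = text.split()
    wanted = {term.lower() for term in terms}
    for i, word in enumerate(words):
        # Ignore surrounding punctuation when comparing a word to the terms.
        if word.strip(".,;:!?\"'()").lower() in wanted:
            start = max(0, i - context_words)
            fragment = " ".join(words[start : i + context_words + 1])
            return highlight(fragment, terms)
    # No match: fall back to the beginning of the text, without highlights.
    return " ".join(words[: 2 * context_words + 1])


print(highlight("Vespa is a search engine", ["search"]))
print(snippet("one two three four search five six seven eight", ["search"], context_words=2))
```

The first call bolds the term in the full text, while the second keeps only a couple of words of context on each side of the match, which is the basic difference between highlighting and a dynamic snippet.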
In this post we will debug a couple of queries to understand the performance and effects of Vespa’s match-phase. This is a follow-up to the previous post. We assume you have the same setup if you want to run the queries yourself.
If we use a previous query as an example:
vespa query \
  'yql=select title, description from podcast where title contains "Vespa AI Search" or description contains "Vespa AI search"' \
  'hits=10' \
  'ranking=podcast-search' \
  'input.…
This blog post covers how I prepared for my interviews at Spotify and ultimately received an offer for an ML engineer role. For some people, the interview process at FAANG-style companies is as easy as a walk in the park, but for me, it required extensive preparation. The time you’ll need is probably highly individual, but I really, really wanted this job, so I went all in! Note that I won’t be sharing any specific interview questions.…
This blog post is a follow-up to the previous post where we focus more on the ranking function. We will assume you have a Vespa cluster up and running (when writing this, we are using the Docker setup from the previous blog post).
The ranking profile in the schema is the following:
rank-profile podcast-search {
    inputs {
        query(q) string
    }
    function freshness() {
        expression: exp(-1 * age(newest_item_pubdate)/(3600*24*7)) + attribute(popularity_score)/9
    }
    # https://docs.…
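To get a feel for the numbers, the freshness expression can be evaluated by hand. The following is a small Python sketch, not Vespa code: it assumes that age(...) yields the number of seconds since the publication timestamp, and the function and argument names are made up for illustration.

```python
import math

SECONDS_PER_WEEK = 3600 * 24 * 7


def freshness(age_seconds: float, popularity_score: float) -> float:
    """Mirror of the expression: exp(-age / one week) + popularity_score / 9."""
    return math.exp(-1 * age_seconds / SECONDS_PER_WEEK) + popularity_score / 9


# A brand-new item gets the full decay term (1.0); a week-old item gets exp(-1).
print(freshness(0, 0))                 # decay term only, no popularity
print(freshness(SECONDS_PER_WEEK, 0))  # one week old
print(freshness(SECONDS_PER_WEEK, 9))  # one week old, popularity_score of 9
```

The decay term drops to roughly 0.37 after one week, while the popularity term grows linearly with the score, so an old but popular item can still outscore a fresh one.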
After playing around with the Podcast index and the amazing SQLite dump, I got stuck and annoyed that the search experience with SQLite and FTS5 was not that nice. I never found a good way to do a fuzzy search for podcasts. After being stuck, I decided to use Vespa instead, which I had played around with before and liked.
I will not go over Vespa but instead focus on how to:…
Long time no sea …
Modal offers great cloud services, and one of the best things is the fast startup (and availability) of GPUs. I have historically used some of the hyperscalers out there, but they are just so slow and involve so many steps. Also, Modal offers $30 a month for free.
However, I have been looking for an option to get an interactive terminal up on a machine with a GPU, but have not had a good way to do it.…
List of good blogs
https://decodingml.substack.com/
https://research.atspotify.com/
https://magazine.sebastianraschka.com/
https://www.philschmid.de/
https://huggingface.co/blog
https://news.ycombinator.com/
…
TLDR: code can be found here.
In this blog post we will dive into how to build a small Go CLI for sharing files. We will set it up so that a shareable link is created with a set expiration time, and the object is cleaned up after twice that time. For a file we will do the following:…