Rust Foreign data wrappers for postgres


Background In this blog post we will try to implement a foreign data wrapper for Postgres in Rust. We will build on top of pgx so that we do not have to build everything from the ground up. But first, what is a foreign data wrapper? From the Postgres docs: A foreign data wrapper is a library that can communicate with an external data source, hiding the details of connecting to the data source and obtaining data from it.…
Read more ⟶

DBOS: A Database-Oriented Operating System


A group of researchers is proposing a radical change to the operating system of the future. By replacing the fundamental Unix idea that everything is a file and instead relying on concepts from the database world, an operating system that supports large-scale distributed applications in the cloud can be built. The core principles suggested to achieve this are: Store all application state in tables in a distributed database. Store all OS state in tables in a distributed database.…
Read more ⟶

Bloom filters


A Bloom filter is a space-efficient probabilistic data structure. It can be used to quickly check whether a value definitely does not exist or might exist in a set: false positives are possible (with a low likelihood), but false negatives are not. The time to check whether an element exists, or to add an element, is independent of the number of stored elements: O(k), where k is the number of hash functions (we will cover this later).…
Read more ⟶
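To make the excerpt above concrete, here is a minimal Python sketch of a Bloom filter. It is not the code from the post: the class name `BloomFilter`, the default sizes, and the choice of salted SHA-256 as the k hash functions are all assumptions made for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Illustrative Bloom filter; sizes and hashing scheme are assumptions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # this is k in O(k)
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set the k bits for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in the set"; True means "maybe".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter()
bf.add("postgres")
print(bf.might_contain("postgres"))
```

Both `add` and `might_contain` touch exactly k bits, which is where the O(k) cost comes from. An item that was added always reports `True`, so there are no false negatives; an item that was never added can still report `True` if other items happened to set all of its bits.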

Setting up a Basic dbt Development Container for BigQuery in GCP


In this post, you will learn how to set up a basic dbt project in Google Cloud Platform (GCP) and share a development container to kickstart your project. While there are numerous blog posts out there about dbt and BigQuery, none of them share how to set it up in a development container without using any of the dbt-cloud services (at least to my knowledge). Setting up your environment Set up a GCP project, or use one you already have…
Read more ⟶

Go memory arenas for apache arrow, Part 2


This blog post continues the dive into Apache Arrow, specifically the Go memory allocation for Apache Arrow. This is a follow-up to Go memory arenas for apache arrow, Part 1. First of all, why do we want to manage memory manually instead of using the GC? One of Arrow's key features is its support for sharing memory between programs without copying; however, for a garbage-collected language this does not work that well.…
Read more ⟶

Push based query engine


In this blog post we will dive into the difference between push-based and pull-based query engines. As simple as it sounds, push based means that data is pushed from the source through the different operators toward the sink; this approach is used by Snowflake and is argued to be superior for OLAP, which we will dive deeper into. Pull based has been around for longer and means that the sink pulls data from the source up through the operators; this is also known as the Volcano Iterator Model…
Read more ⟶
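As a rough illustration of the two execution styles, here is a small Python sketch that runs the same filter query both ways. It is not taken from the post or from any real engine: the function names (`scan`, `filter_pull`, `filter_push`, `pull_query`, `push_query`) and the even-number predicate are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


# Pull based (Volcano style): each operator is an iterator, and the
# consumer at the top asks its child for the next row.
def scan(rows: Iterable[int]) -> Iterator[int]:
    for row in rows:
        yield row


def filter_pull(child: Iterator[int], predicate: Callable[[int], bool]) -> Iterator[int]:
    for row in child:
        if predicate(row):
            yield row


def pull_query(rows: Iterable[int]) -> List[int]:
    # The sink (list) drives execution by pulling rows through the plan.
    return list(filter_pull(scan(rows), lambda r: r % 2 == 0))


# Push based: the source drives execution and hands each row to the
# next operator's callback.
def filter_push(predicate: Callable[[int], bool], downstream: Callable[[int], None]) -> Callable[[int], None]:
    def consume(row: int) -> None:
        if predicate(row):
            downstream(row)
    return consume


def push_query(rows: Iterable[int]) -> List[int]:
    out: List[int] = []
    pipeline = filter_push(lambda r: r % 2 == 0, out.append)
    for row in rows:  # the source loop pushes rows downstream
        pipeline(row)
    return out


print(pull_query([1, 2, 3, 4]))
print(push_query([1, 2, 3, 4]))
```

Both functions return the same rows; the difference is who holds the control flow. In the pull version the loop lives in the consumer, while in the push version the loop lives at the source and operators are just callbacks.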

Go memory arenas for apache arrow, Part 1


This blog post will try to dive into Apache Arrow, specifically the Go memory allocation for Apache Arrow. Apache Arrow states that it allows for the following types of memory allocations: Go default allocations (standard Go GC-collected memory) CGo allocator (memory allocated through CGo) Checked Memory Allocator We will take a deep dive into these in a follow-up blog post. Today the goal is to extend Arrow with a new memory allocator, mostly because I read up on Go memory arenas, which were introduced in 1.…
Read more ⟶

How to faster apply for a new job


Applying for jobs is fun, but writing cover letters is not, at least not compared to hacking on a new open source tool or testing out something new. The goal is therefore to reduce the time it takes to apply for a job, so today we will generate cover letters. For this we will need the following: CV: I will copy-paste mine from LinkedIn (minimal effort, and I spend way too much time on LinkedIn, so it is up to date) Job ad (I will take one from LinkedIn that sounds interesting) Get the CV.…
Read more ⟶

Awesome go resources


This blog post is a collection of awesome Go resources and will be continuously updated. The goal is that each resource should also be described briefly. Blogs: Generics can make your Go code slower Effective go Go memory model inline Videos: Obscure Go Optimisations A Guide to the Go Garbage Collector …
Read more ⟶

Storing Kubeflow Pipeline Templates in GCP Artifact Registry


In this blog post, we will discuss how to store Kubeflow Pipeline templates in GCP Artifact Registry, enabling reusability and version control for your pipelines. Using Artifact Registry over Cloud Storage simplifies version control and allows for easier collaboration between single or multiple users. The Kubeflow Pipelines SDK registry client is a new client interface that you can use with a compatible registry server (ensure you are using the correct KFP version), such as Artifact Registry, for version control of your Kubeflow Pipelines (KFP) templates.…
Read more ⟶