Meet DuckDB, a modern database

DuckDB is an open source column-oriented relational database management system (RDBMS), first released in 2019.

It is designed to provide high performance in complex queries on large databases in embedded configuration, such as combining tables with hundreds of columns and billions of rows. Unlike other embedded databases (e.g. SQLite).

DuckDB does not compete with traditional DBMS such as MSSQL, PostgreSQL, etc.

DuckDB uses a vectorized query processing engine. DuckDB is unique among database management systems because it has no external dependencies and can be built with just a C++11 compiler.

DuckDB also deviates from the traditional Client-Server model by running inside a host process (it has bindings, for example, to a Python interpreter with the ability to put data directly into NumPy arrays).

There are numerous situations in which using DuckDB can be advantageous. Below we will see how to install DuckDB, which can be installed on any operating system, we will see how to install it on Ubuntu and on Windows, in addition to usability tips and comparisons with the commands from GNU/UNIX.

Installation

As mentioned, DuckDB can be used on any operating system. The procedure is to download and move it to the system folder responsible for storing the programs, however, it is also possible to install it via pip.

Example on Ubuntu:

01. Update the system:

First, it is good practice to update the system before installing new packages.

sudo apt update && sudo apt upgrade -y

02. Download DuckDB:

You can download DuckDB directly from the official website or use a package manager like pip if you prefer the Python version.

Method 1: Download the binary directly:

Example based on the current version (v1.2.1) according to the publication date of this article. For more recent versions, visit the releases page on GitHub.

wget https://github.com/duckdb/duckdb/releases/download/v1.2.1/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
sudo mv duckdb /usr/local/bin/

Method 2: Install via pip (for use with Python):

pip install duckdb

Note: pip requires a virtual environment, this may make it difficult to use DuckDB.

03. Verify the installation:

After installation, you can verify that DuckDB is installed correctly by running:

duckdb --version

Installation on Windows

01. Download DuckDB:

Access the official DuckDB repository on GitHub.
Download the .zip file corresponding to the latest version for Windows.

02. Extract the file:

Extract the contents of the .zip file to a folder of your choice, for example, C:\duckdb.

03. Add to `PATH`:

For ease of use, you can add the path of the folder where DuckDB was extracted to your PATH. - Open “Control Panel” > “System and Security” > “System” > “Advanced system settings” > “Environment variables”.
In the “System variables” section, find the Path variable, select it and click “Edit”.
Add the path of the folder where DuckDB was extracted (e.g. C:\duckdb).

04. Verify the installation:

Open the Command Prompt or PowerShell and run:

duckdb --version

How to use DuckDB

Here are some examples of queries in DuckDB and their approximate equivalents in GNU/UNIX commands:

01. Select all rows from a table:

DuckDB:

SELECT * FROM table;

GNU/UNIX (using cat):

cat file.csv

02. Filter rows based on a condition:

DuckDB:

SELECT * FROM table WHERE column = 'value';

GNU/UNIX (using grep):

grep 'value' file.csv

03. Count the number of rows:

DuckDB:

SELECT COUNT(*) FROM table;

GNU/UNIX (using wc -l):

wc -l file.csv

04. Group and count:

DuckDB:

SELECT column, COUNT(*) FROM table GROUP BY column;

GNU/UNIX (using awk and sort):

awk -F, '{print $1}' file.csv | sort | uniq -c

05. Sort results:

DuckDB:

SELECT * FROM table ORDER BY column;

GNU/UNIX (using sort):

sort -t, -k1 arquivo.csv

Using DuckDB via REPL

To use DuckDB directly via the command line, simply invoke the duckdb command. Below is an example of creating a database, table, update, select, and more:

v1.2.1 8e52ec4395
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D SHOW DATABASES;
┌───────────────┐
│ database_name │
│    varchar    │
├───────────────┤
│ memory        │
└───────────────┘
D CREATE TABLE terminalroot (id INTEGER, name STRING);
D INSERT INTO terminalroot VALUES(1, 'Marcos Oliveira');
D INSERT INTO terminalroot VALUES(2, ' Mark Raasveldt');
D INSERT INTO terminalroot VALUES(3, 'Hannes Mühleisen');
D SELECT * FROM terminalroot;
┌───────┬──────────────────┐
│  id   │       name       │
│ int32 │     varchar      │
├───────┼──────────────────┤
│     1 │ Marcos Oliveira  │
│     2 │  Mark Raasveldt  │
│     3 │ Hannes Mühleisen │
└───────┴──────────────────┘
D UPDATE terminalroot SET name='Mark Raasveldt' WHERE id = 2;
D SELECT * FROM terminalroot;
┌───────┬──────────────────┐
│  id   │       name       │
│ int32 │     varchar      │
├───────┼──────────────────┤
│     1 │ Marcos Oliveira  │
│     2 │ Mark Raasveldt   │
│     3 │ Hannes Mühleisen │
└───────┴──────────────────┘
D

There is a huge ecosystem surrounding DuckDB and you can even test it online. Below are some interesting links for you to check out!

sql duckdb cpp