DuckDB is an open source column-oriented relational database management system (RDBMS), first released in 2019.
It is designed to provide high performance in complex queries on large databases in embedded configuration, such as combining tables with hundreds of columns and billions of rows. Unlike other embedded databases (e.g. SQLite).
DuckDB does not compete with traditional DBMS such as MSSQL, PostgreSQL, etc.
DuckDB uses a vectorized query processing engine. DuckDB is unique among database management systems because it has no external dependencies and can be built with just a C++11 compiler.
DuckDB also deviates from the traditional Client-Server model by running inside a host process (it has bindings, for example, to a Python interpreter with the ability to put data directly into NumPy arrays).
There are numerous situations in which using DuckDB can be advantageous. Below we will see how to install DuckDB, which can be installed on any operating system, we will see how to install it on Ubuntu and on Windows, in addition to usability tips and comparisons with the commands from GNU/UNIX.
As mentioned, DuckDB can be used on any operating system. The procedure is to download and move it to the system folder responsible for storing the programs, however, it is also possible to install it via pip.
First, it is good practice to update the system before installing new packages.
sudo apt update && sudo apt upgrade -y
You can download DuckDB directly from the official website or use a package manager like pip
if you prefer the Python version.
Example based on the current version (v1.2.1) according to the publication date of this article. For more recent versions, visit the releases page on GitHub.
wget https://github.com/duckdb/duckdb/releases/download/v1.2.1/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
sudo mv duckdb /usr/local/bin/
pip install duckdb
Note:
pip
requires a virtual environment, this may make it difficult to use DuckDB.
After installation, you can verify that DuckDB is installed correctly by running:
duckdb --version
.zip
file corresponding to the latest version for Windows..zip
file to a folder of your choice, for example, C:\duckdb
.PATH
:Path
variable, select it and click βEditβ.C:\duckdb
).duckdb --version
Here are some examples of queries in DuckDB and their approximate equivalents in GNU/UNIX commands:
SELECT * FROM table;
cat
):cat file.csv
SELECT * FROM table WHERE column = 'value';
grep
):grep 'value' file.csv
SELECT COUNT(*) FROM table;
wc -l
):wc -l file.csv
SELECT column, COUNT(*) FROM table GROUP BY column;
awk
and sort
):awk -F, '{print $1}' file.csv | sort | uniq -c
SELECT * FROM table ORDER BY column;
sort
):sort -t, -k1 arquivo.csv
To use DuckDB directly via the command line, simply invoke the duckdb
command. Below is an example of creating a database, table, update, select, and more:
v1.2.1 8e52ec4395
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D SHOW DATABASES;
βββββββββββββββββ
β database_name β
β varchar β
βββββββββββββββββ€
β memory β
βββββββββββββββββ
D CREATE TABLE terminalroot (id INTEGER, name STRING);
D INSERT INTO terminalroot VALUES(1, 'Marcos Oliveira');
D INSERT INTO terminalroot VALUES(2, ' Mark Raasveldt');
D INSERT INTO terminalroot VALUES(3, 'Hannes MΓΌhleisen');
D SELECT * FROM terminalroot;
βββββββββ¬βββββββββββββββββββ
β id β name β
β int32 β varchar β
βββββββββΌβββββββββββββββββββ€
β 1 β Marcos Oliveira β
β 2 β Mark Raasveldt β
β 3 β Hannes MΓΌhleisen β
βββββββββ΄βββββββββββββββββββ
D UPDATE terminalroot SET name='Mark Raasveldt' WHERE id = 2;
D SELECT * FROM terminalroot;
βββββββββ¬βββββββββββββββββββ
β id β name β
β int32 β varchar β
βββββββββΌβββββββββββββββββββ€
β 1 β Marcos Oliveira β
β 2 β Mark Raasveldt β
β 3 β Hannes MΓΌhleisen β
βββββββββ΄βββββββββββββββββββ
D
There is a huge ecosystem surrounding DuckDB and you can even test it online. Below are some interesting links for you to check out!