---
layout: docu
title: Swift Client
---
DuckDB has a Swift client. See the [announcement post]({% post_url 2023-04-21-swift %}) for details.
DuckDB supports both in-memory and persistent databases. To work with an in-memory database, run:
let database = try Database(store: .inMemory)
To work with a persistent database, run:
let database = try Database(store: .file(at: "test.db"))
Queries can be issued through a database connection.
let connection = try database.connect()
DuckDB supports multiple connections per database.
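For instance, a query can be issued through the connection and its result column cast to a native Swift type (a minimal sketch; the query and column name are illustrative):

```swift
// Minimal sketch: run a query on the connection and cast the result column
let result = try connection.query("SELECT 42 AS answer")
for answer in result[0].cast(to: Int.self) {
    print(answer ?? 0) // elements are optionals; NULL becomes nil
}
```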
The rest of the page is based on the example of our [announcement post]({% post_url 2023-04-21-swift %}), which uses raw data from NASA's Exoplanet Archive loaded directly into DuckDB.
We first create an application-specific type that we'll use to house our database and connection and through which we'll eventually define our app-specific queries.
import DuckDB
final class ExoplanetStore {
let database: Database
let connection: Connection
init(database: Database, connection: Connection) {
self.database = database
self.connection = connection
}
}
We load the data from NASA's Exoplanet Archive:
wget "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv" -O downloaded_exoplanets.csv
Once we have our CSV downloaded locally, we can use the following SQL command to load it as a new table to DuckDB:
CREATE TABLE exoplanets AS
SELECT * FROM read_csv('downloaded_exoplanets.csv');
Let's package this up as a new asynchronous factory method on our ExoplanetStore
type:
import DuckDB
import Foundation
final class ExoplanetStore {
// Factory method to create and prepare a new ExoplanetStore
static func create() async throws -> ExoplanetStore {
// Create our database and connection as described above
let database = try Database(store: .inMemory)
let connection = try database.connect()
// Download the CSV from the exoplanet archive
let (csvFileURL, _) = try await URLSession.shared.download(
from: URL(string: "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv")!)
// Issue our first query to DuckDB
try connection.execute("""
CREATE TABLE exoplanets AS
SELECT * FROM read_csv('\(csvFileURL.path)');
""")
// Create our pre-populated ExoplanetStore instance
return ExoplanetStore(
database: database,
connection: connection
)
}
// Let's make the initializer we defined previously
// private. This prevents anyone accidentally instantiating
// the store without having pre-loaded our Exoplanet CSV
// into the database
private init(database: Database, connection: Connection) {
...
}
}
The following example queries DuckDB from within Swift via an async function. This means the caller won't be blocked while the query is executing. We'll then cast the result columns to Swift native types using DuckDB's ResultSet cast(to:) family of methods, before finally wrapping them up in a DataFrame from the TabularData framework.
...
import TabularData
extension ExoplanetStore {
// Retrieves the number of exoplanets discovered by year
func groupedByDiscoveryYear() async throws -> DataFrame {
// Issue the query we described above
let result = try connection.query("""
SELECT disc_year, count(disc_year) AS Count
FROM exoplanets
GROUP BY disc_year
ORDER BY disc_year
""")
// Cast our DuckDB columns to their native Swift
// equivalent types
let discoveryYearColumn = result[0].cast(to: Int.self)
let countColumn = result[1].cast(to: Int.self)
// Use our DuckDB columns to instantiate TabularData
// columns and populate a TabularData DataFrame
return DataFrame(columns: [
TabularData.Column(discoveryYearColumn).eraseToAnyColumn(),
TabularData.Column(countColumn).eraseToAnyColumn(),
])
}
}
For the complete example project, clone the DuckDB Swift repository and open up the runnable app project located in Examples/SwiftUI/ExoplanetExplorer.xcodeproj.
---
layout: docu
title: Node.js API
redirect_from:
  - /docs/api/nodejs
  - /docs/api/nodejs/
  - /docs/api/nodejs/overview
  - /docs/api/nodejs/overview/
---
Deprecated: The old DuckDB Node.js package is deprecated. Please use the [DuckDB Node Neo package]({% link docs/clients/node_neo/overview.md %}) instead.
This package provides a Node.js API for DuckDB. The API for this client is somewhat compliant with the SQLite Node.js client, to ease the transition.
Load the package and create a database object:
const duckdb = require('duckdb');
const db = new duckdb.Database(':memory:'); // or a file name for a persistent DB
All options as described on [Database configuration]({% link docs/configuration/overview.md %}#configuration-reference) can optionally be supplied to the Database constructor as the second argument. The third argument can optionally be supplied to get feedback on the given options.
const db = new duckdb.Database(':memory:', {
"access_mode": "READ_WRITE",
"max_memory": "512MB",
"threads": "4"
}, (err) => {
if (err) {
console.error(err);
}
});
The following code snippet runs a simple query using the Database.all()
method.
db.all('SELECT 42 AS fortytwo', function(err, res) {
if (err) {
console.warn(err);
return;
}
console.log(res[0].fortytwo)
});
Other available methods are each, where the callback is invoked for each row; run, which executes a single statement without returning results; and exec, which can execute several SQL commands at once but also does not return results. All those commands can work with prepared statements, taking the values for the parameters as additional arguments. For example:
db.all('SELECT ?::INTEGER AS fortytwo, ?::VARCHAR AS hello', 42, 'Hello, World', function(err, res) {
if (err) {
console.warn(err);
return;
}
console.log(res[0].fortytwo)
console.log(res[0].hello)
});
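For illustration, here is a hedged sketch of run, exec, and each together (the table and values are made up for the example):

```js
// run executes a single statement without returning results
db.run('CREATE TABLE people (id INTEGER, name VARCHAR)', function(err) {
    if (err) {
        console.warn(err);
        return;
    }
    // exec can execute several SQL commands at once, also without results
    db.exec("INSERT INTO people VALUES (1, 'Ada'); INSERT INTO people VALUES (2, 'Bob');", function(err) {
        if (err) {
            console.warn(err);
            return;
        }
        // each invokes the callback once per result row
        db.each('SELECT id, name FROM people', function(err, row) {
            if (err) {
                console.warn(err);
                return;
            }
            console.log(row.id, row.name);
        });
    });
});
```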
A database can have multiple Connections. These are created using db.connect().
const con = db.connect();
You can create multiple connections, each with their own transaction context.
Connection objects also contain shorthands to directly call run(), all() and each() with parameters and callbacks, respectively, for example:
con.all('SELECT 42 AS fortytwo', function(err, res) {
if (err) {
console.warn(err);
return;
}
console.log(res[0].fortytwo)
});
From connections, you can create prepared statements (and only that) using con.prepare():
const stmt = con.prepare('SELECT ?::INTEGER AS fortytwo');
To execute this statement, you can, for example, call all() on the stmt object:
stmt.all(42, function(err, res) {
if (err) {
console.warn(err);
} else {
console.log(res[0].fortytwo)
}
});
You can also execute the prepared statement multiple times. This is for example useful to fill a table with data:
con.run('CREATE TABLE a (i INTEGER)');
const stmt = con.prepare('INSERT INTO a VALUES (?)');
for (let i = 0; i < 10; i++) {
stmt.run(i);
}
stmt.finalize();
con.all('SELECT * FROM a', function(err, res) {
if (err) {
console.warn(err);
} else {
console.log(res)
}
});
prepare()
can also take a callback which gets the prepared statement as an argument:
const stmt = con.prepare('SELECT ?::INTEGER AS fortytwo', function(err, stmt) {
stmt.all(42, function(err, res) {
if (err) {
console.warn(err);
} else {
console.log(res[0].fortytwo)
}
});
});
[Apache Arrow]({% link docs/guides/python/sql_on_arrow.md %}) can be used to insert data into DuckDB without making a copy:
const arrow = require('apache-arrow');
const db = new duckdb.Database(':memory:');
const jsonData = [
{"userId":1,"id":1,"title":"delectus aut autem","completed":false},
{"userId":1,"id":2,"title":"quis ut nam facilis et officia qui","completed":false}
];
// Note: this doesn't work on Windows yet
db.exec(`INSTALL arrow; LOAD arrow;`, (err) => {
if (err) {
console.warn(err);
return;
}
const arrowTable = arrow.tableFromJSON(jsonData);
db.register_buffer("jsonDataTable", [arrow.tableToIPC(arrowTable)], true, (err, res) => {
if (err) {
console.warn(err);
return;
}
// `SELECT * FROM jsonDataTable` would return the entries in `jsonData`
});
});
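Once registered, the buffer can be queried like a regular table. A small sketch (run it inside the register_buffer callback above, so the buffer is still registered):

```js
db.all('SELECT userId, title FROM jsonDataTable', (err, res) => {
    if (err) {
        console.warn(err);
        return;
    }
    console.log(res);
});
```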
To load [unsigned extensions]({% link docs/extensions/overview.md %}#unsigned-extensions), instantiate the database as follows:
db = new duckdb.Database(':memory:', {"allow_unsigned_extensions": "true"});
---
layout: docu
title: Node.js API
redirect_from:
  - /docs/api/nodejs/reference
  - /docs/api/nodejs/reference/
---
- ColumnInfo :
object
- TypeInfo :
object
- DuckDbError :
object
- HTTPError :
object
Summary: DuckDB is an embeddable SQL OLAP Database Management System
- duckdb
- ~Connection
- .run(sql, ...params, callback) ⇒
void
- .all(sql, ...params, callback) ⇒
void
- .arrowIPCAll(sql, ...params, callback) ⇒
void
- .arrowIPCStream(sql, ...params, callback) ⇒
- .each(sql, ...params, callback) ⇒
void
- .stream(sql, ...params)
- .register_udf(name, return_type, fun) ⇒
void
- .prepare(sql, ...params, callback) ⇒
Statement
- .exec(sql, ...params, callback) ⇒
void
- .register_udf_bulk(name, return_type, callback) ⇒
void
- .unregister_udf(name, return_type, callback) ⇒
void
- .register_buffer(name, array, force, callback) ⇒
void
- .unregister_buffer(name, callback) ⇒
void
- .close(callback) ⇒
void
- .run(sql, ...params, callback) ⇒
- ~Statement
- ~QueryResult
- ~Database
- .close(callback) ⇒
void
- .close_internal(callback) ⇒
void
- .wait(callback) ⇒
void
- .serialize(callback) ⇒
void
- .parallelize(callback) ⇒
void
- .connect(path) ⇒
Connection
- .interrupt(callback) ⇒
void
- .prepare(sql) ⇒
Statement
- .run(sql, ...params, callback) ⇒
void
- .scanArrowIpc(sql, ...params, callback) ⇒
void
- .each(sql, ...params, callback) ⇒
void
- .stream(sql, ...params)
- .all(sql, ...params, callback) ⇒
void
- .arrowIPCAll(sql, ...params, callback) ⇒
void
- .arrowIPCStream(sql, ...params, callback) ⇒
void
- .exec(sql, ...params, callback) ⇒
void
- .register_udf(name, return_type, fun) ⇒
this
- .register_buffer(name) ⇒
this
- .unregister_buffer(name) ⇒
this
- .unregister_udf(name) ⇒
this
- .registerReplacementScan(fun) ⇒
this
- .tokenize(text) ⇒
ScriptTokens
- .get()
- .close(callback) ⇒
- ~TokenType
- ~ERROR :
number
- ~OPEN_READONLY :
number
- ~OPEN_READWRITE :
number
- ~OPEN_CREATE :
number
- ~OPEN_FULLMUTEX :
number
- ~OPEN_SHAREDCACHE :
number
- ~OPEN_PRIVATECACHE :
number
- ~Connection
Kind: inner class of duckdb
- ~Connection
- .run(sql, ...params, callback) ⇒
void
- .all(sql, ...params, callback) ⇒
void
- .arrowIPCAll(sql, ...params, callback) ⇒
void
- .arrowIPCStream(sql, ...params, callback) ⇒
- .each(sql, ...params, callback) ⇒
void
- .stream(sql, ...params)
- .register_udf(name, return_type, fun) ⇒
void
- .prepare(sql, ...params, callback) ⇒
Statement
- .exec(sql, ...params, callback) ⇒
void
- .register_udf_bulk(name, return_type, callback) ⇒
void
- .unregister_udf(name, return_type, callback) ⇒
void
- .register_buffer(name, array, force, callback) ⇒
void
- .unregister_buffer(name, callback) ⇒
void
- .close(callback) ⇒
void
.run(sql, ...params, callback) ⇒ void
Run a SQL statement and trigger a callback when done
Kind: instance method of Connection
Param | Type |
---|---|
sql | |
...params | * |
callback |
Runs a SQL query and triggers the callback once for all result rows
Kind: instance method of Connection
Param | Type |
---|---|
sql | |
...params | * |
callback |
Run a SQL query and serialize the result into the Apache Arrow IPC format (requires arrow extension to be loaded)
Kind: instance method of Connection
Param | Type |
---|---|
sql | |
...params | * |
callback |
Runs a SQL query and returns an IpcResultStreamIterator that allows streaming the result in the Apache Arrow IPC format (requires the arrow extension to be loaded)
Kind: instance method of Connection
Returns: Promise
Param | Type |
---|---|
sql | |
...params | * |
callback |
Runs a SQL query and triggers the callback for each result row
Kind: instance method of Connection
Param | Type |
---|---|
sql | |
...params | * |
callback |
Kind: instance method of Connection
Param | Type |
---|---|
sql | |
...params | * |
Register a User Defined Function
Kind: instance method of Connection
Note: this follows the wasm udfs somewhat but is simpler because we can pass data much more cleanly
Param |
---|
name |
return_type |
fun |
Prepare a SQL query for execution
Kind: instance method of Connection
Param | Type |
---|---|
sql | |
...params | * |
callback |
Execute a SQL query
Kind: instance method of Connection
Param | Type |
---|---|
sql | |
...params | * |
callback |
Register a User Defined Function
Kind: instance method of Connection
Param |
---|
name |
return_type |
callback |
Unregister a User Defined Function
Kind: instance method of Connection
Param |
---|
name |
return_type |
callback |
Register a Buffer to be scanned using the Apache Arrow IPC scanner (requires arrow extension to be loaded)
Kind: instance method of Connection
Param |
---|
name |
array |
force |
callback |
Unregister the Buffer
Kind: instance method of Connection
Param |
---|
name |
callback |
Closes connection
Kind: instance method of Connection
Param |
---|
callback |
Kind: inner class of duckdb
Kind: instance property of Statement
Returns: sql contained in statement
Field:
Not implemented
Kind: instance method of Statement
Kind: instance method of Statement
Param | Type |
---|---|
sql | |
...params | * |
callback |
Kind: instance method of Statement
Param | Type |
---|---|
sql | |
...params | * |
callback |
Kind: instance method of Statement
Param | Type |
---|---|
sql | |
...params | * |
callback |
Kind: instance method of Statement
Param | Type |
---|---|
sql | |
...params | * |
callback |
Kind: instance method of Statement
Param | Type |
---|---|
sql | |
...params | * |
callback |
Kind: instance method of Statement
Param | Type |
---|---|
sql | |
...params | * |
statement.columns() ⇒ Array.<ColumnInfo>
Kind: instance method of Statement
Returns: Array.<ColumnInfo>
- Array of column names and types
Kind: inner class of duckdb
Kind: instance method of QueryResult
Returns: data chunk
Function to fetch the next result blob of an Arrow IPC Stream in a zero-copy way. (requires arrow extension to be loaded)
Kind: instance method of QueryResult
Returns: data chunk
Kind: instance method of QueryResult
Main database interface
Kind: inner property of duckdb
Param | Description |
---|---|
path | path to database file or :memory: for in-memory database |
access_mode | access mode |
config | the configuration object |
callback | callback function |
- ~Database
- .close(callback) ⇒
void
- .close_internal(callback) ⇒
void
- .wait(callback) ⇒
void
- .serialize(callback) ⇒
void
- .parallelize(callback) ⇒
void
- .connect(path) ⇒
Connection
- .interrupt(callback) ⇒
void
- .prepare(sql) ⇒
Statement
- .run(sql, ...params, callback) ⇒
void
- .scanArrowIpc(sql, ...params, callback) ⇒
void
- .each(sql, ...params, callback) ⇒
void
- .stream(sql, ...params)
- .all(sql, ...params, callback) ⇒
void
- .arrowIPCAll(sql, ...params, callback) ⇒
void
- .arrowIPCStream(sql, ...params, callback) ⇒
void
- .exec(sql, ...params, callback) ⇒
void
- .register_udf(name, return_type, fun) ⇒
this
- .register_buffer(name) ⇒
this
- .unregister_buffer(name) ⇒
this
- .unregister_udf(name) ⇒
this
- .registerReplacementScan(fun) ⇒
this
- .tokenize(text) ⇒
ScriptTokens
- .get()
- .close(callback) ⇒
Closes database instance
Kind: instance method of Database
Param |
---|
callback |
Internal method. Do not use, call Connection#close instead
Kind: instance method of Database
Param |
---|
callback |
Triggers callback when all scheduled database tasks have completed.
Kind: instance method of Database
Param |
---|
callback |
Currently a no-op. Provided for SQLite compatibility
Kind: instance method of Database
Param |
---|
callback |
Currently a no-op. Provided for SQLite compatibility
Kind: instance method of Database
Param |
---|
callback |
Create a new database connection
Kind: instance method of Database
Param | Description |
---|---|
path | the database to connect to, either a file path, or :memory: |
Supposedly interrupt queries, but currently does not do anything.
Kind: instance method of Database
Param |
---|
callback |
Prepare a SQL query for execution
Kind: instance method of Database
Param |
---|
sql |
Convenience method for Connection#run using a built-in default connection
Kind: instance method of Database
Param | Type |
---|---|
sql | |
...params | * |
callback |
Convenience method for Connection#scanArrowIpc using a built-in default connection
Kind: instance method of Database
Param | Type |
---|---|
sql | |
...params | * |
callback |
Kind: instance method of Database
Param | Type |
---|---|
sql | |
...params | * |
callback |
Kind: instance method of Database
Param | Type |
---|---|
sql | |
...params | * |
Convenience method for Connection#apply using a built-in default connection
Kind: instance method of Database
Param | Type |
---|---|
sql | |
...params | * |
callback |
Convenience method for Connection#arrowIPCAll using a built-in default connection
Kind: instance method of Database
Param | Type |
---|---|
sql | |
...params | * |
callback |
Convenience method for Connection#arrowIPCStream using a built-in default connection
Kind: instance method of Database
Param | Type |
---|---|
sql | |
...params | * |
callback |
Kind: instance method of Database
Param | Type |
---|---|
sql | |
...params | * |
callback |
Register a User Defined Function
Convenience method for Connection#register_udf
Kind: instance method of Database
Param |
---|
name |
return_type |
fun |
Register a buffer containing serialized data to be scanned from DuckDB.
Convenience method for Connection#register_buffer
Kind: instance method of Database
Param |
---|
name |
Unregister a Buffer
Convenience method for Connection#unregister_buffer
Kind: instance method of Database
Param |
---|
name |
Unregister a UDF
Convenience method for Connection#unregister_udf
Kind: instance method of Database
Param |
---|
name |
Register a table replace scan function
Kind: instance method of Database
Param | Description |
---|---|
fun | Replacement scan function |
Return positions and types of tokens in given text
Kind: instance method of Database
Param |
---|
text |
Not implemented
Kind: instance method of Database
Types of tokens returned by tokenize.
Kind: inner property of duckdb
Check that errno attribute equals this to check for a duckdb error
Kind: inner constant of duckdb
Open database in readonly mode
Kind: inner constant of duckdb
Currently ignored
Kind: inner constant of duckdb
Currently ignored
Kind: inner constant of duckdb
Currently ignored
Kind: inner constant of duckdb
Currently ignored
Kind: inner constant of duckdb
Currently ignored
Kind: inner constant of duckdb
Kind: global typedef
Properties
Name | Type | Description |
---|---|---|
name | string | Column name |
type | TypeInfo | Column type |
Kind: global typedef
Properties
Name | Type | Description |
---|---|---|
id | string | Type ID |
[alias] | string | SQL type alias |
sql_type | string | SQL type name |
Kind: global typedef
Properties
Name | Type | Description |
---|---|---|
errno | number | -1 for DuckDB errors |
message | string | Error message |
code | string | 'DUCKDB_NODEJS_ERROR' for DuckDB errors |
errorType | string | DuckDB error type code (e.g., HTTP, IO, Catalog) |
Kind: global typedef
Extends: DuckDbError
Properties
Name | Type | Description |
---|---|---|
statusCode | number | HTTP response status code |
reason | string | HTTP response reason |
response | string | HTTP response body |
headers | object | HTTP headers |
---
layout: docu
title: Node.js Client (Neo)
redirect_from:
  - /docs/api/node_neo/overview
  - /docs/api/node_neo/overview/
---
An API for using [DuckDB]({% link index.html %}) in Node.js.
The primary package, @duckdb/node-api, is a high-level API meant for applications. It depends on low-level bindings that adhere closely to [DuckDB's C API]({% link docs/clients/c/overview.md %}), available separately as @duckdb/node-bindings.
Main Differences from duckdb-node
- Native support for Promises; no need for separate duckdb-async wrapper.
- DuckDB-specific API; not based on the SQLite Node API.
- Lossless & efficient support for values of all [DuckDB data types]({% link docs/sql/data_types/overview.md %}).
- Wraps released DuckDB binaries instead of rebuilding DuckDB.
- Built on [DuckDB's C API]({% link docs/clients/c/overview.md %}); exposes more functionality.
Some features are not yet complete:
- Appending and binding advanced data types. (Additional DuckDB C API support needed.)
- Writing to data chunk vectors. (Needs special handling in Node.)
- User-defined types & functions. (Support for this was added to the DuckDB C API in v1.1.0.)
- Profiling info. (Added in v1.1.0)
- Table description. (Added in v1.1.0)
- APIs for Arrow. (This part of the DuckDB C API is deprecated.)
Supported platforms:
- Linux ARM64 (experimental)
- Linux AMD64
- macOS (Darwin) ARM64 (Apple Silicon)
- macOS (Darwin) AMD64 (Intel)
- Windows (Win32) AMD64
import duckdb from '@duckdb/node-api';
console.log(duckdb.version());
console.log(duckdb.configurationOptionDescriptions());
import { DuckDBInstance } from '@duckdb/node-api';
Create with an in-memory database:
const instance = await DuckDBInstance.create(':memory:');
Equivalent to the above:
const instance = await DuckDBInstance.create();
Read from and write to a database file, which is created if needed:
const instance = await DuckDBInstance.create('my_duckdb.db');
Set configuration options:
const instance = await DuckDBInstance.create('my_duckdb.db', {
threads: '4'
});
const connection = await instance.connect();
const result = await connection.run('from test_all_types()');
const prepared = await connection.prepare('select $1, $2');
prepared.bindVarchar(1, 'duck');
prepared.bindInteger(2, 42);
const result = await prepared.run();
Get column names and types:
const columnNames = result.columnNames();
const columnTypes = result.columnTypes();
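As a small illustrative sketch (not part of the official examples), the column names can be paired with a chunk's rows to produce plain objects:

```js
const names = result.columnNames();
const chunk = await result.fetchChunk();
if (chunk && chunk.rowCount > 0) {
  // pair each row value with its column name
  for (const row of chunk.getRows()) {
    console.log(Object.fromEntries(row.map((value, i) => [names[i], value])));
  }
}
```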
Fetch all chunks:
const chunks = await result.fetchAllChunks();
Fetch one chunk at a time:
const chunks = [];
while (true) {
const chunk = await result.fetchChunk();
// Last chunk will have zero rows.
if (!chunk || chunk.rowCount === 0) {
break;
}
chunks.push(chunk);
}
Read chunk data (column-major):
// array of columns, each as an array of values
const columns = chunk.getColumns();
Read chunk data (row-major):
// array of rows, each as an array of values
const rows = chunk.getRows();
Read chunk data (one value at a time):
const columns = [];
const columnCount = chunk.columnCount;
for (let columnIndex = 0; columnIndex < columnCount; columnIndex++) {
const columnValues = [];
const columnVector = chunk.getColumnVector(columnIndex);
const itemCount = columnVector.itemCount;
for (let itemIndex = 0; itemIndex < itemCount; itemIndex++) {
const value = columnVector.getItem(itemIndex);
columnValues.push(value);
}
columns.push(columnValues);
}
Run and read all data:
const reader = await connection.runAndReadAll('FROM test_all_types()');
const rows = reader.getRows();
// OR: const columns = reader.getColumns();
Run and read up to (at least) some number of rows:
const reader = await connection.runAndReadUntil('FROM range(5000)', 1000);
const rows = reader.getRows();
// rows.length === 2048. (Rows are read in chunks of 2048.)
Read rows incrementally:
const reader = await connection.runAndRead('FROM range(5000)');
reader.readUntil(2000);
// reader.currentRowCount === 2048 (Rows are read in chunks of 2048.)
// reader.done === false
reader.readUntil(4000);
// reader.currentRowCount === 4096
// reader.done === false
reader.readUntil(6000);
// reader.currentRowCount === 5000
// reader.done === true
import { DuckDBTypeId } from '@duckdb/node-api';
if (columnType.typeId === DuckDBTypeId.ARRAY) {
const arrayValueType = columnType.valueType;
const arrayLength = columnType.length;
}
if (columnType.typeId === DuckDBTypeId.DECIMAL) {
const decimalWidth = columnType.width;
const decimalScale = columnType.scale;
}
if (columnType.typeId === DuckDBTypeId.ENUM) {
const enumValues = columnType.values;
}
if (columnType.typeId === DuckDBTypeId.LIST) {
const listValueType = columnType.valueType;
}
if (columnType.typeId === DuckDBTypeId.MAP) {
const mapKeyType = columnType.keyType;
const mapValueType = columnType.valueType;
}
if (columnType.typeId === DuckDBTypeId.STRUCT) {
const structEntryNames = columnType.names;
const structEntryTypes = columnType.valueTypes;
}
if (columnType.typeId === DuckDBTypeId.UNION) {
const unionMemberTags = columnType.memberTags;
const unionMemberTypes = columnType.memberTypes;
}
// For the JSON type (https://duckdb.org/docs/data/json/json_type)
if (columnType.alias === 'JSON') {
const json = JSON.parse(columnValue);
}
Every type implements toString. The result is both human-friendly and readable by DuckDB in an appropriate expression.
const typeString = columnType.toString();
import { DuckDBTypeId } from '@duckdb/node-api';
if (columnType.typeId === DuckDBTypeId.ARRAY) {
const arrayItems = columnValue.items; // array of values
const arrayString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.BIT) {
const bools = columnValue.toBools(); // array of booleans
const bits = columnValue.toBits(); // array of 0s and 1s
const bitString = columnValue.toString(); // string of '0's and '1's
}
if (columnType.typeId === DuckDBTypeId.BLOB) {
const blobBytes = columnValue.bytes; // Uint8Array
const blobString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.DATE) {
const dateDays = columnValue.days;
const dateString = columnValue.toString();
const { year, month, day } = columnValue.toParts();
}
if (columnType.typeId === DuckDBTypeId.DECIMAL) {
const decimalWidth = columnValue.width;
const decimalScale = columnValue.scale;
// Scaled-up value. Represented number is value/(10^scale).
const decimalValue = columnValue.value; // bigint
const decimalString = columnValue.toString();
const decimalDouble = columnValue.toDouble();
}
if (columnType.typeId === DuckDBTypeId.INTERVAL) {
const intervalMonths = columnValue.months;
const intervalDays = columnValue.days;
const intervalMicros = columnValue.micros; // bigint
const intervalString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.LIST) {
const listItems = columnValue.items; // array of values
const listString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.MAP) {
const mapEntries = columnValue.entries; // array of { key, value }
const mapString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.STRUCT) {
// { name1: value1, name2: value2, ... }
const structEntries = columnValue.entries;
const structString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.TIMESTAMP_MS) {
const timestampMillis = columnValue.milliseconds; // bigint
const timestampMillisString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.TIMESTAMP_NS) {
const timestampNanos = columnValue.nanoseconds; // bigint
const timestampNanosString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.TIMESTAMP_S) {
const timestampSecs = columnValue.seconds; // bigint
const timestampSecsString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.TIMESTAMP_TZ) {
const timestampTZMicros = columnValue.micros; // bigint
const timestampTZString = columnValue.toString();
const {
date: { year, month, day },
time: { hour, min, sec, micros },
} = columnValue.toParts();
}
if (columnType.typeId === DuckDBTypeId.TIMESTAMP) {
const timestampMicros = columnValue.micros; // bigint
const timestampString = columnValue.toString();
const {
date: { year, month, day },
time: { hour, min, sec, micros },
} = columnValue.toParts();
}
if (columnType.typeId === DuckDBTypeId.TIME_TZ) {
const timeTZMicros = columnValue.micros; // bigint
const timeTZOffset = columnValue.offset;
const timeTZString = columnValue.toString();
const {
time: { hour, min, sec, micros },
offset,
} = columnValue.toParts();
}
if (columnType.typeId === DuckDBTypeId.TIME) {
const timeMicros = columnValue.micros; // bigint
const timeString = columnValue.toString();
const { hour, min, sec, micros } = columnValue.toParts();
}
if (columnType.typeId === DuckDBTypeId.UNION) {
const unionTag = columnValue.tag;
const unionValue = columnValue.value;
const unionValueString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.UUID) {
const uuidHugeint = columnValue.hugeint; // bigint
const uuidString = columnValue.toString();
}
// other possible values are: null, boolean, number, bigint, or string
await connection.run(
`create or replace table target_table(i integer, v varchar)`
);
const appender = await connection.createAppender('main', 'target_table');
appender.appendInteger(42);
appender.appendVarchar('duck');
appender.endRow();
appender.appendInteger(123);
appender.appendVarchar('mallard');
appender.endRow();
appender.flush();
appender.appendInteger(17);
appender.appendVarchar('goose');
appender.endRow();
appender.close(); // also flushes
const extractedStatements = await connection.extractStatements(`
create or replace table numbers as from range(?);
from numbers where range < ?;
drop table numbers;
`);
const parameterValues = [10, 7];
const statementCount = extractedStatements.count;
for (let stmtIndex = 0; stmtIndex < statementCount; stmtIndex++) {
const prepared = await extractedStatements.prepare(stmtIndex);
let parameterCount = prepared.parameterCount;
for (let paramIndex = 1; paramIndex <= parameterCount; paramIndex++) {
prepared.bindInteger(paramIndex, parameterValues.shift());
}
const result = await prepared.run();
// ...
}
import { DuckDBPendingResultState } from '@duckdb/node-api';
async function sleep(ms) {
return new Promise((resolve) => {
setTimeout(resolve, ms);
});
}
const prepared = await connection.prepare('FROM range(10_000_000)');
const pending = prepared.start();
while (pending.runTask() !== DuckDBPendingResultState.RESULT_READY) {
console.log('not ready');
await sleep(1);
}
console.log('ready');
const result = await pending.getResult();
// ...
---
layout: docu
title: Go Client
github_repository: https://github.com/marcboeker/go-duckdb
redirect_from:
  - /docs/api/go
  - /docs/api/go/
---
The DuckDB Go driver, go-duckdb, allows using DuckDB via the database/sql interface.
For examples on how to use this interface, see the official documentation and tutorial.
The
go-duckdb
project, hosted at https://github.com/marcboeker/go-duckdb, is the official DuckDB Go client.
To install the go-duckdb
client, run:
go get github.com/marcboeker/go-duckdb
To import the DuckDB Go package, add the following entries to your imports:
import (
"database/sql"
_ "github.com/marcboeker/go-duckdb"
)
The DuckDB Go client supports the [DuckDB Appender API]({% link docs/data/appender.md %}) for bulk inserts. You can obtain a new Appender by supplying a DuckDB connection to NewAppenderFromConn()
. For example:
connector, err := duckdb.NewConnector("test.db", nil)
if err != nil {
...
}
conn, err := connector.Connect(context.Background())
if err != nil {
...
}
defer conn.Close()
// Retrieve appender from connection (note that you have to create the table 'test' beforehand).
appender, err := duckdb.NewAppenderFromConn(conn, "", "test")
if err != nil {
...
}
defer appender.Close()
err = appender.AppendRow(...)
if err != nil {
...
}
// Optional, if you want to access the appended rows immediately.
err = appender.Flush()
if err != nil {
...
}
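For illustration, assuming the pre-created test table has an INTEGER and a VARCHAR column, an AppendRow call could look like this (a sketch continuing the fragment above, not part of the official example):

```go
// Append a single row; the argument order must match the table's column order.
err = appender.AppendRow(int32(1), "hello")
if err != nil {
    // handle the error
}
```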
An example for using the Go API is as follows:
package main
import (
"database/sql"
"errors"
"fmt"
"log"
_ "github.com/marcboeker/go-duckdb"
)
func main() {
db, err := sql.Open("duckdb", "")
if err != nil {
log.Fatal(err)
}
defer db.Close()
_, err = db.Exec(`CREATE TABLE people (id INTEGER, name VARCHAR)`)
if err != nil {
log.Fatal(err)
}
_, err = db.Exec(`INSERT INTO people VALUES (42, 'John')`)
if err != nil {
log.Fatal(err)
}
var (
id int
name string
)
row := db.QueryRow(`SELECT id, name FROM people`)
err = row.Scan(&id, &name)
if errors.Is(err, sql.ErrNoRows) {
log.Println("no rows")
} else if err != nil {
log.Fatal(err)
}
fmt.Printf("id: %d, name: %s\n", id, name)
}
For more examples, see the examples in the go-duckdb repository.
---
layout: docu
title: Client Overview
redirect_from:
  - /docs/clients
  - /docs/clients/
  - /docs/api/overview
  - /docs/api/overview/
---
DuckDB is an in-process database system and offers client APIs (also known as “drivers”) for several languages.
Client API | Maintainer | Support tier | Latest version |
---|---|---|---|
[C]({% link docs/clients/c/overview.md %}) | The DuckDB team | Primary | [{{ site.currentduckdbversion }}]({% link docs/installation/index.html %}?version=stable&environment=cplusplus) |
[Command Line Interface (CLI)]({% link docs/clients/cli/overview.md %}) | The DuckDB team | Primary | [{{ site.currentduckdbversion }}]({% link docs/installation/index.html %}?version=stable&environment=cli) |
[Java (JDBC)]({% link docs/clients/java.md %}) | The DuckDB team | Primary | {{ site.currentjavaversion }} |
[Go]({% link docs/clients/go.md %}) | Marc Boeker and the DuckDB team | Primary | 1.1.3 |
[Node.js (node-neo)]({% link docs/clients/node_neo/overview.md %}) | Jeff Raymakers and Antony Courtney (MotherDuck) | Primary | 1.2.0 |
[Python]({% link docs/clients/python/overview.md %}) | The DuckDB team | Primary | {{ site.currentduckdbversion }} |
[R]({% link docs/clients/r.md %}) | Kirill Müller and the DuckDB team | Primary | 1.2.0 |
[WebAssembly (Wasm)]({% link docs/clients/wasm/overview.md %}) | The DuckDB team | Primary | 1.2.0 |
[ADBC (Arrow)]({% link docs/clients/adbc.md %}) | The DuckDB team | Secondary | [{{ site.currentduckdbversion }}]({% link docs/extensions/arrow.md %}) |
C# (.NET) | Giorgi | Secondary | 1.2.0 |
[C++]({% link docs/clients/cpp.md %}) | The DuckDB team | Secondary | [1.2.0]({% link docs/installation/index.html %}?version=stable&environment=cplusplus) |
[Dart]({% link docs/clients/dart.md %}) | TigerEye | Secondary | 1.1.3 |
[Julia]({% link docs/clients/julia.md %}) | The DuckDB team | Secondary | 1.2.0 |
[Node.js (deprecated)]({% link docs/clients/nodejs/overview.md %}) | The DuckDB team | Secondary | 1.2.0 |
[ODBC]({% link docs/clients/odbc/overview.md %}) | The DuckDB team | Secondary | [1.1.0]({% link docs/installation/index.html %}?version=stable&environment=odbc) |
[Rust]({% link docs/clients/rust.md %}) | The DuckDB team | Secondary | 1.2.0 |
[Swift]({% link docs/clients/swift.md %}) | The DuckDB team | Secondary | 1.2.0 |
Common Lisp | ak-coram | Tertiary | |
Crystal | amauryt | Tertiary | |
Elixir | AlexR2D2 | Tertiary | |
Erlang | MM Zeeman | Tertiary | |
Ruby | suketa | Tertiary | |
Zig | karlseguin | Tertiary |
Since there is such a wide variety of clients, the DuckDB team focuses their development effort on the most popular clients. To reflect this, we distinguish three tiers of support for clients. Primary clients are the first to receive new features and are covered by community support. Secondary clients receive new features but are not covered by community support. Finally, all tertiary clients are maintained by third parties, so there are no feature or support guarantees for them.
The DuckDB clients listed above are open-source and we welcome community contributions to these libraries. All primary and secondary clients are available under the MIT license. For tertiary clients, please consult the repository for the license.
We report the latest stable version for the clients in the primary and secondary support tiers.
All DuckDB clients support the same DuckDB SQL syntax and use the same on-disk [database format]({% link docs/internals/storage.md %}). [DuckDB extensions]({% link docs/extensions/overview.md %}) are also portable between clients with some exceptions (see [Wasm extensions]({% link docs/clients/wasm/extensions.md %}#list-of-officially-available-extensions)).
---
layout: docu
title: Spark API
redirect_from:
  - /docs/api/python/spark_api
  - /docs/api/python/spark_api/
---
The DuckDB Spark API implements the PySpark API, allowing you to use the familiar Spark API to interact with DuckDB. All statements are translated to DuckDB's internal plans using our [relational API]({% link docs/clients/python/relational_api.md %}) and executed using DuckDB's query engine.
Warning: The DuckDB Spark API is currently experimental and features are still missing. We are very interested in feedback. Please report any functionality that you are missing, either through Discord or on GitHub.
from duckdb.experimental.spark.sql import SparkSession as session
from duckdb.experimental.spark.sql.functions import lit, col
import pandas as pd
spark = session.builder.getOrCreate()
pandas_df = pd.DataFrame({
'age': [34, 45, 23, 56],
'name': ['Joan', 'Peter', 'John', 'Bob']
})
df = spark.createDataFrame(pandas_df)
df = df.withColumn(
'location', lit('Seattle')
)
res = df.select(
col('age'),
col('location')
).collect()
print(res)
[
Row(age=34, location='Seattle'),
Row(age=45, location='Seattle'),
Row(age=23, location='Seattle'),
Row(age=56, location='Seattle')
]
Contributions to the experimental Spark API are welcome. When making a contribution, please follow these guidelines:
- Instead of using temporary files, use our pytest testing framework.
- When adding new functions, ensure that method signatures comply with those in the PySpark API.
---
layout: docu
title: Python API
redirect_from:
  - /docs/api/python
  - /docs/api/python/
  - /docs/api/python/overview
  - /docs/api/python/overview/
---
The DuckDB Python API can be installed using pip: pip install duckdb. Please see the [installation page]({% link docs/installation/index.html %}?environment=python) for details. It is also possible to install DuckDB using conda: conda install python-duckdb -c conda-forge.
Python version: DuckDB requires Python 3.7 or newer.
The most straight-forward manner of running SQL queries using DuckDB is using the duckdb.sql
command.
import duckdb
duckdb.sql("SELECT 42").show()
This will run queries using an in-memory database that is stored globally inside the Python module. The result of the query is returned as a Relation. A relation is a symbolic representation of the query. The query is not executed until the result is fetched or requested to be printed to the screen.
Relations can be referenced in subsequent queries by storing them inside variables, and using them as tables. This way queries can be constructed incrementally.
import duckdb
r1 = duckdb.sql("SELECT 42 AS i")
duckdb.sql("SELECT i * 2 AS k FROM r1").show()
DuckDB can ingest data from a wide variety of formats – both on-disk and in-memory. See the [data ingestion page]({% link docs/clients/python/data_ingestion.md %}) for more information.
import duckdb
duckdb.read_csv("example.csv") # read a CSV file into a Relation
duckdb.read_parquet("example.parquet") # read a Parquet file into a Relation
duckdb.read_json("example.json") # read a JSON file into a Relation
duckdb.sql("SELECT * FROM 'example.csv'") # directly query a CSV file
duckdb.sql("SELECT * FROM 'example.parquet'") # directly query a Parquet file
duckdb.sql("SELECT * FROM 'example.json'") # directly query a JSON file
DuckDB can directly query Pandas DataFrames, Polars DataFrames and Arrow tables.
Note that these are read-only, i.e., editing these tables via [INSERT]({% link docs/sql/statements/insert.md %}) or [UPDATE statements]({% link docs/sql/statements/update.md %}) is not possible.
To directly query a Pandas DataFrame, run:
import duckdb
import pandas as pd
pandas_df = pd.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM pandas_df")
┌───────┐
│ a │
│ int64 │
├───────┤
│ 42 │
└───────┘
To directly query a Polars DataFrame, run:
import duckdb
import polars as pl
polars_df = pl.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM polars_df")
┌───────┐
│ a │
│ int64 │
├───────┤
│ 42 │
└───────┘
To directly query a PyArrow table, run:
import duckdb
import pyarrow as pa
arrow_table = pa.Table.from_pydict({"a": [42]})
duckdb.sql("SELECT * FROM arrow_table")
┌───────┐
│ a │
│ int64 │
├───────┤
│ 42 │
└───────┘
DuckDB supports converting query results efficiently to a variety of formats. See the [result conversion page]({% link docs/clients/python/conversion.md %}) for more information.
import duckdb
duckdb.sql("SELECT 42").fetchall() # Python objects
duckdb.sql("SELECT 42").df() # Pandas DataFrame
duckdb.sql("SELECT 42").pl() # Polars DataFrame
duckdb.sql("SELECT 42").arrow() # Arrow Table
duckdb.sql("SELECT 42").fetchnumpy() # NumPy Arrays
DuckDB supports writing Relation objects directly to disk in a variety of formats. The [COPY
statement]({% link docs/sql/statements/copy.md %}) can be used to write data to disk using SQL as an alternative.
import duckdb
duckdb.sql("SELECT 42").write_parquet("out.parquet") # Write to a Parquet file
duckdb.sql("SELECT 42").write_csv("out.csv") # Write to a CSV file
duckdb.sql("COPY (SELECT 42) TO 'out.parquet'") # Copy to a Parquet file
Applications can open a new DuckDB connection via the duckdb.connect()
method.
When using DuckDB through duckdb.sql(), it operates on an in-memory database, i.e., no tables are persisted on disk.
Invoking the duckdb.connect()
method without arguments returns a connection, which also uses an in-memory database:
import duckdb
con = duckdb.connect()
con.sql("SELECT 42 AS x").show()
The duckdb.connect(dbname) method
creates a connection to a persistent database.
Any data written to that connection will be persisted, and can be reloaded by reconnecting to the same file, both from Python and from other DuckDB clients.
import duckdb
# create a connection to a file called 'file.db'
con = duckdb.connect("file.db")
# create a table and load data into it
con.sql("CREATE TABLE test (i INTEGER)")
con.sql("INSERT INTO test VALUES (42)")
# query the table
con.table("test").show()
# explicitly close the connection
con.close()
# Note: connections are also closed implicitly when they go out of scope
You can also use a context manager to ensure that the connection is closed:
import duckdb
with duckdb.connect("file.db") as con:
con.sql("CREATE TABLE test (i INTEGER)")
con.sql("INSERT INTO test VALUES (42)")
con.table("test").show()
# the context manager closes the connection automatically
The duckdb.connect() method accepts a config dictionary, where [configuration options]({% link docs/configuration/overview.md %}#configuration-reference) can be specified. For example:
import duckdb
con = duckdb.connect(config = {'threads': 1})
The connection object and the duckdb
module can be used interchangeably – they support the same methods. The only difference is that when using the duckdb
module a global in-memory database is used.
If you are developing a package designed for others to use, and use DuckDB in the package, it is recommended that you create connection objects instead of using the methods on the duckdb module. That is because the duckdb module uses a shared global database – which can cause hard-to-debug issues if used from within multiple different packages.
The DuckDBPyConnection
object is not thread-safe. If you would like to write to the same database from multiple threads, create a cursor for each thread with the [DuckDBPyConnection.cursor()
method]({% link docs/clients/python/reference/index.md %}#duckdb.DuckDBPyConnection.cursor).
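A minimal sketch of that pattern (the table name and thread bookkeeping are illustrative, not part of the original page):

```python
import duckdb
import threading

con = duckdb.connect("file.db")
con.sql("CREATE TABLE IF NOT EXISTS hits (thread_name VARCHAR)")

def insert_from_thread(name):
    # each thread uses its own cursor on the shared connection
    cur = con.cursor()
    cur.execute("INSERT INTO hits VALUES (?)", [name])
    cur.close()

threads = [
    threading.Thread(target=insert_from_thread, args=(f"thread-{i}",))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(con.sql("SELECT count(*) FROM hits").fetchall())
```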
DuckDB's Python API provides functions for installing and loading [extensions]({% link docs/extensions/overview.md %}), which perform the equivalent operations to running the INSTALL
and LOAD
SQL commands, respectively. An example that installs and loads the [spatial
extension]({% link docs/extensions/spatial/overview.md %}) looks as follows:
import duckdb
con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")
To load [community extensions]({% link community_extensions/index.md %}), use the repository="community" argument to the install_extension method.
For example, install and load the h3
community extension as follows:
import duckdb
con = duckdb.connect()
con.install_extension("h3", repository="community")
con.load_extension("h3")
To load [unsigned extensions]({% link docs/extensions/overview.md %}#unsigned-extensions), use the config = {"allow_unsigned_extensions": "true"}
argument to the duckdb.connect()
method.
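For example (a sketch mirroring the config usage shown above):

```python
import duckdb

con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
```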
---
layout: docu
title: Expression API
redirect_from:
  - /docs/api/python/expression
  - /docs/api/python/expression/
---
The Expression
class represents an instance of an [expression]({% link docs/sql/expressions/overview.md %}).
Using this API makes it possible to dynamically build up expressions, which are typically created by the parser from the query string. This allows you to skip that and have more fine-grained control over the used expressions.
Below is a list of currently supported expressions that can be created through the API.
This expression references a column by name.
import duckdb
import pandas as pd
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
Selecting a single column:
col = duckdb.ColumnExpression('a')
res = duckdb.df(df).select(col).fetchall()
print(res)
[(1,), (2,), (3,), (4,)]
Selecting multiple columns:
col_list = [
duckdb.ColumnExpression('a') * 10,
duckdb.ColumnExpression('b').isnull(),
duckdb.ColumnExpression('c') + 5
]
res = duckdb.df(df).select(*col_list).fetchall()
print(res)
[(10, False, 47), (20, True, 26), (30, False, 18), (40, False, 19)]
This expression selects all columns of the input source.
Optionally it's possible to provide an exclude
list to filter out columns of the table.
This exclude
list can contain either strings or Expressions.
import duckdb
import pandas as pd
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
star = duckdb.StarExpression(exclude = ['b'])
res = duckdb.df(df).select(star).fetchall()
print(res)
[(1, 42), (2, 21), (3, 13), (4, 14)]
This expression contains a single value.
import duckdb
import pandas as pd
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
const = duckdb.ConstantExpression('hello')
res = duckdb.df(df).select(const).fetchall()
print(res)
[('hello',), ('hello',), ('hello',), ('hello',)]
This expression contains a CASE WHEN (...) THEN (...) ELSE (...) END
expression.
By default the ELSE branch is NULL, and it can be set using .otherwise(value = ...).
Additional WHEN (...) THEN (...) blocks can be added with .when(condition = ..., value = ...).
import duckdb
import pandas as pd
from duckdb import (
ConstantExpression,
ColumnExpression,
CaseExpression
)
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
hello = ConstantExpression('hello')
world = ConstantExpression('world')
case = \
CaseExpression(condition = ColumnExpression('b') == False, value = world) \
.otherwise(hello)
res = duckdb.df(df).select(case).fetchall()
print(res)
[('hello',), ('hello',), ('world',), ('hello',)]
This expression contains a function call. It can be constructed by providing the function name and an arbitrary amount of Expressions as arguments.
import duckdb
import pandas as pd
from duckdb import (
ConstantExpression,
ColumnExpression,
FunctionExpression
)
df = pd.DataFrame({
'a': [
'test',
'pest',
'text',
'rest',
]
})
ends_with = FunctionExpression('ends_with', ColumnExpression('a'), ConstantExpression('est'))
res = duckdb.df(df).select(ends_with).fetchall()
print(res)
[(True,), (True,), (False,), (True,)]
The Expression class also contains many operations that can be applied to any Expression type.
Operation | Description |
---|---|
.alias(name: str) | Applies an alias to the expression |
.cast(type: DuckDBPyType) | Applies a cast to the provided type on the expression |
.isin(*exprs: Expression) | Creates an [IN expression]({% link docs/sql/expressions/in.md %}#in) against the provided expressions as the list |
.isnotin(*exprs: Expression) | Creates a [NOT IN expression]({% link docs/sql/expressions/in.md %}#not-in) against the provided expressions as the list |
.isnotnull() | Checks whether the expression is not NULL |
.isnull() | Checks whether the expression is NULL |
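A short sketch combining a few of these operations (the data and aliases are illustrative):

```python
import duckdb
import pandas as pd
from duckdb.typing import VARCHAR

df = pd.DataFrame({'a': [1, 2, 3, 4], 'c': [42, 21, 13, 14]})

# cast column a to VARCHAR and check membership of column c in a small set
exprs = [
    duckdb.ColumnExpression('a').cast(VARCHAR).alias('a_text'),
    duckdb.ColumnExpression('c').isin(
        duckdb.ConstantExpression(13), duckdb.ConstantExpression(42)
    ).alias('c_in_set'),
]
print(duckdb.df(df).select(*exprs).fetchall())
```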
When expressions are provided to DuckDBPyRelation.order(), the following order operations can be applied.
Operation | Description |
---|---|
.asc() | Indicates that this expression should be sorted in ascending order |
.desc() | Indicates that this expression should be sorted in descending order |
.nulls_first() | Indicates that the nulls in this expression should precede the non-null values |
.nulls_last() | Indicates that the nulls in this expression should come after the non-null values |
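A sketch of sorting with these (assuming, as stated above, that expressions can be passed to DuckDBPyRelation.order(); the data is illustrative):

```python
import duckdb
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [True, None, False, True]})

# sort by column b in descending order, with NULLs last
ordered = duckdb.df(df).order(duckdb.ColumnExpression('b').desc().nulls_last())
print(ordered.fetchall())
```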
---
layout: docu
title: Python Function API
redirect_from:
  - /docs/api/python/function
  - /docs/api/python/function/
---
You can create a DuckDB user-defined function (UDF) from a Python function so it can be used in SQL queries. Similarly to regular [functions]({% link docs/sql/functions/overview.md %}), they need to have a name, a return type and parameter types.
Here is an example using a Python function that calls a third-party library.
import duckdb
from duckdb.typing import *
from faker import Faker
def generate_random_name():
fake = Faker()
return fake.name()
duckdb.create_function("random_name", generate_random_name, [], VARCHAR)
res = duckdb.sql("SELECT random_name()").fetchall()
print(res)
[('Gerald Ashley',)]
To register a Python UDF, use the create_function
method from a DuckDB connection. Here is the syntax:
import duckdb
con = duckdb.connect()
con.create_function(name, function, parameters, return_type)
The create_function
method takes the following parameters:
- name: A string representing the unique name of the UDF within the connection catalog.
- function: The Python function you wish to register as a UDF.
- parameters: Scalar functions can operate on one or more columns. This parameter takes a list of column types used as input.
- return_type: Scalar functions return one element per row. This parameter specifies the return type of the function.
- type (optional): DuckDB supports both built-in Python types and PyArrow Tables. By default, built-in types are assumed, but you can specify type = 'arrow' to use PyArrow Tables.
- null_handling (optional): By default, NULL values are automatically handled as NULL-in NULL-out. Users can specify a desired behavior for NULL values by setting null_handling = 'special'.
- exception_handling (optional): By default, when an exception is thrown from the Python function, it will be re-thrown in Python. Users can disable this behavior, and instead return NULL, by setting this parameter to 'return_null'.
- side_effects (optional): By default, functions are expected to produce the same result for the same input. If the result of a function is impacted by any type of randomness, side_effects must be set to True.
To unregister a UDF, you can call the remove_function
method with the UDF name:
con.remove_function(name)
When the function has type annotations, it's often possible to leave out all of the optional parameters.
Using DuckDBPyType, we can implicitly convert many known types to DuckDB's type system.
For example:
import duckdb
def my_function(x: int) -> str:
return x
duckdb.create_function("my_func", my_function)
print(duckdb.sql("SELECT my_func(42)"))
┌─────────────┐
│ my_func(42) │
│ varchar │
├─────────────┤
│ 42 │
└─────────────┘
If only the parameter list types can be inferred, you'll need to pass in None as parameters.
By default, when functions receive a NULL value, this instantly returns NULL, as part of the default NULL-handling.
When this is not desired, you need to explicitly set this parameter to "special".
import duckdb
from duckdb.typing import *
def dont_intercept_null(x):
return 5
duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT)
res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
[(None,)]
With null_handling="special"
:
import duckdb
from duckdb.typing import *
def dont_intercept_null(x):
return 5
duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT, null_handling="special")
res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
[(5,)]
By default, when an exception is thrown from the Python function, we'll forward (re-throw) the exception.
If you want to disable this behavior, and instead return NULL, you'll need to set this parameter to "return_null".
import duckdb
from duckdb.typing import *
def will_throw():
raise ValueError("ERROR")
duckdb.create_function("throws", will_throw, [], BIGINT)
try:
res = duckdb.sql("SELECT throws()").fetchall()
except duckdb.InvalidInputException as e:
print(e)
duckdb.create_function("doesnt_throw", will_throw, [], BIGINT, exception_handling="return_null")
res = duckdb.sql("SELECT doesnt_throw()").fetchall()
print(res)
Invalid Input Error: Python exception occurred while executing the UDF: ValueError: ERROR
At:
...(5): will_throw
...(9): <module>
[(None,)]
By default DuckDB will assume the created function is a pure function, meaning it will produce the same output when given the same input.
If your function does not follow that rule, for example when your function makes use of randomness, then you will need to mark this function as having side_effects.
For example, this function will produce a new count for every invocation.
def count() -> int:
old = count.counter
count.counter += 1
return old
count.counter = 0
If we create this function without marking it as having side effects, the result will be the following:
con = duckdb.connect()
con.create_function("my_counter", count, side_effects=False)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
[(0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,)]
This is obviously not the desired result. When we add side_effects=True, the result is as we would expect:
con.remove_function("my_counter")
count.counter = 0
con.create_function("my_counter", count, side_effects=True)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]
Currently, two function types are supported: native (the default) and arrow.
If the function is expected to receive Arrow arrays, set the type parameter to 'arrow'.
This will let the system know to provide Arrow arrays of up to STANDARD_VECTOR_SIZE tuples to the function, and also expect an array with the same number of tuples to be returned from the function.
When the function type is set to native, the function will be provided with a single tuple at a time, and is expected to return only a single value.
This can be useful for interacting with Python libraries that don't operate on Arrow, such as faker:
import duckdb
from duckdb.typing import *
from faker import Faker
def random_date():
fake = Faker()
return fake.date_between()
duckdb.create_function("random_date", random_date, [], DATE, type="native")
res = duckdb.sql("SELECT random_date()").fetchall()
print(res)
[(datetime.date(2019, 5, 15),)]
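For the arrow function type, a minimal sketch could look like the following (assuming pyarrow is installed; the use of pyarrow.compute is an illustration, not from the original page):

```python
import duckdb
import pyarrow.compute as pc
from duckdb.typing import BIGINT

def add_one_arrow(x):
    # x arrives as a pyarrow array of up to STANDARD_VECTOR_SIZE values;
    # return an array with the same number of values
    return pc.add(x, 1)

duckdb.create_function("add_one_arrow", add_one_arrow, [BIGINT], BIGINT, type="arrow")
res = duckdb.sql("SELECT add_one_arrow(41)").fetchall()
print(res)
```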
---
layout: docu
title: Python DB API
redirect_from:
  - /docs/api/python/dbapi
  - /docs/api/python/dbapi/
---
The standard DuckDB Python API provides a SQL interface compliant with the DB-API 2.0 specification described by PEP 249, similar to the SQLite Python API.
To use the module, you must first create a DuckDBPyConnection
object that represents a connection to a database.
This is done through the [duckdb.connect]({% link docs/clients/python/reference/index.md %}#duckdb.connect) method.
The 'config' keyword argument can be used to provide a dict
that contains key->value pairs referencing [settings]({% link docs/configuration/overview.md %}#configuration-reference) understood by DuckDB.
The special value :memory:
can be used to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the Python process).
The special value :memory: can also be postfixed with a name, for example: :memory:conn3.
When a name is provided, subsequent duckdb.connect
calls will create a new connection to the same database, sharing the catalogs (views, tables, macros etc..).
Using :memory:
without a name will always create a new and separate database instance.
By default we create an (unnamed) in-memory database that lives inside the duckdb module.
Every method of DuckDBPyConnection is also available on the duckdb module; this connection is what's used by these methods.
The special value :default:
can be used to get this default connection.
If the database
is a file path, a connection to a persistent database is established.
If the file does not exist, the file will be created (the extension of the file is irrelevant and can be .db, .duckdb or anything else).
If you would like to connect in read-only mode, you can set the read_only flag to True. If the file does not exist, it is not created when connecting in read-only mode.
Read-only mode is required if multiple Python processes want to access the same database file at the same time.
import duckdb
duckdb.execute("CREATE TABLE tbl AS SELECT 42 a")
con = duckdb.connect(":default:")
con.sql("SELECT * FROM tbl")
# or
duckdb.default_connection.sql("SELECT * FROM tbl")
┌───────┐
│ a │
│ int32 │
├───────┤
│ 42 │
└───────┘
import duckdb
# to start an in-memory database
con = duckdb.connect(database = ":memory:")
# to use a database file (not shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = False)
# to use a database file (shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = True)
# to explicitly get the default connection
con = duckdb.connect(database = ":default:")
If you want to create a second connection to an existing database, you can use the cursor() method. This might be useful, for example, to allow parallel threads to run queries independently. A single connection is thread-safe but is locked for the duration of the queries, effectively serializing database access in this case.
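As a minimal sketch of the cursor() method described above (the file name is arbitrary):
import duckdb
con = duckdb.connect("my-db.duckdb")
cur = con.cursor()  # a second, independent connection to the same database
cur.execute("SELECT 42")
print(cur.fetchall())  # [(42,)]
cur.close()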
Connections are closed implicitly when they go out of scope or if they are explicitly closed using close(). Once the last connection to a database instance is closed, the database instance is closed as well.
SQL queries can be sent to DuckDB using the execute() method of connections. Once a query has been executed, results can be retrieved using the fetchone and fetchall methods on the connection. fetchall will retrieve all results and complete the transaction. fetchone will retrieve a single row of results each time that it is invoked until no more results are available. The transaction will only close once fetchone is called and there are no more results remaining (the return value will be None). As an example, in the case of a query only returning a single row, fetchone should be called once to retrieve the results and a second time to close the transaction. Below are some short examples:
# create a table
con.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
con.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
# retrieve the items again
con.execute("SELECT * FROM items")
print(con.fetchall())
# [('jeans', Decimal('20.00'), 1), ('hammer', Decimal('42.20'), 2)]
# retrieve the items one at a time
con.execute("SELECT * FROM items")
print(con.fetchone())
# ('jeans', Decimal('20.00'), 1)
print(con.fetchone())
# ('hammer', Decimal('42.20'), 2)
print(con.fetchone()) # This closes the transaction. Any subsequent calls to .fetchone will return None
# None
The description property of the connection object contains the column names as per the standard.
DuckDB also supports [prepared statements]({% link docs/sql/query_syntax/prepared_statements.md %}) in the API with the execute and executemany methods. The values may be passed as an additional parameter after a query that contains ? or $1 (dollar symbol and a number) placeholders. Using the ? notation adds the values in the same sequence as passed within the Python parameter. Using the $ notation allows values to be reused within the SQL statement based on the number and index of the value found within the Python parameter. Values are converted according to the [conversion rules]({% link docs/clients/python/conversion.md %}#object-conversion-python-object-to-duckdb).
Here are some examples. First, insert a row using a [prepared statement]({% link docs/sql/query_syntax/prepared_statements.md %}):
con.execute("INSERT INTO items VALUES (?, ?, ?)", ["laptop", 2000, 1])
Second, insert several rows using a [prepared statement]({% link docs/sql/query_syntax/prepared_statements.md %}):
con.executemany("INSERT INTO items VALUES (?, ?, ?)", [["chainsaw", 500, 10], ["iphone", 300, 2]] )
Query the database using a [prepared statement]({% link docs/sql/query_syntax/prepared_statements.md %}):
con.execute("SELECT item FROM items WHERE value > ?", [400])
print(con.fetchall())
[('laptop',), ('chainsaw',)]
Query using the $
notation for a [prepared statement]({% link docs/sql/query_syntax/prepared_statements.md %}) and reused values:
con.execute("SELECT $1, $1, $2", ["duck", "goose"])
print(con.fetchall())
[('duck', 'duck', 'goose')]
Warning: Do not use executemany to insert large amounts of data into DuckDB. See the [data ingestion page]({% link docs/clients/python/data_ingestion.md %}) for better options.
Besides the standard unnamed parameters, like $1, $2, etc., it's also possible to supply named parameters, like $my_parameter.
When using named parameters, you have to provide a dictionary mapping of str to value in the parameters argument.
An example use is the following:
import duckdb
res = duckdb.execute("""
SELECT
$my_param,
$other_param,
$also_param
""",
{
"my_param": 5,
"other_param": "DuckDB",
"also_param": [42]
}
).fetchall()
print(res)
[(5, 'DuckDB', [42])]
layout: docu title: Types API
The DuckDBPyType class represents a type instance of our [data types]({% link docs/sql/data_types/overview.md %}).
To make the API as easy to use as possible, we have added implicit conversions from existing type objects to a DuckDBPyType instance. This means that wherever a DuckDBPyType object is expected, it is also possible to provide any of the options listed below.
The table below shows the mapping of Python built-in types to DuckDB types.
Built-in types | DuckDB type |
---|---|
bool | BOOLEAN |
bytearray | BLOB |
bytes | BLOB |
float | DOUBLE |
int | BIGINT |
str | VARCHAR |
The table below shows the mapping of Numpy DType to DuckDB type.
Type | DuckDB type |
---|---|
bool | BOOLEAN |
float32 | FLOAT |
float64 | DOUBLE |
int16 | SMALLINT |
int32 | INTEGER |
int64 | BIGINT |
int8 | TINYINT |
uint16 | USMALLINT |
uint32 | UINTEGER |
uint64 | UBIGINT |
uint8 | UTINYINT |
list type objects map to a LIST type of the child type, which can also be arbitrarily nested.
import duckdb
from typing import Union
duckdb.typing.DuckDBPyType(list[dict[Union[str, int], str]])
MAP(UNION(u1 VARCHAR, u2 BIGINT), VARCHAR)[]
dict type objects map to a MAP type of the key type and the value type.
import duckdb
print(duckdb.typing.DuckDBPyType(dict[str, int]))
MAP(VARCHAR, BIGINT)
dict objects map to a STRUCT composed of the keys and values of the dict.
import duckdb
print(duckdb.typing.DuckDBPyType({'a': str, 'b': int}))
STRUCT(a VARCHAR, b BIGINT)
typing.Union objects map to a UNION type of the provided types.
import duckdb
from typing import Union
print(duckdb.typing.DuckDBPyType(Union[int, str, bool, bytearray]))
UNION(u1 BIGINT, u2 VARCHAR, u3 BOOLEAN, u4 BLOB)
For the built-in types, you can use the constants defined in duckdb.typing:
DuckDB type |
---|
BIGINT |
BIT |
BLOB |
BOOLEAN |
DATE |
DOUBLE |
FLOAT |
HUGEINT |
INTEGER |
INTERVAL |
SMALLINT |
SQLNULL |
TIME_TZ |
TIME |
TIMESTAMP_MS |
TIMESTAMP_NS |
TIMESTAMP_S |
TIMESTAMP_TZ |
TIMESTAMP |
TINYINT |
UBIGINT |
UHUGEINT |
UINTEGER |
USMALLINT |
UTINYINT |
UUID |
VARCHAR |
For the complex types there are methods available on the DuckDBPyConnection object or the duckdb module.
Anywhere a DuckDBPyType is accepted, we will also accept one of the type objects that can implicitly convert to a DuckDBPyType.
list_type(child_type) creates a LIST type.
Parameters:
- child_type: DuckDBPyType

struct_type(fields) creates a STRUCT type.
Parameters:
- fields: Union[list[DuckDBPyType], dict[str, DuckDBPyType]]

map_type(key_type, value_type) creates a MAP type.
Parameters:
- key_type: DuckDBPyType
- value_type: DuckDBPyType

decimal_type(width, scale) creates a DECIMAL type.
Parameters:
- width: int
- scale: int

union_type(members) creates a UNION type.
Parameters:
- members: Union[list[DuckDBPyType], dict[str, DuckDBPyType]]

string_type(collation) creates a VARCHAR type with an optional collation.
Parameters:
- collation: Optional[str]
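For illustration, a minimal sketch using the complex-type constructors listed above together with the duckdb.typing constants (the printed output in the comments is indicative):
import duckdb
from duckdb.typing import BIGINT, VARCHAR
print(duckdb.list_type(BIGINT))                         # e.g., BIGINT[]
print(duckdb.struct_type({"a": VARCHAR, "b": BIGINT}))  # e.g., STRUCT(a VARCHAR, b BIGINT)
print(duckdb.map_type(VARCHAR, BIGINT))                 # e.g., MAP(VARCHAR, BIGINT)
print(duckdb.decimal_type(18, 3))                       # e.g., DECIMAL(18,3)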
layout: docu title: Known Python Issues
Unfortunately, there are some issues that are either beyond our control or very elusive / hard to track down. Below is a list of issues that you may need to be aware of, depending on your workflow.
When making use of multithreading and fetching results either directly as NumPy arrays or indirectly through a Pandas DataFrame, it might be necessary to ensure that numpy.core.multiarray is imported.
If this module has not been imported from the main thread, and a different thread attempts to import it during execution, this causes either a deadlock or a crash.
To avoid this, it's recommended to import numpy.core.multiarray before starting up threads.
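A minimal sketch of this workaround, importing the module on the main thread before any worker threads start:
import numpy.core.multiarray  # import on the main thread first
import threading
import duckdb
def fetch_in_thread():
    # Fetching as NumPy arrays from a worker thread is now safe
    duckdb.sql("SELECT 42 AS x").fetchnumpy()
t = threading.Thread(target=fetch_in_thread)
t.start()
t.join()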
When using Jupyter with JupySQL, the DESCRIBE and SUMMARIZE statements return an empty table:
%sql
CREATE OR REPLACE TABLE tbl AS (SELECT 42 AS x);
DESCRIBE tbl;
To work around this, wrap them into a subquery:
%sql
CREATE OR REPLACE TABLE tbl AS (SELECT 42 AS x);
FROM (DESCRIBE tbl);
Loading the JupySQL extension in IPython fails:
In [1]: %load_ext sql
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (unknown location)
The solution is to fix the protobuf package. This may require uninstalling conflicting packages, e.g.:
%pip uninstall tensorflow
%pip install protobuf
In Python, the output of the [EXPLAIN statement]({% link docs/guides/meta/explain.md %}) contains hard line breaks (\n):
In [1]: import duckdb
...: duckdb.sql("EXPLAIN SELECT 42 AS x")
Out[1]:
┌───────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ explain_key │ explain_value │
│ varchar │ varchar │
├───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ physical_plan │ ┌───────────────────────────┐\n│ PROJECTION │\n│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │\n│ x … │
└───────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
To work around this, print the output of the explain() function:
In [2]: print(duckdb.sql("SELECT 42 AS x").explain())
Out[2]:
┌───────────────────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ x │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ DUMMY_SCAN │
└───────────────────────────┘
Please also check out the [Jupyter guide]({% link docs/guides/python/jupyter.md %}) for tips on using Jupyter with JupySQL.
When importing DuckDB on Windows, the Python runtime may return the following error:
import duckdb
ImportError: DLL load failed while importing duckdb: The specified module could not be found.
The solution is to install the Microsoft Visual C++ Redistributable package.
layout: docu title: Relational API
The Relational API is an alternative API that can be used to incrementally construct queries. The API is centered around DuckDBPyRelation nodes. The relations can be seen as symbolic representations of SQL queries. They do not hold any data – and nothing is executed – until a method that triggers execution is called.
Relations can be created from SQL queries using the duckdb.sql method. Alternatively, they can be created from the various data ingestion methods (read_parquet, read_csv, read_json).
For example, here we create a relation from a SQL query:
import duckdb
rel = duckdb.sql("SELECT * FROM range(10_000_000_000) tbl(id)")
rel.show()
┌────────────────────────┐
│ id │
│ int64 │
├────────────────────────┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
│ · │
│ · │
│ · │
│ 9990 │
│ 9991 │
│ 9992 │
│ 9993 │
│ 9994 │
│ 9995 │
│ 9996 │
│ 9997 │
│ 9998 │
│ 9999 │
├────────────────────────┤
│ ? rows │
│ (>9999 rows, 20 shown) │
└────────────────────────┘
Note how we are constructing a relation that computes an immense amount of data (10B rows or 74 GB of data). The relation is constructed instantly – and we can even print the relation instantly.
When printing a relation using show or displaying it in the terminal, the first 10K rows are fetched. If there are more than 10K rows, the output window will show >9999 rows (as the number of rows in the relation is unknown).
Outside of SQL queries, the following methods are provided to construct relation objects from external data:
- from_arrow
- from_df
- read_csv
- read_json
- read_parquet
Relation objects can be queried through SQL via [replacement scans]({% link docs/clients/c/replacement_scans.md %}). If you have a relation object stored in a variable, you can refer to that variable as if it were a SQL table (in the FROM clause). This allows you to incrementally build queries using relation objects.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
duckdb.sql("SELECT sum(id) FROM rel").show()
┌──────────────┐
│ sum(id) │
│ int128 │
├──────────────┤
│ 499999500000 │
└──────────────┘
There are a number of operations that can be performed on relations. These are all shorthand for running SQL queries and will themselves return relations again.
aggregate: Apply an (optionally grouped) aggregate over the relation. The system will automatically group by any columns that are not aggregates.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.aggregate("id % 2 AS g, sum(id), min(id), max(id)")
┌───────┬──────────────┬─────────┬─────────┐
│ g │ sum(id) │ min(id) │ max(id) │
│ int64 │ int128 │ int64 │ int64 │
├───────┼──────────────┼─────────┼─────────┤
│ 0 │ 249999500000 │ 0 │ 999998 │
│ 1 │ 250000000000 │ 1 │ 999999 │
└───────┴──────────────┴─────────┴─────────┘
except_: Select all rows in the first relation that do not occur in the second relation. The relations must have the same number of columns.
import duckdb
r1 = duckdb.sql("SELECT * FROM range(10) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r1.except_(r2).show()
┌───────┐
│ id │
│ int64 │
├───────┤
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───────┘
filter: Apply the given condition to the relation, filtering out any rows that do not satisfy the condition.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.filter("id > 5").limit(3).show()
┌───────┐
│ id │
│ int64 │
├───────┤
│ 6 │
│ 7 │
│ 8 │
└───────┘
intersect: Select the intersection of two relations – returning all rows that occur in both relations. The relations must have the same number of columns.
import duckdb
r1 = duckdb.sql("SELECT * FROM range(10) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r1.intersect(r2).show()
┌───────┐
│ id │
│ int64 │
├───────┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
└───────┘
join: Combine two relations, joining them based on the provided condition.
import duckdb
r1 = duckdb.sql("SELECT * FROM range(5) tbl(id)").set_alias("r1")
r2 = duckdb.sql("SELECT * FROM range(10, 15) tbl(id)").set_alias("r2")
r1.join(r2, "r1.id + 10 = r2.id").show()
┌───────┬───────┐
│ id │ id │
│ int64 │ int64 │
├───────┼───────┤
│ 0 │ 10 │
│ 1 │ 11 │
│ 2 │ 12 │
│ 3 │ 13 │
│ 4 │ 14 │
└───────┴───────┘
limit: Select the first n rows, optionally offset by the given offset.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.limit(3).show()
┌───────┐
│ id │
│ int64 │
├───────┤
│ 0 │
│ 1 │
│ 2 │
└───────┘
order: Sort the relation by the given set of expressions.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.order("id DESC").limit(3).show()
┌────────┐
│ id │
│ int64 │
├────────┤
│ 999999 │
│ 999998 │
│ 999997 │
└────────┘
project: Apply the given expression to each row in the relation.
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.project("id + 10 AS id_plus_ten").limit(3).show()
┌─────────────┐
│ id_plus_ten │
│ int64 │
├─────────────┤
│ 10 │
│ 11 │
│ 12 │
└─────────────┘
union: Combine two relations, returning all rows in r1 followed by all rows in r2. The relations must have the same number of columns.
import duckdb
r1 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(10, 15) tbl(id)")
r1.union(r2).show()
┌───────┐
│ id │
│ int64 │
├───────┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 10 │
│ 11 │
│ 12 │
│ 13 │
│ 14 │
└───────┘
The result of relations can be converted to various types of Python structures; see the [result conversion page]({% link docs/clients/python/conversion.md %}) for more information.
The result of relations can also be directly written to files using the methods below:
- [write_csv]({% link docs/clients/python/reference/index.md %}#duckdb.DuckDBPyRelation.write_csv)
- [write_parquet]({% link docs/clients/python/reference/index.md %}#duckdb.DuckDBPyRelation.write_parquet)
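For illustration, a minimal sketch writing a relation's result to files (the file names are arbitrary):
import duckdb
rel = duckdb.sql("SELECT * FROM range(1_000) tbl(id)")
rel.write_parquet("ids.parquet")  # write the result to a Parquet file
rel.write_csv("ids.csv")          # write the result to a CSV file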
layout: docu title: Conversion between DuckDB and Python
This page documents the rules for converting Python objects to DuckDB and DuckDB results to Python.
This is a mapping of Python object types to DuckDB [Logical Types]({% link docs/sql/data_types/overview.md %}):
- None → NULL
- bool → BOOLEAN
- datetime.timedelta → INTERVAL
- str → VARCHAR
- bytearray → BLOB
- memoryview → BLOB
- decimal.Decimal → DECIMAL / DOUBLE
- uuid.UUID → UUID
The rest of the conversion rules are as follows.
Since integers can be of arbitrary size in Python, there is not a one-to-one conversion possible for ints. Instead we perform these casts in order until one succeeds:
- BIGINT
- INTEGER
- UBIGINT
- UINTEGER
- DOUBLE
When using the DuckDB Value class, it's possible to set a target type, which will influence the conversion.
For Python floats, these casts are tried in order until one succeeds:
- DOUBLE
- FLOAT
For datetime we will check pandas.isnull if it's available and return NULL if it returns true.
We check against datetime.datetime.min and datetime.datetime.max to convert to -inf and +inf respectively.
If the datetime has tzinfo, we will use TIMESTAMPTZ, otherwise it becomes TIMESTAMP.
If the time has tzinfo, we will use TIMETZ, otherwise it becomes TIME.
date converts to the DATE type.
We check against datetime.date.min and datetime.date.max to convert to -inf and +inf respectively.
bytes converts to BLOB by default; when it's used to construct a Value object of type BITSTRING, it maps to BITSTRING instead.
list becomes a LIST type of the “most permissive” type of its children, for example:
my_list_value = [
12345,
"test"
]
Will become VARCHAR[] because 12345 can convert to VARCHAR but test cannot convert to INTEGER.
[12345, test]
The dict object can convert to either STRUCT(...) or MAP(..., ...) depending on its structure.
If the dict has a structure similar to:
my_map_dict = {
"key": [
1, 2, 3
],
"value": [
"one", "two", "three"
]
}
Then we'll convert it to a MAP of key-value pairs of the two lists zipped together.
The example above becomes a MAP(INTEGER, VARCHAR):
{1=one, 2=two, 3=three}
The names of the fields matter and the two lists need to have the same size.
Otherwise we'll try to convert it to a STRUCT.
my_struct_dict = {
1: "one",
"2": 2,
"three": [1, 2, 3],
False: True
}
Becomes:
{'1': one, '2': 2, 'three': [1, 2, 3], 'False': true}
Every key of the dictionary is converted to a string.
tuple converts to LIST by default; when it's used to construct a Value object of type STRUCT, it will convert to STRUCT instead.
ndarray and datetime64 are converted by calling tolist() and converting the result of that.
DuckDB's Python client provides multiple additional methods that can be used to efficiently retrieve data.
- fetchnumpy() fetches the data as a dictionary of NumPy arrays
- df() fetches the data as a Pandas DataFrame
  - fetchdf() is an alias of df()
  - fetch_df() is an alias of df()
- fetch_df_chunk(vector_multiple) fetches a portion of the results into a DataFrame. The number of rows returned in each chunk is the vector size (2048 by default) * vector_multiple (1 by default).
- arrow() fetches the data as an Arrow table
  - fetch_arrow_table() is an alias of arrow()
- fetch_record_batch(chunk_size) returns an Arrow record batch reader with chunk_size rows per batch
- pl() fetches the data as a Polars DataFrame
Below are some examples using this functionality. See the [Python guides]({% link docs/guides/overview.md %}#python-client) for more examples.
Fetch as Pandas DataFrame:
df = con.execute("SELECT * FROM items").fetchdf()
print(df)
item value count
0 jeans 20.0 1
1 hammer 42.2 2
2 laptop 2000.0 1
3 chainsaw 500.0 10
4 iphone 300.0 2
Fetch as dictionary of NumPy arrays:
arr = con.execute("SELECT * FROM items").fetchnumpy()
print(arr)
{'item': masked_array(data=['jeans', 'hammer', 'laptop', 'chainsaw', 'iphone'],
mask=[False, False, False, False, False],
fill_value='?',
dtype=object), 'value': masked_array(data=[20.0, 42.2, 2000.0, 500.0, 300.0],
mask=[False, False, False, False, False],
fill_value=1e+20), 'count': masked_array(data=[1, 2, 1, 10, 2],
mask=[False, False, False, False, False],
fill_value=999999,
dtype=int32)}
Fetch as an Arrow table. Converting to Pandas afterwards just for pretty printing:
tbl = con.execute("SELECT * FROM items").fetch_arrow_table()
print(tbl.to_pandas())
item value count
0 jeans 20.00 1
1 hammer 42.20 2
2 laptop 2000.00 1
3 chainsaw 500.00 10
4 iphone 300.00 2
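Not shown above: results can also be streamed as Arrow record batches via fetch_record_batch. A minimal sketch (assuming pyarrow is installed and the items table from the earlier examples):
reader = con.execute("SELECT * FROM items").fetch_record_batch()
for batch in reader:  # each element is a pyarrow.RecordBatch
    print(batch.num_rows)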
layout: docu title: Data Ingestion
This page contains examples for data ingestion to Python using DuckDB. First, import the duckdb package:
import duckdb
Then, proceed with any of the following sections.
CSV files can be read using the read_csv function, called either from within Python or directly from within SQL. By default, the read_csv function attempts to auto-detect the CSV settings by sampling from the provided file.
Read from a file using fully auto-detected settings:
duckdb.read_csv("example.csv")
Read multiple CSV files from a folder:
duckdb.read_csv("folder/*.csv")
Specify options on how the CSV is formatted internally:
duckdb.read_csv("example.csv", header = False, sep = ",")
Override types of the first two columns:
duckdb.read_csv("example.csv", dtype = ["int", "varchar"])
Directly read a CSV file from within SQL:
duckdb.sql("SELECT * FROM 'example.csv'")
Call read_csv from within SQL:
duckdb.sql("SELECT * FROM read_csv('example.csv')")
See the [CSV Import]({% link docs/data/csv/overview.md %}) page for more information.
Parquet files can be read using the read_parquet function, called either from within Python or directly from within SQL.
Read from a single Parquet file:
duckdb.read_parquet("example.parquet")
Read multiple Parquet files from a folder:
duckdb.read_parquet("folder/*.parquet")
Read a Parquet file over [https]({% link docs/extensions/httpfs/overview.md %}):
duckdb.read_parquet("https://some.url/some_file.parquet")
Read a list of Parquet files:
duckdb.read_parquet(["file1.parquet", "file2.parquet", "file3.parquet"])
Directly read a Parquet file from within SQL:
duckdb.sql("SELECT * FROM 'example.parquet'")
Call read_parquet from within SQL:
duckdb.sql("SELECT * FROM read_parquet('example.parquet')")
See the [Parquet Loading]({% link docs/data/parquet/overview.md %}) page for more information.
JSON files can be read using the read_json function, called either from within Python or directly from within SQL. By default, the read_json function will automatically detect if a file contains newline-delimited JSON or regular JSON, and will detect the schema of the objects stored within the JSON file.
Read from a single JSON file:
duckdb.read_json("example.json")
Read multiple JSON files from a folder:
duckdb.read_json("folder/*.json")
Directly read a JSON file from within SQL:
duckdb.sql("SELECT * FROM 'example.json'")
Call read_json from within SQL:
duckdb.sql("SELECT * FROM read_json_auto('example.json')")
DuckDB is automatically able to query certain Python variables by referring to their variable name (as if it was a table). These types include the following: Pandas DataFrame, Polars DataFrame, Polars LazyFrame, NumPy arrays, [relations]({% link docs/clients/python/relational_api.md %}), and Arrow objects.
Only variables that are visible to Python code at the location of the sql() or execute() call can be used in this manner.
Accessing these variables is made possible by [replacement scans]({% link docs/clients/c/replacement_scans.md %}). To disable replacement scans entirely, use:
SET python_enable_replacements = false;
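For illustration, a minimal sketch of the effect of this setting (assuming a DuckDB version in which python_enable_replacements is available):
import duckdb
import pandas as pd
df = pd.DataFrame({"x": [1, 2, 3]})
con = duckdb.connect()
print(con.sql("SELECT * FROM df").fetchall())  # the replacement scan finds the DataFrame
con.sql("SET python_enable_replacements = false")
try:
    con.sql("SELECT * FROM df")  # now fails: no table named df exists in the catalog
except duckdb.CatalogException as e:
    print(e)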
DuckDB supports querying multiple types of Apache Arrow objects including tables, datasets, RecordBatchReaders, and scanners. See the Python [guides]({% link docs/guides/overview.md %}#python-client) for more examples.
import duckdb
import pandas as pd
test_df = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
print(duckdb.sql("SELECT * FROM test_df").fetchall())
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
DuckDB also supports “registering” a DataFrame or Arrow object as a virtual table, comparable to a SQL VIEW. This is useful when querying a DataFrame/Arrow object that is stored in another way (as a class variable, or a value in a dictionary). Below is a Pandas example:
If your Pandas DataFrame is stored in another location, here is an example of manually registering it:
If your Pandas DataFrame is stored in another location, here is an example of manually registering it:
import duckdb
import pandas as pd
my_dictionary = {}
my_dictionary["test_df"] = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
duckdb.register("test_df_view", my_dictionary["test_df"])
print(duckdb.sql("SELECT * FROM test_df_view").fetchall())
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
You can also create a persistent table in DuckDB from the contents of the DataFrame (or the view):
# create a new table from the contents of a DataFrame
con.execute("CREATE TABLE test_df_table AS SELECT * FROM test_df")
# insert into an existing table from the contents of a DataFrame
con.execute("INSERT INTO test_df_table SELECT * FROM test_df")
pandas.DataFrame columns of an object dtype require some special care, since this dtype can store values of arbitrary type.
To convert these columns to DuckDB, we first go through an analyze phase before converting the values.
In this analyze phase, a sample of all the rows of the column is analyzed to determine the target type.
This sample size is by default set to 1000.
If the type picked during the analyze step is incorrect, this will result in a "Failed to cast value:" error, in which case you will need to increase the sample size.
The sample size can be changed by setting the pandas_analyze_sample config option.
# example setting the sample size to 100k
duckdb.execute("SET GLOBAL pandas_analyze_sample = 100_000")
You can register Python objects as DuckDB tables using the [DuckDBPyConnection.register() function]({% link docs/clients/python/reference/index.md %}#duckdb.DuckDBPyConnection.register).
The precedence of objects with the same name is as follows:
- Objects explicitly registered via DuckDBPyConnection.register()
- Native DuckDB tables and views
- [Replacement scans]({% link docs/clients/c/replacement_scans.md %})
layout: docu title: Rust Client
The DuckDB Rust client can be installed from crates.io. Please see docs.rs for details.
duckdb-rs is an ergonomic wrapper based on the DuckDB C API; please refer to the README for details.
To use duckdb, you must first initialize a Connection handle using Connection::open(). Connection::open() takes as parameter the database file to read and write from. If the database file does not exist, it will be created (the file extension may be .db, .duckdb, or anything else). You can also use Connection::open_in_memory() to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process).
use duckdb::{params, Connection, Result};
let conn = Connection::open_in_memory()?;
The Connection will automatically close the underlying database connection for you when it goes out of scope (via Drop). You can also explicitly close the Connection with conn.close(). There is not much difference between these in the typical case, but if an error occurs, the explicit close gives you the chance to handle it.
SQL queries can be sent to DuckDB using the execute() method of connections, or we can prepare a statement and then query on it.
#[derive(Debug)]
struct Person {
id: i32,
name: String,
data: Option<Vec<u8>>,
}
// Create the table and a sample row so the insert and query below can run
conn.execute_batch(
    r"CREATE SEQUENCE seq;
      CREATE TABLE person (
          id   INTEGER PRIMARY KEY DEFAULT NEXTVAL('seq'),
          name TEXT NOT NULL,
          data BLOB
      );")?;
let me = Person {
    id: 0,
    name: "Steven".to_string(),
    data: None,
};
conn.execute(
    "INSERT INTO person (name, data) VALUES (?, ?)",
    params![me.name, me.data],
)?;
let mut stmt = conn.prepare("SELECT id, name, data FROM person")?;
let person_iter = stmt.query_map([], |row| {
Ok(Person {
id: row.get(0)?,
name: row.get(1)?,
data: row.get(2)?,
})
})?;
for person in person_iter {
println!("Found person {:?}", person.unwrap());
}
The Rust client supports the [DuckDB Appender API]({% link docs/data/appender.md %}) for bulk inserts. For example:
fn insert_rows(conn: &Connection) -> Result<()> {
let mut app = conn.appender("foo")?;
app.append_rows([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])?;
Ok(())
}
layout: docu title: DuckDB Wasm github_repository: https://github.com/duckdb/duckdb-wasm
DuckDB has been compiled to WebAssembly, so it can run inside any browser on any device.
{% include iframe.html src="https://shell.duckdb.org" %}
DuckDB-Wasm offers a layered API; it can be embedded as a JavaScript + WebAssembly library, as a Web shell, or built from source according to your needs.
A great starting point is to read the [DuckDB-Wasm launch blog post]({% post_url 2021-10-29-duckdb-wasm %})!
Another great resource is the GitHub repository.
For details, see the full DuckDB-Wasm API Documentation.
- By default, the WebAssembly client only uses a single thread.
- The WebAssembly client has a limited amount of memory available. WebAssembly limits the amount of available memory to 4 GB and browsers may impose even stricter limits.
layout: docu title: Extensions
DuckDB-Wasm's (dynamic) extension loading is modeled after regular DuckDB's extension loading, with a few relevant differences due to the difference in platform.
Extensions in DuckDB are binaries to be dynamically loaded via dlopen. A cryptographic signature is appended to the binary.
An extension in DuckDB-Wasm is a regular Wasm file to be dynamically loaded via Emscripten's dlopen. A cryptographic signature is appended to the Wasm file as a WebAssembly custom section called duckdb_signature.
This ensures the file remains a valid WebAssembly file.
Currently, we require this custom section to be the last one, but this requirement could potentially be relaxed in the future.
The INSTALL semantic in native embeddings of DuckDB is to fetch, decompress from gzip, and store data on local disk.
The LOAD semantic in native embeddings of DuckDB is to (optionally) perform signature checks and dynamically load the binary with the main DuckDB binary.
In DuckDB-Wasm, INSTALL is a no-op given there is no durable cross-session storage. The LOAD operation will fetch (and decompress on the fly), perform signature checks, and dynamically load via the Emscripten implementation of dlopen.
[Autoloading]({% link docs/extensions/overview.md %}), i.e., the possibility for DuckDB to add extension functionality on-the-fly, is enabled by default in DuckDB-Wasm.
Extension name | Description | Aliases |
---|---|---|
[autocomplete]({% link docs/extensions/autocomplete.md %}) | Adds support for autocomplete in the shell | |
[excel]({% link docs/extensions/excel.md %}) | Adds support for Excel-like format strings | |
[fts]({% link docs/extensions/full_text_search.md %}) | Adds support for Full-Text Search Indexes | |
[icu]({% link docs/extensions/icu.md %}) | Adds support for time zones and collations using the ICU library | |
[inet]({% link docs/extensions/inet.md %}) | Adds support for IP-related data types and functions | |
[json]({% link docs/data/json/overview.md %}) | Adds support for JSON operations | |
[parquet]({% link docs/data/parquet/overview.md %}) | Adds support for reading and writing Parquet files | |
[sqlite]({% link docs/extensions/sqlite.md %}) | Adds support for reading SQLite database files | sqlite, sqlite3 |
[sqlsmith]({% link docs/extensions/sqlsmith.md %}) | ||
[tpcds]({% link docs/extensions/tpcds.md %}) | Adds TPC-DS data generation and query support | |
[tpch]({% link docs/extensions/tpch.md %}) | Adds TPC-H data generation and query support |
WebAssembly is basically an additional platform, and there might be platform-specific limitations that prevent some extensions from matching their native capabilities or that require them to behave differently. We will document here relevant differences for DuckDB-hosted extensions.
The HTTPFS extension is, at the moment, not available in DuckDB-Wasm. HTTPS protocol capabilities need to go through an additional layer, the browser, which adds both differences and some restrictions compared to native.
Instead, DuckDB-Wasm has a separate implementation that for most purposes is interchangeable, but does not support all use cases (as it must follow security rules imposed by the browser, such as CORS). Due to this CORS restriction, any requests for data made using the HTTPFS extension must be to websites that allow (using CORS headers) the website hosting the DuckDB-Wasm instance to access that data. The MDN website is a great resource for more information regarding CORS.
As with regular DuckDB extensions, DuckDB-Wasm extensions are by default checked on LOAD to verify the signature and confirm that the extension has not been tampered with.
Extension signature verification can be disabled via a configuration option.
Signing is a property of the binary itself, so copying a DuckDB extension (say to serve it from a different location) will still keep a valid signature (e.g., for local development).
Official DuckDB extensions are served at extensions.duckdb.org, and this is also the default value for the default_extension_repository option.
When installing extensions, a relevant URL will be built that will look like extensions.duckdb.org/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.gz.
DuckDB-Wasm extensions are fetched only on load, and the URL will look like: extensions.duckdb.org/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm.
Note that an additional duckdb-wasm segment is added to the folder structure, and the file is served as a .wasm file.
DuckDB-Wasm extensions are served pre-compressed using Brotli compression. When fetched from a browser, extensions will be transparently uncompressed. If you want to fetch the duckdb-wasm extension manually, you can use curl --compressed extensions.duckdb.org/<...>/icu.duckdb_extension.wasm.
As with regular DuckDB, if you use SET custom_extension_repository = some.url.com, subsequent loads will be attempted at some.url.com/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm.
Note that GET requests on the extensions need to be CORS-enabled for a browser to allow the connection.
Both DuckDB-Wasm and its extensions have been compiled using the latest packaged Emscripten toolchain.
layout: docu title: ADBC Client
Arrow Database Connectivity (ADBC), similarly to ODBC and JDBC, is a C-style API that enables code portability between different database systems. This allows developers to effortlessly build applications that communicate with database systems without using code specific to that system. The main difference between ADBC and ODBC/JDBC is that ADBC uses Arrow to transfer data between the database system and the application. DuckDB has an ADBC driver, which takes advantage of the [zero-copy integration between DuckDB and Arrow]({% post_url 2021-12-03-duck-arrow %}) to efficiently transfer data.
DuckDB's ADBC driver currently supports version 0.7 of ADBC.
Please refer to the ADBC documentation page for a more extensive discussion on ADBC and a detailed API explanation.
The DuckDB-ADBC driver implements the full ADBC specification, with the exception of the ConnectionReadPartition and StatementExecutePartitions functions. Both of these functions exist to support systems that internally partition the query results, which does not apply to DuckDB.
In this section, we will describe the main functions that exist in ADBC, along with the arguments they take and provide examples for each function.
Set of functions that operate on a database.
Function name | Description | Arguments | Example |
---|---|---|---|
DatabaseNew |
Allocate a new (but uninitialized) database. | (AdbcDatabase *database, AdbcError *error) |
AdbcDatabaseNew(&adbc_database, &adbc_error) |
DatabaseSetOption |
Set a char* option. | (AdbcDatabase *database, const char *key, const char *value, AdbcError *error) |
AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error) |
DatabaseInit |
Finish setting options and initialize the database. | (AdbcDatabase *database, AdbcError *error) |
AdbcDatabaseInit(&adbc_database, &adbc_error) |
DatabaseRelease |
Destroy the database. | (AdbcDatabase *database, AdbcError *error) |
AdbcDatabaseRelease(&adbc_database, &adbc_error) |
A set of functions that create and destroy a connection to interact with a database.
Function name | Description | Arguments | Example |
---|---|---|---|
ConnectionNew |
Allocate a new (but uninitialized) connection. | (AdbcConnection*, AdbcError*) |
AdbcConnectionNew(&adbc_connection, &adbc_error) |
ConnectionSetOption |
Options may be set before ConnectionInit. | (AdbcConnection*, const char*, const char*, AdbcError*) |
AdbcConnectionSetOption(&adbc_connection, ADBC_CONNECTION_OPTION_AUTOCOMMIT, ADBC_OPTION_VALUE_DISABLED, &adbc_error) |
ConnectionInit |
Finish setting options and initialize the connection. | (AdbcConnection*, AdbcDatabase*, AdbcError*) |
AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error) |
ConnectionRelease |
Destroy this connection. | (AdbcConnection*, AdbcError*) |
AdbcConnectionRelease(&adbc_connection, &adbc_error) |
A set of functions that retrieve metadata about the database. In general, these functions will return Arrow objects, specifically an ArrowArrayStream.
Function name | Description | Arguments | Example |
---|---|---|---|
ConnectionGetObjects |
Get a hierarchical view of all catalogs, database schemas, tables, and columns. | (AdbcConnection*, int, const char*, const char*, const char*, const char**, const char*, ArrowArrayStream*, AdbcError*) |
AdbcDatabaseInit(&adbc_database, &adbc_error) |
ConnectionGetTableSchema |
Get the Arrow schema of a table. | (AdbcConnection*, const char*, const char*, const char*, ArrowSchema*, AdbcError*) |
AdbcDatabaseRelease(&adbc_database, &adbc_error) |
ConnectionGetTableTypes |
Get a list of table types in the database. | (AdbcConnection*, ArrowArrayStream*, AdbcError*) |
AdbcDatabaseNew(&adbc_database, &adbc_error) |
A set of functions with transaction semantics for the connection. By default, all connections start with auto-commit mode on, but this can be turned off via the ConnectionSetOption function.
Function name | Description | Arguments | Example |
---|---|---|---|
ConnectionCommit |
Commit any pending transactions. | (AdbcConnection*, AdbcError*) |
AdbcConnectionCommit(&adbc_connection, &adbc_error) |
ConnectionRollback |
Rollback any pending transactions. | (AdbcConnection*, AdbcError*) |
AdbcConnectionRollback(&adbc_connection, &adbc_error) |
Statements hold state related to query execution. They represent both one-off queries and prepared statements. They can be reused; however, doing so will invalidate prior result sets from that statement.
The functions used to create, destroy, and set options for a statement:
Function name | Description | Arguments | Example |
---|---|---|---|
StatementNew |
Create a new statement for a given connection. | (AdbcConnection*, AdbcStatement*, AdbcError*) |
AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error) |
StatementRelease |
Destroy a statement. | (AdbcStatement*, AdbcError*) |
AdbcStatementRelease(&adbc_statement, &adbc_error) |
StatementSetOption |
Set a string option on a statement. | (AdbcStatement*, const char*, const char*, AdbcError*) |
StatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "TABLE_NAME", &adbc_error) |
Functions related to query execution:
Function name | Description | Arguments | Example |
---|---|---|---|
StatementSetSqlQuery |
Set the SQL query to execute. The query can then be executed with StatementExecuteQuery. | (AdbcStatement*, const char*, AdbcError*) |
AdbcStatementSetSqlQuery(&adbc_statement, "SELECT * FROM TABLE", &adbc_error) |
StatementSetSubstraitPlan |
Set a substrait plan to execute. The query can then be executed with StatementExecuteQuery. | (AdbcStatement*, const uint8_t*, size_t, AdbcError*) |
AdbcStatementSetSubstraitPlan(&adbc_statement, substrait_plan, length, &adbc_error) |
StatementExecuteQuery |
Execute a statement and get the results. | (AdbcStatement*, ArrowArrayStream*, int64_t*, AdbcError*) |
AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error) |
StatementPrepare |
Turn this statement into a prepared statement to be executed multiple times. | (AdbcStatement*, AdbcError*) |
AdbcStatementPrepare(&adbc_statement, &adbc_error) |
Functions related to binding, used for bulk insertion or in prepared statements.
Function name | Description | Arguments | Example |
---|---|---|---|
StatementBindStream |
Bind Arrow Stream. This can be used for bulk inserts or prepared statements. | (AdbcStatement*, ArrowArrayStream*, AdbcError*) |
StatementBindStream(&adbc_statement, &input_data, &adbc_error) |
Regardless of the programming language being used, there are two database options that are required to use ADBC with DuckDB. The first one is the driver, which takes a path to the DuckDB library. The second option is the entrypoint, which is an exported function from the DuckDB-ADBC driver that initializes all the ADBC functions. Once we have configured these two options, we can optionally set the path option, providing a path on disk to store our DuckDB database. If not set, an in-memory database is created. After configuring all the necessary options, we can proceed to initialize our database. Below is how you can do so with various different language environments.
We begin our C++ example by declaring the essential variables for querying data through ADBC. These variables include Error, Database, Connection, Statement handling, and an Arrow Stream to transfer data between DuckDB and the application.
AdbcError adbc_error;
AdbcDatabase adbc_database;
AdbcConnection adbc_connection;
AdbcStatement adbc_statement;
ArrowArrayStream arrow_stream;
We can then initialize our database variable. Before initializing the database, we need to set the driver and entrypoint options as mentioned above. Then we set the path option and initialize the database. In the example below, the string "path/to/libduckdb.dylib" should be the path to the dynamic library for DuckDB. This will be .dylib on macOS, and .so on Linux.
AdbcDatabaseNew(&adbc_database, &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "driver", "path/to/libduckdb.dylib", &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "entrypoint", "duckdb_adbc_init", &adbc_error);
// By default, we start an in-memory database, but you can optionally define a path to store it on disk.
AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error);
AdbcDatabaseInit(&adbc_database, &adbc_error);
After initializing the database, we must create and initialize a connection to it.
AdbcConnectionNew(&adbc_connection, &adbc_error);
AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error);
We can now initialize our statement and run queries through our connection. After AdbcStatementExecuteQuery is called, the arrow_stream is populated with the result.
AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error);
AdbcStatementSetSqlQuery(&adbc_statement, "SELECT 42", &adbc_error);
int64_t rows_affected;
AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error);
arrow_stream.release(&arrow_stream);
Besides running queries, we can also ingest data via arrow_streams. For this we need to set an option with the table name we want to insert to, bind the stream, and then execute the query.
StatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "AnswerToEverything", &adbc_error);
StatementBindStream(&adbc_statement, &arrow_stream, &adbc_error);
StatementExecuteQuery(&adbc_statement, nullptr, nullptr, &adbc_error);
The first thing to do is to use pip and install the ADBC Driver Manager. You will also need to install pyarrow to directly access Apache Arrow formatted result sets (such as using fetch_arrow_table).
pip install adbc_driver_manager pyarrow
For details on the adbc_driver_manager package, see the adbc_driver_manager package documentation.
As with C++, we need to provide initialization options consisting of the location of the libduckdb shared object and the entrypoint function. Notice that the path argument for DuckDB is passed in through the db_kwargs dictionary.
import adbc_driver_duckdb.dbapi
with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
cur.execute("SELECT 42")
# fetch a pyarrow table
tbl = cur.fetch_arrow_table()
print(tbl)
Alongside fetch_arrow_table, other methods from DBApi are also implemented on the cursor, such as fetchone and fetchall. Data can also be ingested via arrow_streams. We just need to set options on the statement to bind the stream of data and execute the query.
import adbc_driver_duckdb.dbapi
import pyarrow
data = pyarrow.record_batch(
[[1, 2, 3, 4], ["a", "b", "c", "d"]],
names = ["ints", "strs"],
)
with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
cur.adbc_ingest("AnswerToEverything", data)
Make sure to download the libduckdb library first (i.e., the .so on Linux, .dylib on macOS or .dll on Windows) from the releases page, and put it on your LD_LIBRARY_PATH before you run the code. (If you don't, the error message will explain your options regarding the location of this file.)
The following example uses an in-memory DuckDB database to modify in-memory Arrow RecordBatches via SQL queries:
{% raw %}
package main
import (
"bytes"
"context"
"fmt"
"io"
"github.com/apache/arrow-adbc/go/adbc"
"github.com/apache/arrow-adbc/go/adbc/drivermgr"
"github.com/apache/arrow/go/v17/arrow"
"github.com/apache/arrow/go/v17/arrow/array"
"github.com/apache/arrow/go/v17/arrow/ipc"
"github.com/apache/arrow/go/v17/arrow/memory"
)
func _makeSampleArrowRecord() arrow.Record {
b := array.NewFloat64Builder(memory.DefaultAllocator)
b.AppendValues([]float64{1, 2, 3}, nil)
col := b.NewArray()
defer col.Release()
defer b.Release()
schema := arrow.NewSchema([]arrow.Field{{Name: "column1", Type: arrow.PrimitiveTypes.Float64}}, nil)
return array.NewRecord(schema, []arrow.Array{col}, int64(col.Len()))
}
type DuckDBSQLRunner struct {
ctx context.Context
conn adbc.Connection
db adbc.Database
}
func NewDuckDBSQLRunner(ctx context.Context) (*DuckDBSQLRunner, error) {
var drv drivermgr.Driver
db, err := drv.NewDatabase(map[string]string{
"driver": "duckdb",
"entrypoint": "duckdb_adbc_init",
"path": ":memory:",
})
if err != nil {
return nil, fmt.Errorf("failed to create new in-memory DuckDB database: %w", err)
}
conn, err := db.Open(ctx)
if err != nil {
return nil, fmt.Errorf("failed to open connection to new in-memory DuckDB database: %w", err)
}
return &DuckDBSQLRunner{ctx: ctx, conn: conn, db: db}, nil
}
func serializeRecord(record arrow.Record) (io.Reader, error) {
buf := new(bytes.Buffer)
wr := ipc.NewWriter(buf, ipc.WithSchema(record.Schema()))
if err := wr.Write(record); err != nil {
return nil, fmt.Errorf("failed to write record: %w", err)
}
if err := wr.Close(); err != nil {
return nil, fmt.Errorf("failed to close writer: %w", err)
}
return buf, nil
}
func (r *DuckDBSQLRunner) importRecord(sr io.Reader) error {
rdr, err := ipc.NewReader(sr)
if err != nil {
return fmt.Errorf("failed to create IPC reader: %w", err)
}
defer rdr.Release()
stmt, err := r.conn.NewStatement()
if err != nil {
return fmt.Errorf("failed to create new statement: %w", err)
}
if err := stmt.SetOption(adbc.OptionKeyIngestMode, adbc.OptionValueIngestModeCreate); err != nil {
return fmt.Errorf("failed to set ingest mode: %w", err)
}
if err := stmt.SetOption(adbc.OptionKeyIngestTargetTable, "temp_table"); err != nil {
return fmt.Errorf("failed to set ingest target table: %w", err)
}
if err := stmt.BindStream(r.ctx, rdr); err != nil {
return fmt.Errorf("failed to bind stream: %w", err)
}
if _, err := stmt.ExecuteUpdate(r.ctx); err != nil {
return fmt.Errorf("failed to execute update: %w", err)
}
return stmt.Close()
}
func (r *DuckDBSQLRunner) runSQL(sql string) ([]arrow.Record, error) {
stmt, err := r.conn.NewStatement()
if err != nil {
return nil, fmt.Errorf("failed to create new statement: %w", err)
}
defer stmt.Close()
if err := stmt.SetSqlQuery(sql); err != nil {
return nil, fmt.Errorf("failed to set SQL query: %w", err)
}
out, n, err := stmt.ExecuteQuery(r.ctx)
if err != nil {
return nil, fmt.Errorf("failed to execute query: %w", err)
}
defer out.Release()
result := make([]arrow.Record, 0, n)
for out.Next() {
rec := out.Record()
rec.Retain() // .Next() will release the record, so we need to retain it
result = append(result, rec)
}
if out.Err() != nil {
return nil, out.Err()
}
return result, nil
}
func (r *DuckDBSQLRunner) RunSQLOnRecord(record arrow.Record, sql string) ([]arrow.Record, error) {
serializedRecord, err := serializeRecord(record)
if err != nil {
return nil, fmt.Errorf("failed to serialize record: %w", err)
}
if err := r.importRecord(serializedRecord); err != nil {
return nil, fmt.Errorf("failed to import record: %w", err)
}
result, err := r.runSQL(sql)
if err != nil {
return nil, fmt.Errorf("failed to run SQL: %w", err)
}
if _, err := r.runSQL("DROP TABLE temp_table"); err != nil {
return nil, fmt.Errorf("failed to drop temp table after running query: %w", err)
}
return result, nil
}
func (r *DuckDBSQLRunner) Close() {
r.conn.Close()
r.db.Close()
}
func main() {
rec := _makeSampleArrowRecord()
fmt.Println(rec)
runner, err := NewDuckDBSQLRunner(context.Background())
if err != nil {
panic(err)
}
defer runner.Close()
resultRecords, err := runner.RunSQLOnRecord(rec, "SELECT column1+1 FROM temp_table")
if err != nil {
panic(err)
}
for _, resultRecord := range resultRecords {
fmt.Println(resultRecord)
resultRecord.Release()
}
}
{% endraw %}
Running it produces the following output:
record:
schema:
fields: 1
- column1: type=float64
rows: 3
col[0][column1]: [1 2 3]
record:
schema:
fields: 1
- (column1 + 1): type=float64, nullable
rows: 3
col[0][(column1 + 1)]: [2 3 4]
layout: docu title: Instantiation
DuckDB-Wasm has multiple ways to be instantiated depending on the use case.
import * as duckdb from '@duckdb/duckdb-wasm';
const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
const worker_url = URL.createObjectURL(
new Blob([`importScripts("${bundle.mainWorker!}");`], {type: 'text/javascript'})
);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(worker_url);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
URL.revokeObjectURL(worker_url);
import * as duckdb from '@duckdb/duckdb-wasm';
import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm';
import duckdb_wasm_next from '@duckdb/duckdb-wasm/dist/duckdb-eh.wasm';
const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
mvp: {
mainModule: duckdb_wasm,
mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js', import.meta.url).toString(),
},
eh: {
mainModule: duckdb_wasm_next,
mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js', import.meta.url).toString(),
},
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
import * as duckdb from '@duckdb/duckdb-wasm';
import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm?url';
import mvp_worker from '@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js?url';
import duckdb_wasm_eh from '@duckdb/duckdb-wasm/dist/duckdb-eh.wasm?url';
import eh_worker from '@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js?url';
const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
mvp: {
mainModule: duckdb_wasm,
mainWorker: mvp_worker,
},
eh: {
mainModule: duckdb_wasm_eh,
mainWorker: eh_worker,
},
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
It is possible to manually download the files from https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm/dist/.
import * as duckdb from '@duckdb/duckdb-wasm';
const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
mvp: {
mainModule: 'change/me/../duckdb-mvp.wasm',
mainWorker: 'change/me/../duckdb-browser-mvp.worker.js',
},
eh: {
mainModule: 'change/m/../duckdb-eh.wasm',
mainWorker: 'change/m/../duckdb-browser-eh.worker.js',
},
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
layout: docu title: Data Ingestion
DuckDB-Wasm has multiple ways to import data, depending on the format of the data.
There are two steps to import data into DuckDB.
First, the data file is imported into a local file system using register functions (registerEmptyFileBuffer, registerFileBuffer, registerFileHandle, registerFileText, registerFileURL).
Then, the data file is imported into DuckDB using insert functions (insertArrowFromIPCStream, insertArrowTable, insertCSVFromPath, insertJSONFromPath) or directly using a FROM SQL query (using extensions like Parquet or Wasm-flavored httpfs).
[Insert statements]({% link docs/data/insert.md %}) can also be used to import data.
// Create a new connection
const c = await db.connect();
// ... import data
// Close the connection to release memory
await c.close();
// Data can be inserted from an existing arrow.Table
// More Example https://arrow.apache.org/docs/js/
import { tableFromArrays } from 'apache-arrow';
// EOS signal according to Arrow IPC streaming format
// See https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
const EOS = new Uint8Array([255, 255, 255, 255, 0, 0, 0, 0]);
const arrowTable = tableFromArrays({
id: [1, 2, 3],
name: ['John', 'Jane', 'Jack'],
age: [20, 21, 22],
});
await c.insertArrowTable(arrowTable, { name: 'arrow_table' });
// Write EOS
await c.insertArrowTable(EOS, { name: 'arrow_table' });
// ..., from a raw Arrow IPC stream
const streamResponse = await fetch(`someapi`);
const streamReader = streamResponse.body.getReader();
const streamInserts = [];
while (true) {
const { value, done } = await streamReader.read();
if (done) break;
streamInserts.push(c.insertArrowFromIPCStream(value, { name: 'streamed' }));
}
// Write EOS
streamInserts.push(c.insertArrowFromIPCStream(EOS, { name: 'streamed' }));
await Promise.all(streamInserts);
// ..., from CSV files
// (interchangeable: registerFile{Text,Buffer,URL,Handle})
const csvContent = '1|foo\n2|bar\n';
await db.registerFileText(`data.csv`, csvContent);
// ... with typed insert options
await c.insertCSVFromPath('data.csv', {
schema: 'main',
name: 'foo',
detect: false,
header: false,
delimiter: '|',
columns: {
col1: new arrow.Int32(),
col2: new arrow.Utf8(),
},
});
// ..., from JSON documents in row-major format
const jsonRowContent = [
{ "col1": 1, "col2": "foo" },
{ "col1": 2, "col2": "bar" },
];
await db.registerFileText(
'rows.json',
JSON.stringify(jsonRowContent),
);
await c.insertJSONFromPath('rows.json', { name: 'rows' });
// ... or column-major format
const jsonColContent = {
"col1": [1, 2],
"col2": ["foo", "bar"]
};
await db.registerFileText(
'columns.json',
JSON.stringify(jsonColContent),
);
await c.insertJSONFromPath('columns.json', { name: 'columns' });
// From API
const streamResponse = await fetch(`someapi/content.json`);
await db.registerFileBuffer('file.json', new Uint8Array(await streamResponse.arrayBuffer()))
await c.insertJSONFromPath('file.json', { name: 'JSONContent' });
// from Parquet files
// ...Local
const pickedFile: File = letUserPickFile();
await db.registerFileHandle('local.parquet', pickedFile, DuckDBDataProtocol.BROWSER_FILEREADER, true);
// ...Remote
await db.registerFileURL('remote.parquet', 'https://origin/remote.parquet', DuckDBDataProtocol.HTTP, false);
// ... Using Fetch
const res = await fetch('https://origin/remote.parquet');
await db.registerFileBuffer('buffer.parquet', new Uint8Array(await res.arrayBuffer()));
// ..., by specifying URLs in the SQL text
await c.query(`
CREATE TABLE direct AS
SELECT * FROM 'https://origin/remote.parquet'
`);
// ..., or by executing raw insert statements
await c.query(`
INSERT INTO existing_table
VALUES (1, 'foo'), (2, 'bar')`);
// ..., by specifying URLs in the SQL text
await c.query(`
CREATE TABLE direct AS
SELECT * FROM 'https://origin/remote.parquet'
`);
Tip If you encounter a Network Error (
Failed to execute 'send' on 'XMLHttpRequest'
) when you try to query files from S3, configure the S3 permission CORS header. For example:
[
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"GET",
"HEAD"
],
"AllowedOrigins": [
"*"
],
"ExposeHeaders": [],
"MaxAgeSeconds": 3000
}
]
// ..., or by executing raw insert statements
await c.query(`
INSERT INTO existing_table
VALUES (1, 'foo'), (2, 'bar')`);
layout: docu title: Query
DuckDB-Wasm provides functions for querying data. Queries are run sequentially.
First, a connection needs to be created by calling connect. Then, queries can be run by calling query or send.
// Create a new connection
const conn = await db.connect();
// Either materialize the query result
await conn.query<{ v: arrow.Int }>(`
SELECT * FROM generate_series(1, 100) t(v)
`);
// ..., or fetch the result chunks lazily
for await (const batch of await conn.send<{ v: arrow.Int }>(`
SELECT * FROM generate_series(1, 100) t(v)
`)) {
// ...
}
// Close the connection to release memory
await conn.close();
// Create a new connection
const conn = await db.connect();
// Prepare query
const stmt = await conn.prepare(`SELECT v + ? FROM generate_series(0, 10_000) t(v);`);
// ... and run the query with materialized results
await stmt.query(234);
// ... or result chunks
for await (const batch of await stmt.send(234)) {
// ...
}
// Close the statement to release memory
await stmt.close();
// Closing the connection will release statements as well
await conn.close();
// Create a new connection
const conn = await db.connect();
// Query
const arrowResult = await conn.query<{ v: arrow.Int }>(`
SELECT * FROM generate_series(1, 100) t(v)
`);
// Convert arrow table to json
const result = arrowResult.toArray().map((row) => row.toJSON());
// Close the connection to release memory
await conn.close();
// Create a new connection
const conn = await db.connect();
// Export Parquet
conn.send(`COPY (SELECT * FROM tbl) TO 'result-snappy.parquet' (FORMAT PARQUET);`);
const parquet_buffer = await this._db.copyFileToBuffer('result-snappy.parquet');
// Generate a download link
const link = URL.createObjectURL(new Blob([parquet_buffer]));
// Close the connection to release memory
await conn.close();
layout: docu title: Julia Client
The DuckDB Julia package provides a high-performance front-end for DuckDB. Much like SQLite, DuckDB runs in-process within the Julia client, and provides a DBInterface front-end.
The package also supports multi-threaded execution. It uses Julia threads/tasks for this purpose. If you wish to run queries in parallel, you must launch Julia with multi-threading support (by e.g., setting the JULIA_NUM_THREADS
environment variable).
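For example, assuming a POSIX shell, Julia can be started with four threads as follows (the thread count here is only illustrative):
JULIA_NUM_THREADS=4 julia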
Install DuckDB as follows:
using Pkg
Pkg.add("DuckDB")
Alternatively, enter the package manager using the ]
key, and issue the following command:
pkg> add DuckDB
using DuckDB
# create a new in-memory database
con = DBInterface.connect(DuckDB.DB, ":memory:")
# create a table
DBInterface.execute(con, "CREATE TABLE integers (i INTEGER)")
# insert data by executing a prepared statement
stmt = DBInterface.prepare(con, "INSERT INTO integers VALUES(?)")
DBInterface.execute(stmt, [42])
# query the database
results = DBInterface.execute(con, "SELECT 42 a")
print(results)
Some SQL statements, such as PIVOT and IMPORT DATABASE, are executed as multiple prepared statements and will error when using DuckDB.execute(). Instead, run them with DuckDB.query(), which always returns a materialized result.
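As a minimal sketch (the cities table and its values are purely illustrative):
using DuckDB
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE cities (name VARCHAR, year INTEGER, population INTEGER)")
DBInterface.execute(con, "INSERT INTO cities VALUES ('Amsterdam', 2000, 1005), ('Amsterdam', 2010, 1065)")
# PIVOT expands into multiple prepared statements, so run it with DuckDB.query()
results = DuckDB.query(con, "PIVOT cities ON year USING sum(population)")
print(results)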
The DuckDB Julia package also provides support for querying Julia DataFrames. Note that the DataFrames are directly read by DuckDB – they are not inserted or copied into the database itself.
If you wish to load data from a DataFrame into a DuckDB table you can run a CREATE TABLE ... AS
or INSERT INTO
query.
using DuckDB
using DataFrames
# create a new in-memory database
con = DBInterface.connect(DuckDB.DB)
# create a DataFrame
df = DataFrame(a = [1, 2, 3], b = [42, 84, 42])
# register it as a view in the database
DuckDB.register_data_frame(con, df, "my_df")
# run a SQL query over the DataFrame
results = DBInterface.execute(con, "SELECT * FROM my_df")
print(results)
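Building on the registered view above, a minimal sketch that copies the DataFrame's contents into a DuckDB table (the table name my_table is only an example):
# materialize the registered DataFrame into a DuckDB table
DBInterface.execute(con, "CREATE TABLE my_table AS SELECT * FROM my_df")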
The DuckDB Julia package also supports the [Appender API]({% link docs/data/appender.md %}), which is much faster than using prepared statements or individual INSERT INTO
statements. Appends are made in row-wise format. For every column, an append()
call should be made, after which the row should be finished by calling end_row()
. After all rows have been appended, close()
should be used to finalize the Appender and clean up the resulting memory.
using DuckDB, DataFrames, Dates
db = DuckDB.DB()
# create a table
DBInterface.execute(db,
"CREATE OR REPLACE TABLE data(id INTEGER PRIMARY KEY, value FLOAT, timestamp TIMESTAMP, date DATE)")
# create data to insert
len = 100
df = DataFrames.DataFrame(
id = collect(1:len),
value = rand(len),
timestamp = Dates.now() + Dates.Second.(1:len),
date = Dates.today() + Dates.Day.(1:len)
)
# append data by row
appender = DuckDB.Appender(db, "data")
for i in eachrow(df)
for j in i
DuckDB.append(appender, j)
end
DuckDB.end_row(appender)
end
# close the appender after all rows
DuckDB.close(appender)
Within a Julia process, tasks are able to concurrently read and write to the database, as long as each task maintains its own connection to the database. In the example below, a single task is spawned to periodically read the database and many tasks are spawned to write to the database using both [INSERT
statements]({% link docs/sql/statements/insert.md %}) as well as the [Appender API]({% link docs/data/appender.md %}).
using Dates, DataFrames, DuckDB
db = DuckDB.DB()
DBInterface.connect(db)
DBInterface.execute(db, "CREATE OR REPLACE TABLE data (date TIMESTAMP, id INTEGER)")
function run_reader(db)
# create a DuckDB connection specifically for this task
conn = DBInterface.connect(db)
while true
println(DBInterface.execute(conn,
"SELECT id, count(date) AS count, max(date) AS max_date
FROM data GROUP BY id ORDER BY id") |> DataFrames.DataFrame)
Threads.sleep(1)
end
DBInterface.close(conn)
end
# spawn one reader task
Threads.@spawn run_reader(db)
function run_inserter(db, id)
# create a DuckDB connection specifically for this task
conn = DBInterface.connect(db)
for i in 1:1000
Threads.sleep(0.01)
DuckDB.execute(conn, "INSERT INTO data VALUES (current_timestamp, ?)", [id]);
end
DBInterface.close(conn)
end
# spawn many insert tasks
for i in 1:100
Threads.@spawn run_inserter(db, 1)
end
function run_appender(db, id)
# create a DuckDB connection specifically for this task
appender = DuckDB.Appender(db, "data")
for i in 1:1000
Threads.sleep(0.01)
row = (Dates.now(Dates.UTC), id)
for j in row
DuckDB.append(appender, j);
end
DuckDB.end_row(appender);
end
DuckDB.close(appender);
end
# spawn many appender tasks
for i in 1:100
Threads.@spawn run_appender(db, 2)
end
Credits to kimmolinna for the original DuckDB Julia connector.
The DuckDB CLI client supports “safe mode”. In safe mode, the CLI is prevented from accessing external files other than the database file that it was initially connected to and prevented from interacting with the host file system.
This has the following effects:
- The following [dot commands]({% link docs/clients/cli/dot_commands.md %}) are disabled:
.cd
.excel
.import
.log
.once
.open
.output
.read
.sh
.system
- Auto-complete no longer scans the file system for files to suggest as auto-complete targets.
- The [getenv function]({% link docs/clients/cli/overview.md %}#reading-environment-variables) is disabled.
- The [enable_external_access option]({% link docs/configuration/overview.md %}#configuration-reference) is set to false. This implies that:
  - ATTACH cannot attach a database from an on-disk file.
  - COPY cannot read from or write to files.
  - read_csv, read_parquet, read_json, etc. cannot read from disk.
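Safe mode can be activated from within a running session using the .safe_mode [dot command]({% link docs/clients/cli/dot_commands.md %}):
.safe_mode
It can also be enabled when starting the CLI via a command line flag (shown here as -safe; the flag name is an assumption, consult duckdb -help for your version):
duckdb -safe my_database.duckdb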
layout: docu title: CLI API
The DuckDB CLI (Command Line Interface) is a single, dependency-free executable. It is precompiled for Windows, Mac, and Linux for both the stable version and for nightly builds produced by GitHub Actions. Please see the [installation page]({% link docs/installation/index.html %}) under the CLI tab for download links.
The DuckDB CLI is based on the SQLite command line shell, so CLI-client-specific functionality is similar to what is described in the SQLite documentation (although DuckDB's SQL syntax follows PostgreSQL conventions with a [few exceptions]({% link docs/sql/dialect/postgresql_compatibility.md %})).
DuckDB has a tldr page, which summarizes the most common uses of the CLI client. If you have tldr installed, you can display it by running
tldr duckdb
.
Once the CLI executable has been downloaded, unzip it and save it to any directory.
Navigate to that directory in a terminal and enter the command duckdb
to run the executable.
If in a PowerShell or POSIX shell environment, use the command ./duckdb
instead.
The typical usage of the duckdb
command is the following:
duckdb [OPTIONS] [FILENAME]
The [OPTIONS]
part encodes [arguments for the CLI client]({% link docs/clients/cli/arguments.md %}). Common options include:
-csv
: sets the output mode to CSV-json
: sets the output mode to JSON-readonly
: open the database in read-only mode (see [concurrency in DuckDB]({% link docs/connect/concurrency.md %}#handling-concurrency))
For a full list of options, see the [command line arguments page]({% link docs/clients/cli/arguments.md %}).
When no [FILENAME]
argument is provided, the DuckDB CLI will open a temporary [in-memory database]({% link docs/connect/overview.md %}#in-memory-database).
You will see DuckDB's version number, information on the connection, and a prompt starting with a D.
duckdb
v{{ site.currentduckdbversion }} {{ site.currentduckdbhash }}
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D
To open or create a [persistent database]({% link docs/connect/overview.md %}#persistent-database), simply include a path as a command line argument:
duckdb my_database.duckdb
Once the CLI has been opened, enter a SQL statement followed by a semicolon, then hit enter and it will be executed. Results will be displayed in a table in the terminal. If a semicolon is omitted, hitting enter will allow for multi-line SQL statements to be entered.
SELECT 'quack' AS my_column;
my_column |
---|
quack |
The CLI supports all of DuckDB's rich [SQL syntax]({% link docs/sql/introduction.md %}) including SELECT
, CREATE
, and ALTER
statements.
The CLI supports [autocompletion]({% link docs/clients/cli/autocomplete.md %}), and has sophisticated [editor features]({% link docs/clients/cli/editing.md %}) and [syntax highlighting]({% link docs/clients/cli/syntax_highlighting.md %}) on certain platforms.
To exit the CLI, press Ctrl
+D
if your platform supports it. Otherwise, press Ctrl
+C
or use the .exit
command. If you used a persistent database, DuckDB will automatically checkpoint (save the latest edits to disk) and close. This will remove the .wal
file (the write-ahead log) and consolidate all of your data into the single-file database.
In addition to SQL syntax, special [dot commands]({% link docs/clients/cli/dot_commands.md %}) may be entered into the CLI client. To use one of these commands, begin the line with a period (.
) immediately followed by the name of the command you wish to execute. Additional arguments to the command are entered, space separated, after the command. If an argument must contain a space, either single or double quotes may be used to wrap that parameter. Dot commands must be entered on a single line, and no whitespace may occur before the period. No semicolon is required at the end of the line.
Frequently-used configurations can be stored in the file ~/.duckdbrc
, which will be loaded when starting the CLI client. See the Configuring the CLI section below for further information on these options.
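For example, a minimal ~/.duckdbrc could contain the following (the specific settings are only illustrative):
.mode markdown
.maxrows 50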
Tip To prevent the DuckDB CLI client from reading the
~/.duckdbrc
file, start it as follows:duckdb -init /dev/null
Below, we summarize a few important dot commands. To see all available commands, see the [dot commands page]({% link docs/clients/cli/dot_commands.md %}) or use the .help
command.
In addition to connecting to a database when opening the CLI, a new database connection can be made by using the .open
command. If no additional parameters are supplied, a new in-memory database connection is created. This database will not be persisted when the CLI connection is closed.
.open
The .open
command optionally accepts several options, but the final parameter can be used to indicate a path to a persistent database (or where one should be created). The special string :memory:
can also be used to open a temporary in-memory database.
.open persistent.duckdb
Warning
.open
closes the current database. To keep the current database, while adding a new database, use the [ATTACH
statement]({% link docs/sql/statements/attach.md %}).
One important option accepted by .open
is the --readonly
flag. This disallows any editing of the database. To open in read-only mode, the database must already exist. This also means that a new in-memory database can't be opened in read-only mode since in-memory databases are created upon connection.
.open --readonly preexisting.duckdb
The .mode
[dot command]({% link docs/clients/cli/dot_commands.md %}#mode) may be used to change the appearance of the tables returned in the terminal output.
These include the default duckbox
mode, csv
and json
mode for ingestion by other tools, markdown
and latex
for documents, and insert
mode for generating SQL statements.
By default, the DuckDB CLI sends results to the terminal's standard output. However, this can be modified using either the .output
or .once
commands.
For details, see the documentation for the [output dot command]({% link docs/clients/cli/dot_commands.md %}#output-writing-results-to-a-file).
The DuckDB CLI can read both SQL commands and dot commands from an external file instead of the terminal using the .read
command. This allows for a number of commands to be run in sequence and allows command sequences to be saved and reused.
The .read
command requires only one argument: the path to the file containing the SQL and/or commands to execute. After running the commands in the file, control will revert back to the terminal. Output from the execution of that file is governed by the same .output
and .once
commands that have been discussed previously. This allows the output to be displayed back to the terminal, as in the first example below, or out to another file, as in the second example.
In this example, the file select_example.sql
is located in the same directory as duckdb.exe and contains the following SQL statement:
SELECT *
FROM generate_series(5);
To execute it from the CLI, the .read
command is used.
.read select_example.sql
The output below is returned to the terminal by default. The formatting of the table can be adjusted using the .output
or .once
commands.
| generate_series |
|----------------:|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
Multiple commands, including both SQL and dot commands, can also be run in a single .read
command. In this example, the file write_markdown_to_file.sql
is located in the same directory as duckdb.exe and contains the following commands:
.mode markdown
.output series.md
SELECT *
FROM generate_series(5);
To execute it from the CLI, the .read
command is used as before.
.read write_markdown_to_file.sql
In this case, no output is returned to the terminal. Instead, the file series.md
is created (or replaced if it already existed) with the markdown-formatted results shown here:
| generate_series |
|----------------:|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
Several dot commands can be used to configure the CLI.
On startup, the CLI reads and executes all commands in the file ~/.duckdbrc
, including dot commands and SQL statements.
This allows you to store the configuration state of the CLI.
You may also point to a different initialization file using the -init flag.
As an example, a file in the same directory as the DuckDB CLI named prompt.sql
will change the DuckDB prompt to be a duck head and run a SQL statement.
Note that the duck head is built with Unicode characters and does not work in all terminal environments (e.g., in Windows, unless running with WSL and using the Windows Terminal).
.prompt '⚫◗ '
To invoke that file on initialization, use this command:
duckdb -init prompt.sql
This outputs:
-- Loading resources from prompt.sql
v⟨version⟩ ⟨git hash⟩
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
⚫◗
To read/process a file and exit immediately, pipe the file contents in to duckdb
:
duckdb < select_example.sql
To execute a command with SQL text passed in directly from the command line, call duckdb
with two arguments: the database location (or :memory:
), and a string with the SQL statement to execute.
duckdb :memory: "SELECT 42 AS the_answer"
To load extensions, use DuckDB's SQL INSTALL
and LOAD
commands as you would other SQL statements.
INSTALL fts;
LOAD fts;
For details, see the [Extension docs]({% link docs/extensions/overview.md %}).
When in a Unix environment, it can be useful to pipe data between multiple commands.
DuckDB is able to read data from stdin as well as write to stdout using the file location of stdin (/dev/stdin
) and stdout (/dev/stdout
) within SQL commands, as pipes act very similarly to file handles.
This command will create an example CSV:
COPY (SELECT 42 AS woot UNION ALL SELECT 43 AS woot) TO 'test.csv' (HEADER);
First, read a file and pipe it to the duckdb
CLI executable. As arguments to the DuckDB CLI, pass in the location of the database to open, in this case, an in-memory database, and a SQL command that utilizes /dev/stdin
as a file location.
cat test.csv | duckdb -c "SELECT * FROM read_csv('/dev/stdin')"
woot |
---|
42 |
43 |
To write back to stdout, the copy command can be used with the /dev/stdout
file location.
cat test.csv | \
duckdb -c "COPY (SELECT * FROM read_csv('/dev/stdin')) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)"
woot
42
43
The getenv
function can read environment variables.
To retrieve the home directory's path from the HOME
environment variable, use:
SELECT getenv('HOME') AS home;
home |
---|
/Users/user_name |
The output of the getenv
function can be used to set [configuration options]({% link docs/configuration/overview.md %}). For example, to set the NULL
order based on the environment variable DEFAULT_NULL_ORDER
, use:
SET default_null_order = getenv('DEFAULT_NULL_ORDER');
The getenv
function can only be run when the [enable_external_access
]({% link docs/configuration/overview.md %}#configuration-reference) is set to true
(the default setting).
It is only available in the CLI client and is not supported in other DuckDB clients.
The DuckDB CLI supports executing [prepared statements]({% link docs/sql/query_syntax/prepared_statements.md %}) in addition to regular SELECT
statements.
To create and execute a prepared statement in the CLI client, use the PREPARE
clause and the EXECUTE
statement.
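For example, a minimal sketch (the table and values are only illustrative):
CREATE TABLE people (name VARCHAR, age INTEGER);
INSERT INTO people VALUES ('Anna', 30), ('Bob', 17);
PREPARE adults AS SELECT name FROM people WHERE age >= ?;
EXECUTE adults(18);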
layout: docu title: Autocomplete
The shell offers context-aware autocomplete of SQL queries through the [autocomplete
extension]({% link docs/extensions/autocomplete.md %}). Autocomplete is triggered by pressing Tab
.
Multiple autocomplete suggestions can be present. You can cycle forwards through the suggestions by repeatedly pressing Tab
, or Shift+Tab
to cycle backwards. Autocompletion can be reverted by pressing ESC
twice.
The shell autocompletes four different groups:
- Keywords
- Table names and table functions
- Column names and scalar functions
- File names
The shell looks at the position in the SQL statement to determine which of these autocompletions to trigger. For example:
SELECT s
student_id
SELECT student_id F
FROM
SELECT student_id FROM g
grades
SELECT student_id FROM 'd
'data/
SELECT student_id FROM 'data/
'data/grades.csv
layout: docu title: Syntax Highlighting
Syntax highlighting in the CLI is currently only available for macOS and Linux.
SQL queries written in the shell are automatically highlighted.
There are several components of a query that are highlighted in different colors. The colors can be configured using [dot commands]({% link docs/clients/cli/dot_commands.md %}).
Syntax highlighting can also be disabled entirely using the .highlight off
command.
Below is a list of components that can be configured.
Type | Command | Default color |
---|---|---|
Keywords | .keyword |
green |
Constants and literals | .constant |
yellow |
Comments | .comment |
brightblack |
Errors | .error |
red |
Continuation | .cont |
brightblack |
Continuation (Selected) | .cont_sel |
green |
The components can be configured using either a supported color name (e.g., .keyword red
), or by directly providing a terminal code to use for rendering (e.g., .keywordcode \033[31m
). Below is a list of supported color names and their corresponding terminal codes.
Color | Terminal code |
---|---|
red | \033[31m |
green | \033[32m |
yellow | \033[33m |
blue | \033[34m |
magenta | \033[35m |
cyan | \033[36m |
white | \033[37m |
brightblack | \033[90m |
brightred | \033[91m |
brightgreen | \033[92m |
brightyellow | \033[93m |
brightblue | \033[94m |
brightmagenta | \033[95m |
brightcyan | \033[96m |
brightwhite | \033[97m |
For example, here is an alternative set of syntax highlighting colors:
.keyword brightred
.constant brightwhite
.comment cyan
.error yellow
.cont blue
.cont_sel brightblue
If you wish to start up the CLI with a different set of colors every time, you can place these commands in the ~/.duckdbrc
file that is loaded on start-up of the CLI.
The shell has support for highlighting certain errors. In particular, mismatched brackets and unclosed quotes are highlighted in red (or another color if specified). This highlighting is automatically disabled for large queries. In addition, it can be disabled manually using the .render_errors off
command.
layout: docu title: Editing
The linenoise-based CLI editor is currently only available for macOS and Linux.
DuckDB's CLI uses a line-editing library based on linenoise, which has shortcuts based on the Emacs mode of readline. Below is a list of available commands.
Key | Action |
---|---|
Left |
Move back a character |
Right |
Move forward a character |
Up |
Move up a line. When on the first line, move to previous history entry |
Down |
Move down a line. When on last line, move to next history entry |
Home |
Move to beginning of buffer |
End |
Move to end of buffer |
Ctrl +Left |
Move back a word |
Ctrl +Right |
Move forward a word |
Ctrl +A |
Move to beginning of buffer |
Ctrl +B |
Move back a character |
Ctrl +E |
Move to end of buffer |
Ctrl +F |
Move forward a character |
Alt +Left |
Move back a word |
Alt +Right |
Move forward a word |
Key | Action |
---|---|
Ctrl +P |
Move to previous history entry |
Ctrl +N |
Move to next history entry |
Ctrl +R |
Search the history |
Ctrl +S |
Search the history |
Alt +< |
Move to first history entry |
Alt +> |
Move to last history entry |
Alt +N |
Search the history |
Alt +P |
Search the history |
Key | Action |
---|---|
Backspace |
Delete previous character |
Delete |
Delete next character |
Ctrl +D |
Delete next character. When buffer is empty, end editing |
Ctrl +H |
Delete previous character |
Ctrl +K |
Delete everything after the cursor |
Ctrl +T |
Swap current and next character |
Ctrl +U |
Delete all text |
Ctrl +W |
Delete previous word |
Alt +C |
Convert next word to titlecase |
Alt +D |
Delete next word |
Alt +L |
Convert next word to lowercase |
Alt +R |
Delete all text |
Alt +T |
Swap current and next word |
Alt +U |
Convert next word to uppercase |
Alt +Backspace |
Delete previous word |
Alt +\ |
Delete spaces around cursor |
Key | Action |
---|---|
Tab |
Autocomplete. When autocompleting, cycle to next entry |
Shift +Tab |
When autocompleting, cycle to previous entry |
Esc +Esc |
When autocompleting, revert autocompletion |
Key | Action |
---|---|
Enter |
Execute query. If query is not complete, insert a newline at the end of the buffer |
Ctrl +J |
Execute query. If query is not complete, insert a newline at the end of the buffer |
Ctrl +C |
Cancel editing of current query |
Ctrl +G |
Cancel editing of current query |
Ctrl +L |
Clear screen |
Ctrl +O |
Cancel editing of current query |
Ctrl +X |
Insert a newline after the cursor |
Ctrl +Z |
Suspend CLI and return to shell, use fg to re-open |
If you prefer, you can use rlwrap
to use read-line directly with the shell. Then, use Shift
+Enter
to insert a newline and Enter
to execute the query:
rlwrap --substitute-prompt="D " duckdb -batch
layout: docu title: Output Formats
The .mode
[dot command]({% link docs/clients/cli/dot_commands.md %}) may be used to change the appearance of the tables returned in the terminal output. In addition to customizing the appearance, these modes have additional benefits. This can be useful for presenting DuckDB output elsewhere by redirecting the terminal [output to a file]({% link docs/clients/cli/dot_commands.md %}#output-writing-results-to-a-file). Using the insert
mode will build a series of SQL statements that can be used to insert the data at a later point.
The markdown
mode is particularly useful for building documentation and the latex
mode is useful for writing academic papers.
Mode | Description |
---|---|
ascii |
Columns/rows delimited by 0x1F and 0x1E |
box |
Tables using unicode box-drawing characters |
csv |
Comma-separated values |
column |
Output in columns (See .width ) |
duckbox |
Tables with extensive features (default) |
html |
HTML <table> code |
insert |
SQL insert statements for TABLE |
json |
Results in a JSON array |
jsonlines |
Results in NDJSON (newline-delimited JSON) |
latex |
LaTeX tabular environment code |
line |
One value per line |
list |
Values delimited by "|" |
markdown |
Markdown table format |
quote |
Escape answers as for SQL |
table |
ASCII-art table |
tabs |
Tab-separated values |
tcl |
TCL list elements |
trash |
No output |
Use .mode
directly to query the appearance currently in use.
.mode
current output mode: duckbox
.mode markdown
SELECT 'quacking intensifies' AS incoming_ducks;
| incoming_ducks |
|----------------------|
| quacking intensifies |
The output appearance can also be adjusted with the .separator
command. If using an export mode that relies on a separator (csv
or tabs
for example), the separator will be reset when the mode is changed. For example, .mode csv
will set the separator to a comma (,
). Using .separator "|"
will then convert the output to be pipe-separated.
.mode csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col1, 20 AS col_2;
col_1,col_2
1,2
10,20
.separator "|"
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col1, 20 AS col_2;
col_1|col_2
1|2
10|20
layout: docu title: Command Line Arguments
The table below summarizes DuckDB's command line options. To list all command line options, use the command:
duckdb -help
For a list of dot commands available in the CLI shell, see the [Dot Commands page]({% link docs/clients/cli/dot_commands.md %}).
Argument | Description |
---|---|
-append |
Append the database to the end of the file |
-ascii |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) to ascii |
-bail |
Stop after hitting an error |
-batch |
Force batch I/O |
-box |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) to box |
-column |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) to column |
-cmd COMMAND |
Run COMMAND before reading stdin |
-c COMMAND |
Run COMMAND and exit |
-csv |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) to csv |
-echo |
Print commands before execution |
-f FILENAME |
Run the script in FILENAME and exit. Note that the ~/.duckdbrc is read and executed first (if it exists) |
-init FILENAME |
Run the script in FILENAME upon startup (instead of ~/.duckdbrc ) |
-header |
Turn headers on |
-help |
Show this message |
-html |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) to HTML |
-interactive |
Force interactive I/O |
-json |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) to json |
-line |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) to line |
-list |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) to list |
-markdown |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) to markdown |
-newline SEP |
Set output row separator. Default: \n |
-nofollow |
Refuse to open symbolic links to database files |
-noheader |
Turn headers off |
-no-stdin |
Exit after processing options instead of reading stdin |
-nullvalue TEXT |
Set text string for NULL values. Default: empty string |
-quote |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) to quote |
-readonly |
Open the database read-only |
-s COMMAND |
Run COMMAND and exit |
-separator SEP |
Set output column separator to SEP. Default: "|" |
-stats |
Print memory stats before each finalize |
-table |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) to table |
-unsigned |
Allow loading of [unsigned extensions]({% link docs/extensions/overview.md %}#unsigned-extensions). This option is intended to be used for developing extensions. Consult the [Securing DuckDB page]({% link docs/operations_manual/securing_duckdb/securing_extensions.md %}) for guidelines on how to set up DuckDB in a secure manner |
-version |
Show DuckDB version |
Note that the CLI arguments are processed in order, similarly to the behavior of the SQLite CLI. For example:
duckdb -csv -c 'SELECT 42 AS hello' -json -c 'SELECT 84 AS world'
Returns the following:
hello
42
[{"world":84}]
layout: docu title: Dot Commands
Dot commands are available in the DuckDB CLI client. To use one of these commands, begin the line with a period (.
) immediately followed by the name of the command you wish to execute. Additional arguments to the command are entered, space separated, after the command. If an argument must contain a space, either single or double quotes may be used to wrap that parameter. Dot commands must be entered on a single line, and no whitespace may occur before the period. No semicolon is required at the end of the line. To see available commands, use the .help
command.
Command | Description |
---|---|
`.bail on|off` |
Stop after hitting an error. Default: off |
`.binary on|off` |
Turn binary output on or off. Default: off |
.cd DIRECTORY |
Change the working directory to DIRECTORY |
`.changes on|off` |
Show the number of rows changed by SQL |
.check GLOB |
Fail if output since .testcase does not match |
.columns |
Column-wise rendering of query results |
.constant ?COLOR? |
Sets the syntax highlighting color used for constant values |
.constantcode ?CODE? |
Sets the syntax highlighting terminal code used for constant values |
.databases |
List names and files of attached databases |
`.echo on|off` |
Turn command echo on or off |
.excel |
Display the output of next command in spreadsheet |
.exit ?CODE? |
Exit this program with return-code CODE |
`.explain ?on|off|auto?` |
Change the EXPLAIN formatting mode. Default: auto |
.fullschema ?--indent? |
Show schema and the content of sqlite_stat tables |
`.headers on|off` |
Turn display of headers on or off |
.help ?-all? ?PATTERN? |
Show help text for PATTERN |
`.highlight [on|off]` |
Toggle syntax highlighting in the shell on or off |
.import FILE TABLE |
Import data from FILE into TABLE |
.indexes ?TABLE? |
Show names of indexes |
.keyword ?COLOR? |
Sets the syntax highlighting color used for keywords |
.keywordcode ?CODE? |
Sets the syntax highlighting terminal code used for keywords |
`.large_number_rendering all|footer|off` |
Toggle readable rendering of large numbers (duckbox mode only) |
.lint OPTIONS |
Report potential schema issues |
`.log FILE|off` |
Turn logging on or off. FILE can be stderr/stdout |
.maxrows COUNT |
Sets the maximum number of rows for display. Only for [duckbox mode]({% link docs/clients/cli/output_formats.md %}) |
.maxwidth COUNT |
Sets the maximum width in characters. 0 defaults to terminal width. Only for [duckbox mode]({% link docs/clients/cli/output_formats.md %}) |
.mode MODE ?TABLE? |
Set [output mode]({% link docs/clients/cli/output_formats.md %}) |
.multiline |
Set multi-line mode (default) |
.nullvalue STRING |
Use STRING in place of NULL values |
.once ?OPTIONS? ?FILE? |
Output for the next SQL command only to FILE |
.open ?OPTIONS? ?FILE? |
Close existing database and reopen FILE |
.output ?FILE? |
Send output to FILE or stdout if FILE is omitted |
.parameter CMD ... |
Manage SQL parameter bindings |
.print STRING... |
Print literal STRING |
.prompt MAIN CONTINUE |
Replace the standard prompts |
.quit |
Exit this program |
.read FILE |
Read input from FILE |
.rows |
Row-wise rendering of query results (default) |
.safe_mode |
Activates [safe mode]({% link docs/clients/cli/safe_mode.md %}) |
.schema ?PATTERN? |
Show the CREATE statements matching PATTERN |
.separator COL ?ROW? |
Change the column and row separators |
.shell CMD ARGS... |
Run CMD ARGS... in a system shell |
.show |
Show the current values for various settings |
.singleline |
Set single-line mode |
.system CMD ARGS... |
Run CMD ARGS... in a system shell |
.tables ?TABLE? |
List names of tables [matching LIKE pattern]({% link docs/sql/functions/pattern_matching.md %}) TABLE |
.testcase NAME |
Begin redirecting output to NAME |
`.timer on|off` |
Turn the SQL timer on or off |
.width NUM1 NUM2 ... |
Set minimum column widths for columnar output |
The .help
text may be filtered by passing in a text string as the second argument.
.help m
.maxrows COUNT Sets the maximum number of rows for display (default: 40). Only for duckbox mode.
.maxwidth COUNT Sets the maximum width in characters. 0 defaults to terminal width. Only for duckbox mode.
.mode MODE ?TABLE? Set output mode
By default, the DuckDB CLI sends results to the terminal's standard output. However, this can be modified using either the .output
or .once
commands. Pass in the desired output file location as a parameter. The .once
command will only output the next set of results and then revert to standard out, but .output
will redirect all subsequent output to that file location. Note that each result will overwrite the entire file at that destination. To revert back to standard output, enter .output
with no file parameter.
In this example, the output format is changed to markdown
, the destination is identified as a Markdown file, and then DuckDB will write the output of the SQL statement to that file. Output is then reverted to standard output using .output
with no parameter.
.mode markdown
.output my_results.md
SELECT 'taking flight' AS output_column;
.output
SELECT 'back to the terminal' AS displayed_column;
The file my_results.md
will then contain:
| output_column |
|---------------|
| taking flight |
The terminal will then display:
| displayed_column |
|----------------------|
| back to the terminal |
A common output format is CSV, or comma separated values. DuckDB supports [SQL syntax to export data as CSV or Parquet]({% link docs/sql/statements/copy.md %}#copy-to), but the CLI-specific commands may be used to write a CSV instead if desired.
.mode csv
.once my_output_file.csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col1, 20 AS col_2;
The file my_output_file.csv
will then contain:
col_1,col_2
1,2
10,20
By passing special options (flags) to the .once
command, query results can also be sent to a temporary file and automatically opened in the user's default program. Use either the -e
flag for a text file (opened in the default text editor), or the -x
flag for a CSV file (opened in the default spreadsheet editor). This is useful for more detailed inspection of query results, especially if there is a relatively large result set. The .excel
command is equivalent to .once -x
.
.once -e
SELECT 'quack' AS hello;
The results then open in the default text file editor of the system, for example:
Tip macOS users can copy the results to their clipboards using
pbcopy
by using.once
to output topbcopy
via a pipe:.once |pbcopy
Combining this with the
.headers off
and .mode line
options can be particularly effective.
All DuckDB clients support [querying the database schema with SQL]({% link docs/sql/meta/information_schema.md %}), but the CLI has additional [dot commands]({% link docs/clients/cli/dot_commands.md %}) that can make it easier to understand the contents of a database.
The .tables
command will return a list of tables in the database. It has an optional argument that will filter the results according to a [LIKE
pattern]({% link docs/sql/functions/pattern_matching.md %}#like).
CREATE TABLE swimmers AS SELECT 'duck' AS animal;
CREATE TABLE fliers AS SELECT 'duck' AS animal;
CREATE TABLE walkers AS SELECT 'duck' AS animal;
.tables
fliers swimmers walkers
For example, to filter to only tables that contain an l
, use the LIKE
pattern %l%
.
.tables %l%
fliers walkers
The .schema
command will show all of the SQL statements used to define the schema of the database.
.schema
CREATE TABLE fliers (animal VARCHAR);
CREATE TABLE swimmers (animal VARCHAR);
CREATE TABLE walkers (animal VARCHAR);
By default the shell includes support for syntax highlighting. The CLI's syntax highlighter can be configured using the following commands.
To turn off the highlighter:
.highlight off
To turn on the highlighter:
.highlight on
To configure the color used to highlight constants:
.constant [red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta|brightcyan|brightwhite]
.constantcode [terminal_code]
To configure the color used to highlight keywords:
.keyword [red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta|brightcyan|brightwhite]
.keywordcode [terminal_code]
DuckDB's CLI allows using shorthands for dot commands. Once a sequence of characters can be unambiguously completed to a dot command or an argument, the CLI (silently) autocompletes it. For example:
.mo ma
Is equivalent to:
.mode markdown
Tip Avoid using shorthands in SQL scripts to improve readability and ensure that the scripts are future-proof.
Deprecated This feature is only included for compatibility reasons and may be removed in the future. Use the [
read_csv
function or theCOPY
statement]({% link docs/data/csv/overview.md %}) to load CSV files.
DuckDB supports [SQL syntax to directly query or import CSV files]({% link docs/data/csv/overview.md %}), but the CLI-specific commands may be used to import a CSV instead if desired. The .import
command takes two arguments and also supports several options. The first argument is the path to the CSV file, and the second is the name of the DuckDB table to create. Since DuckDB requires stricter typing than SQLite (upon which the DuckDB CLI is based), the destination table must be created before using the .import
command. To automatically detect the schema and create a table from a CSV, see the [read_csv
examples in the import docs]({% link docs/data/csv/overview.md %}).
In this example, a CSV file is generated by changing to CSV mode and setting an output file location:
.mode csv
.output import_example.csv
SELECT 1 AS col_1, 2 AS col_2 UNION ALL SELECT 10 AS col1, 20 AS col_2;
Now that the CSV has been written, a table can be created with the desired schema and the CSV can be imported. The output is reset to the terminal to avoid continuing to edit the output file specified above. The --skip N
option is used to ignore the first row of data since it is a header row and the table has already been created with the correct column names.
.mode csv
.output
CREATE TABLE test_table (col_1 INTEGER, col_2 INTEGER);
.import import_example.csv test_table --skip 1
Note that the .import
command utilizes the current .mode
and .separator
settings when identifying the structure of the data to import. The --csv
option can be used to override that behavior.
.import import_example.csv test_table --skip 1 --csv
layout: docu title: R Client github_repository: https://github.com/duckdb/duckdb-r
The DuckDB R client can be installed using the following command:
install.packages("duckdb")
Please see the [installation page]({% link docs/installation/index.html %}?environment=r) for details.
DuckDB offers a dplyr-compatible API via the duckplyr
package. It can be installed using install.packages("duckplyr")
. For details, see the duckplyr
documentation.
The reference manual for the DuckDB R client is available at r.duckdb.org.
The standard DuckDB R client implements the DBI interface for R. If you are not familiar with DBI yet, see the Using DBI page for an introduction.
To use DuckDB, you must first create a connection object that represents the database. The connection object takes as parameter the database file to read and write from. If the database file does not exist, it will be created (the file extension may be .db
, .duckdb
, or anything else). The special value :memory:
(the default) can be used to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the R process). If you would like to connect to an existing database in read-only mode, set the read_only
flag to TRUE
. Read-only mode is required if multiple R processes want to access the same database file at the same time.
library("duckdb")
# to start an in-memory database
con <- dbConnect(duckdb())
# or
con <- dbConnect(duckdb(), dbdir = ":memory:")
# to use a database file (not shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = FALSE)
# to use a database file (shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = TRUE)
Connections are closed implicitly when they go out of scope or if they are explicitly closed using dbDisconnect()
. To shut down the database instance associated with the connection, use dbDisconnect(con, shutdown = TRUE).
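For example:
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb")
# ... run queries on the connection ...
dbDisconnect(con, shutdown = TRUE)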
DuckDB supports the standard DBI methods to send queries and retrieve result sets. dbExecute()
is meant for queries where no results are expected like CREATE TABLE
or UPDATE
etc. and dbGetQuery()
is meant to be used for queries that produce results (e.g., SELECT
). Below is an example.
# create a table
dbExecute(con, "CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
dbExecute(con, "INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
# retrieve the items again
res <- dbGetQuery(con, "SELECT * FROM items")
print(res)
# item value count
# 1 jeans 20.0 1
# 2 hammer 42.2 2
DuckDB also supports prepared statements in the R client with the dbExecute
and dbGetQuery
methods. Here is an example:
# prepared statement parameters are given as a list
dbExecute(con, "INSERT INTO items VALUES (?, ?, ?)", list('laptop', 2000, 1))
# if you want to reuse a prepared statement multiple times, use dbSendStatement() and dbBind()
stmt <- dbSendStatement(con, "INSERT INTO items VALUES (?, ?, ?)")
dbBind(stmt, list('iphone', 300, 2))
dbBind(stmt, list('android', 3.5, 1))
dbClearResult(stmt)
# query the database using a prepared statement
res <- dbGetQuery(con, "SELECT item FROM items WHERE value > ?", list(400))
print(res)
# item
# 1 laptop
Warning Do not use prepared statements to insert large amounts of data into DuckDB. See below for better options.
To write an R data frame into DuckDB, use the standard DBI function dbWriteTable()
. This creates a table in DuckDB and populates it with the data frame contents. For example:
dbWriteTable(con, "iris_table", iris)
res <- dbGetQuery(con, "SELECT * FROM iris_table LIMIT 1")
print(res)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
It is also possible to “register” an R data frame as a virtual table, comparable to a SQL VIEW
. This does not actually transfer data into DuckDB yet. Below is an example:
duckdb_register(con, "iris_view", iris)
res <- dbGetQuery(con, "SELECT * FROM iris_view LIMIT 1")
print(res)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
DuckDB keeps a reference to the R data frame after registration. This prevents the data frame from being garbage-collected. The reference is cleared when the connection is closed, but can also be cleared manually using the
duckdb_unregister()
method.
Also refer to the [data import documentation]({% link docs/data/overview.md %}) for more options of efficiently importing data.
DuckDB also plays well with the dbplyr / dplyr packages for programmatic query construction from R. Here is an example:
library("duckdb")
library("dplyr")
con <- dbConnect(duckdb())
duckdb_register(con, "flights", nycflights13::flights)
tbl(con, "flights") |>
group_by(dest) |>
summarise(delay = mean(dep_time, na.rm = TRUE)) |>
collect()
When using dbplyr, CSV and Parquet files can be read using the dplyr::tbl
function.
# Establish a CSV for the sake of this example
write.csv(mtcars, "mtcars.csv")
# Summarize the dataset in DuckDB to avoid reading the entire CSV into R's memory
tbl(con, "mtcars.csv") |>
group_by(cyl) |>
summarise(across(disp:wt, .fns = mean)) |>
collect()
# Establish a set of Parquet files
dbExecute(con, "COPY flights TO 'dataset' (FORMAT PARQUET, PARTITION_BY (year, month))")
# Summarize the dataset in DuckDB to avoid reading 12 Parquet files into R's memory
tbl(con, "read_parquet('dataset/**/*.parquet', hive_partitioning = true)") |>
filter(month == "3") |>
summarise(delay = mean(dep_time, na.rm = TRUE)) |>
collect()
You can use the [memory_limit
configuration option]({% link docs/configuration/pragmas.md %}) to limit the memory use of DuckDB, e.g.:
SET memory_limit = '2GB';
Note that this limit is only applied to the memory DuckDB uses and it does not affect the memory use of other R libraries.
Therefore, the total memory used by the R process may be higher than the configured memory_limit
.
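From R, this option can be set on an open connection with a regular dbExecute() call, for example:
dbExecute(con, "SET memory_limit = '2GB'")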
On macOS, installing DuckDB may result in a warning unable to load shared object '.../R_X11.so'
:
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 0x0006): Library not loaded: /opt/X11/lib/libSM.6.dylib
Referenced from: <31EADEB5-0A17-3546-9944-9B3747071FE8> /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/modules/R_X11.so
Reason: tried: '/opt/X11/lib/libSM.6.dylib' (no such file) ...
Note that this is just a warning, so the simplest solution is to ignore it. Alternatively, you can install DuckDB from the R-universe:
install.packages("duckdb", repos = c("https://duckdb.r-universe.dev", "https://cloud.r-project.org"))
You may also install the optional xquartz
dependency via Homebrew.
layout: docu title: Java JDBC Client github_repository: https://github.com/duckdb/duckdb-java
The DuckDB Java JDBC API can be installed from Maven Central. Please see the [installation page]({% link docs/installation/index.html %}?environment=java) for details.
DuckDB's JDBC API implements the main parts of the standard Java Database Connectivity (JDBC) API, version 4.1. Describing JDBC is beyond the scope of this page; see the official documentation for details. Below we focus on the DuckDB-specific parts.
Refer to the externally hosted API Reference for more information about our extensions to the JDBC specification, or the below Arrow Methods.
In JDBC, database connections are created through the standard java.sql.DriverManager
class.
The driver should auto-register in the DriverManager
, if that does not work for some reason, you can enforce registration using the following statement:
Class.forName("org.duckdb.DuckDBDriver");
To create a DuckDB connection, call DriverManager
with the jdbc:duckdb:
JDBC URL prefix, like so:
import java.sql.Connection;
import java.sql.DriverManager;
Connection conn = DriverManager.getConnection("jdbc:duckdb:");
To use DuckDB-specific features such as the Appender, cast the object to a DuckDBConnection
:
import java.sql.DriverManager;
import org.duckdb.DuckDBConnection;
DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
When using the jdbc:duckdb:
URL alone, an in-memory database is created. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the Java program). If you would like to access or create a persistent database, append its file name after the path. For example, if your database is stored in /tmp/my_database
, use the JDBC URL jdbc:duckdb:/tmp/my_database
to create a connection to it.
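For example:
Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database");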
It is possible to open a DuckDB database file in read-only mode. This is for example useful if multiple Java processes want to read the same database file at the same time. To open an existing database file in read-only mode, set the connection property duckdb.read_only
like so:
Properties readOnlyProperty = new Properties();
readOnlyProperty.setProperty("duckdb.read_only", "true");
Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database", readOnlyProperty);
Additional connections can be created using the DriverManager
. A more efficient mechanism is to call the DuckDBConnection#duplicate()
method:
Connection conn2 = ((DuckDBConnection) conn).duplicate();
Multiple connections are allowed, but mixing read-write and read-only connections is unsupported.
Configuration options can be provided to change different settings of the database system. Note that many of these
settings can be changed later on using [PRAGMA
statements]({% link docs/configuration/pragmas.md %}) as well.
Properties connectionProperties = new Properties();
connectionProperties.setProperty("temp_directory", "/path/to/temp/dir/");
Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database", connectionProperties);
DuckDB supports the standard JDBC methods to send queries and retrieve result sets. First a Statement
object has to be created from the Connection
, this object can then be used to send queries using execute
and executeQuery
. execute()
is meant for queries where no results are expected like CREATE TABLE
or UPDATE
etc. and executeQuery()
is meant to be used for queries that produce results (e.g., SELECT
). Below two examples. See also the JDBC Statement
and ResultSet
documentations.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
Connection conn = DriverManager.getConnection("jdbc:duckdb:");
// create a table
Statement stmt = conn.createStatement();
stmt.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)");
// insert two items into the table
stmt.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)");
try (ResultSet rs = stmt.executeQuery("SELECT * FROM items")) {
while (rs.next()) {
System.out.println(rs.getString(1));
System.out.println(rs.getInt(3));
}
}
stmt.close();
jeans
1
hammer
2
DuckDB also supports prepared statements as per the JDBC API:
import java.sql.PreparedStatement;
try (PreparedStatement stmt = conn.prepareStatement("INSERT INTO items VALUES (?, ?, ?);")) {
stmt.setString(1, "chainsaw");
stmt.setDouble(2, 500.0);
stmt.setInt(3, 42);
stmt.execute();
// more calls to execute() possible
}
Warning Do not use prepared statements to insert large amounts of data into DuckDB. See the [data import documentation]({% link docs/data/overview.md %}) for better options.
Refer to the API Reference for type signatures.
The following demonstrates exporting an Arrow stream and consuming it using the Java Arrow bindings.
import java.sql.DriverManager;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.duckdb.DuckDBResultSet;
try (var conn = DriverManager.getConnection("jdbc:duckdb:");
var stmt = conn.prepareStatement("SELECT * FROM generate_series(2000)");
var resultset = (DuckDBResultSet) stmt.executeQuery();
var allocator = new RootAllocator()) {
try (var reader = (ArrowReader) resultset.arrowExportStream(allocator, 256)) {
while (reader.loadNextBatch()) {
System.out.println(reader.getVectorSchemaRoot().getVector("generate_series"));
}
}
stmt.close();
}
The following demonstrates consuming an Arrow stream from the Java Arrow bindings.
import java.sql.DriverManager;
import org.apache.arrow.c.ArrowArrayStream;
import org.apache.arrow.c.Data;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.duckdb.DuckDBConnection;
import org.duckdb.DuckDBResultSet;
// Arrow binding
try (var allocator = new RootAllocator();
ArrowStreamReader reader = null; // should not be null of course
var arrow_array_stream = ArrowArrayStream.allocateNew(allocator)) {
Data.exportArrayStream(allocator, reader, arrow_array_stream);
// DuckDB setup
try (var conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:")) {
conn.registerArrowStream("asdf", arrow_array_stream);
// run a query
try (var stmt = conn.createStatement();
var rs = (DuckDBResultSet) stmt.executeQuery("SELECT count(*) FROM asdf")) {
while (rs.next()) {
System.out.println(rs.getInt(1));
}
}
}
}
Result streaming is opt-in in the JDBC driver – by setting the jdbc_stream_results
config to true
before running a query. The easiest way to do that is to pass it in the Properties
object.
Properties props = new Properties();
props.setProperty(DuckDBDriver.JDBC_STREAM_RESULTS, String.valueOf(true));
Connection conn = DriverManager.getConnection("jdbc:duckdb:", props);
The [Appender]({% link docs/data/appender.md %}) is available in the DuckDB JDBC driver via the org.duckdb.DuckDBAppender
class.
The constructor of the class requires the schema name and the table name it is applied to.
The Appender is flushed when the close()
method is called.
Example:
import java.sql.DriverManager;
import java.sql.Statement;
import org.duckdb.DuckDBConnection;
DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
try (var stmt = conn.createStatement()) {
stmt.execute("CREATE TABLE tbl (x BIGINT, y FLOAT, s VARCHAR)"
);
// using try-with-resources to automatically close the appender at the end of the scope
try (var appender = conn.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "tbl")) {
appender.beginRow();
appender.append(10);
appender.append(3.2);
appender.append("hello");
appender.endRow();
appender.beginRow();
appender.append(20);
appender.append(-8.1);
appender.append("world");
appender.endRow();
    }
}
The DuckDB JDBC driver offers batch write functionality. The batch writer supports prepared statements to mitigate the overhead of query parsing.
The preferred method for bulk inserts is to use the Appender due to its higher performance. However, when using the Appender is not possible, the batch writer is available as an alternative.
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.duckdb.DuckDBConnection;
DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
PreparedStatement stmt = conn.prepareStatement("INSERT INTO test (x, y, z) VALUES (?, ?, ?);");
stmt.setObject(1, 1);
stmt.setObject(2, 2);
stmt.setObject(3, 3);
stmt.addBatch();
stmt.setObject(1, 4);
stmt.setObject(2, 5);
stmt.setObject(3, 6);
stmt.addBatch();
stmt.executeBatch();
stmt.close();
The batch writer also supports vanilla SQL statements:
import java.sql.DriverManager;
import java.sql.Statement;
import org.duckdb.DuckDBConnection;
DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
Statement stmt = conn.createStatement();
stmt.execute("CREATE TABLE test (x INTEGER, y INTEGER, z INTEGER)");
stmt.addBatch("INSERT INTO test (x, y, z) VALUES (1, 2, 3);");
stmt.addBatch("INSERT INTO test (x, y, z) VALUES (4, 5, 6);");
stmt.executeBatch();
stmt.close();
If the Java application is unable to find the DuckDB JDBC driver, it may throw the following error:
Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:duckdb:
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:706)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:252)
...
And when trying to load the class manually, it may result in this error:
Exception in thread "main" java.lang.ClassNotFoundException: org.duckdb.DuckDBDriver
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:375)
...
These errors stem from the DuckDB Maven/Gradle dependency not being detected. To ensure that it is detected, force refresh the Maven configuration in your IDE.
layout: docu title: ODBC API Overview github_repository: https://github.com/duckdb/duckdb-odbc redirect_from:
- /docs/api/odbc
- /docs/api/odbc/
- /docs/api/odbc/overview
- /docs/api/odbc/overview/
ODBC (Open Database Connectivity) is a C-style API that provides access to different flavors of Database Management Systems (DBMSs). The ODBC API consists of the Driver Manager (DM) and the ODBC drivers.
The Driver Manager is part of the system library, e.g., unixODBC, which manages the communications between the user applications and the ODBC drivers. Typically, applications are linked against the DM, which uses Data Source Name (DSN) to look up the correct ODBC driver.
The ODBC driver is a DBMS implementation of the ODBC API, which handles all the internals of that DBMS.
The DM maps user application calls of ODBC functions to the correct ODBC driver that performs the specified function and returns the proper values.
DuckDB supports ODBC version 3.0 according to the Core Interface Conformance.
The ODBC driver is available for all operating systems. Visit the [installation page]({% link docs/installation/index.html %}) for direct links.
layout: docu title: ODBC API on Linux github_repository: https://github.com/duckdb/duckdb-odbc redirect_from:
- /docs/api/odbc/linux
- /docs/api/odbc/linux/
A driver manager is required to manage communication between applications and the ODBC driver.
We have tested and support unixODBC, a complete ODBC driver manager for Linux.
Users can install it from the command line:
On Debian-based distributions (Ubuntu, Mint, etc.), run:
sudo apt-get install unixodbc odbcinst
On Fedora-based distributions (Amazon Linux, RHEL, CentOS, etc.), run:
sudo yum install unixODBC
1. Download the ODBC Linux Asset corresponding to your architecture:

   - [x86_64 (AMD64)](https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbodbcversion }}/duckdb_odbc-linux-amd64.zip)
   - [arm64](https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbodbcversion }}/duckdb_odbc-linux-aarch64.zip)

2. The package contains the following files:

   - `libduckdb_odbc.so`: the DuckDB driver.
   - `unixodbc_setup.sh`: a setup script to aid the configuration on Linux.

   To extract them, run:

   mkdir duckdb_odbc && unzip duckdb_odbc-linux-amd64.zip -d duckdb_odbc
3. The `unixodbc_setup.sh` script performs the configuration of the DuckDB ODBC driver. It is based on the unixODBC package, which provides commands such as `odbcinst` and `isql` to handle the ODBC setup and testing.

   Run the script with either the `-u` or the `-s` option to configure DuckDB ODBC.

   The `-u` option uses the user's home directory to set up the ODBC init files:

   ./unixodbc_setup.sh -u

   The `-s` option changes the system-level files, which are visible to all users; because of that, it requires root privileges:

   sudo ./unixodbc_setup.sh -s

   The `--help` option prints the usage of `unixodbc_setup.sh`:

   ./unixodbc_setup.sh --help

   Usage: ./unixodbc_setup.sh <level> [options]

   Example: ./unixodbc_setup.sh -u -db ~/database_path -D ~/driver_path/libduckdb_odbc.so

   Level:
   -s: System-level, using 'sudo' to configure DuckDB ODBC at the system-level, changing the files: /etc/odbc[inst].ini
   -u: User-level, configuring the DuckDB ODBC at the user-level, changing the files: ~/.odbc[inst].ini.

   Options:
   -db database_path>: the DuckDB database file path, the default is ':memory:' if not provided.
   -D driver_path: the driver file path (i.e., the path for libduckdb_odbc.so), the default is using the base script directory
4. The ODBC setup on Linux is based on the `.odbc.ini` and `.odbcinst.ini` files.

   These files can be placed in the user's home directory `/home/⟨username⟩` or in the system `/etc` directory. The Driver Manager prioritizes the user configuration files over the system files.

   For the details of the configuration parameters, see the [ODBC configuration page]({% link docs/clients/odbc/configuration.md %}).
layout: docu title: ODBC API on macOS github_repository: https://github.com/duckdb/duckdb-odbc redirect_from:
- /docs/api/odbc/macos
- /docs/api/odbc/macos/
1. A driver manager is required to manage communication between applications and the ODBC driver. DuckDB supports unixODBC, which is a complete ODBC driver manager for macOS and Linux. Users can install it from the command line via Homebrew:

   brew install unixodbc
2. DuckDB releases a universal [ODBC driver for macOS](https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbodbcversion }}/duckdb_odbc-osx-universal.zip) (supporting both Intel and Apple Silicon CPUs). To download it, run:

   wget https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbodbcversion }}/duckdb_odbc-osx-universal.zip

3. The archive contains the `libduckdb_odbc.dylib` artifact. To extract it to a directory, run:

   mkdir duckdb_odbc && unzip duckdb_odbc-osx-universal.zip -d duckdb_odbc
4. There are two ways to configure the ODBC driver: either by initializing via the configuration files, or by connecting with `SQLDriverConnect`. A combination of the two is also possible.

   Furthermore, the ODBC driver supports all the [configuration options]({% link docs/configuration/overview.md %}) included in DuckDB.

   If a configuration is set in both the connection string passed to `SQLDriverConnect` and in the `odbc.ini` file, the one passed to `SQLDriverConnect` takes precedence.

   For the details of the configuration parameters, see the [ODBC configuration page]({% link docs/clients/odbc/configuration.md %}).
5. After the configuration, you can validate the installation with an ODBC client. unixODBC provides a command line tool called `isql`.

   Use the DSN defined in `odbc.ini` as a parameter of `isql`:

   isql DuckDB

   +---------------------------------------+
   | Connected!                            |
   |                                       |
   | sql-statement                         |
   | help [tablename]                      |
   | echo [string]                         |
   | quit                                  |
   |                                       |
   +---------------------------------------+

   SQL> SELECT 42;

   +------------+
   | 42         |
   +------------+
   | 42         |
   +------------+

   SQLRowCount returns -1
   1 rows fetched
layout: docu title: ODBC API on Windows github_repository: https://github.com/duckdb/duckdb-odbc redirect_from:
- /docs/api/odbc/windows
- /docs/api/odbc/windows/
Using the DuckDB ODBC API on Windows requires the following steps:
1. Microsoft Windows requires an ODBC Driver Manager to manage communication between applications and the ODBC drivers. The Driver Manager on Windows is provided in the DLL file `odbccp32.dll`, along with other files and tools. For detailed information, check out the Common ODBC Component Files.

2. DuckDB releases the ODBC driver as an asset. For Windows, download it from the [Windows ODBC asset (x86_64/AMD64)](https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbodbcversion }}/duckdb_odbc-windows-amd64.zip).

3. The archive contains the following artifacts:

   - `duckdb_odbc.dll`: the DuckDB driver compiled for Windows.
   - `duckdb_odbc_setup.dll`: a setup DLL used by the Windows ODBC Data Source Administrator tool.
   - `odbc_install.exe`: an installation script to aid the configuration on Windows.

   Decompress the archive to a directory (e.g., `duckdb_odbc`). For example, run:

   mkdir duckdb_odbc && unzip duckdb_odbc-windows-amd64.zip -d duckdb_odbc
4. The `odbc_install.exe` binary performs the configuration of the DuckDB ODBC driver on Windows. It depends on `Odbccp32.dll`, which provides functions to configure the ODBC registry entries.

   Inside the permanent directory (e.g., `duckdb_odbc`), double-click `odbc_install.exe`.

   Windows administrator privileges are required. For a non-administrator account, a User Account Control prompt will appear.

5. `odbc_install.exe` adds a default DSN configuration into the ODBC registries with a default database `:memory:`.
After the installation, it is possible to change the default DSN configuration or add a new one using the Windows ODBC Data Source Administrator tool `odbcad32.exe`.
It can also be launched through the Windows Start menu.
The newly installed DSN is visible in the System DSN tab of the Windows ODBC Data Source Administrator tool.
When selecting the default DSN (i.e., `DuckDB`) or adding a new configuration, a setup window is displayed.
This window allows you to set the DSN and the database file path associated with that DSN.
There are two ways to configure the ODBC driver: either by altering the registry keys as detailed below, or by connecting with `SQLDriverConnect`.
A combination of the two is also possible.
Furthermore, the ODBC driver supports all the [configuration options]({% link docs/configuration/overview.md %}) included in DuckDB.
If a configuration is set in both the connection string passed to `SQLDriverConnect` and in the `odbc.ini` file, the one passed to `SQLDriverConnect` takes precedence.
For the details of the configuration parameters, see the [ODBC configuration page]({% link docs/clients/odbc/configuration.md %}).
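For illustration, a minimal sketch of connecting via `SQLDriverConnect` from C might look as follows. The connection-string keywords used here (the `DuckDB` DSN created by the installer and the `access_mode` option) are assumptions based on the configuration parameters described above; adjust them to your setup.

```c
#include <windows.h>
#include <sql.h>
#include <sqlext.h>

SQLHENV env;
SQLHDBC dbc;

// allocate an environment handle and request ODBC 3.0 behavior
SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER) SQL_OV_ODBC3, 0);

// allocate a connection handle and connect using a connection string
SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
SQLCHAR conn_str[] = "DSN=DuckDB;access_mode=READ_WRITE";
SQLRETURN ret = SQLDriverConnect(dbc, NULL, conn_str, SQL_NTS,
                                 NULL, 0, NULL, SQL_DRIVER_NOPROMPT);
if (!SQL_SUCCEEDED(ret)) {
    // handle connection error
}
```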
The ODBC setup on Windows is based on registry keys (see Registry Entries for ODBC Components).
The ODBC entries can be placed at the current user registry key (`HKCU`) or the system registry key (`HKLM`).
We have tested and used the system entries based on `HKLM->SOFTWARE->ODBC`.
The `odbc_install.exe` installer changes this entry, which has two subkeys: `ODBC.INI` and `ODBCINST.INI`.
The `ODBC.INI` subkey is where users usually insert DSN registry entries for the drivers.
For example, the DSN registry entries for DuckDB are placed under `HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\DuckDB`.
The `ODBCINST.INI` subkey contains one entry for each ODBC driver and other keys predefined for Windows ODBC configuration.
When a new version of the ODBC driver is released, installing the new version will overwrite the existing one.
However, the installer doesn't always update the version number in the registry.
To ensure the correct version is used, check that `HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBCINST.INI\DuckDB Driver` has the most recent version, and that `HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\DuckDB\Driver` has the correct path to the new driver.
layout: docu title: ODBC Configuration github_repository: https://github.com/duckdb/duckdb-odbc redirect_from:
- /docs/api/odbc/configuration
- /docs/api/odbc/configuration/
This page documents the files used for ODBC configuration: `odbc.ini` and `odbcinst.ini`.
These are either placed in the home directory as dotfiles (`.odbc.ini` and `.odbcinst.ini`, respectively) or in a system directory.
For platform-specific details, see the pages for [Linux]({% link docs/clients/odbc/linux.md %}), [macOS]({% link docs/clients/odbc/macos.md %}), and [Windows]({% link docs/clients/odbc/windows.md %}).
The `odbc.ini` file contains the DSNs for the drivers, which can have specific knobs.
An example of `odbc.ini` with DuckDB:
[DuckDB]
Driver = DuckDB Driver
Database = :memory:
access_mode = read_only
The lines correspond to the following parameters:

- `[DuckDB]`: between the brackets is the DSN for DuckDB.
- `Driver`: describes the driver's name, as well as where to find its configuration in `odbcinst.ini`.
- `Database`: describes the database name used by DuckDB; it can also be a file path to a `.db` file in the system.
- `access_mode`: the mode in which to connect to the database.
The `odbcinst.ini` file contains general configurations for the ODBC drivers installed in the system.
A driver section starts with the driver name between brackets, followed by specific configuration knobs belonging to that driver.
Example of `odbcinst.ini` with DuckDB:
[ODBC]
Trace = yes
TraceFile = /tmp/odbctrace
[DuckDB Driver]
Driver = /path/to/libduckdb_odbc.dylib
The lines correspond to the following parameters:

- `[ODBC]`: the DM configuration section.
- `Trace`: enables the ODBC trace file when set to `yes`.
- `TraceFile`: the absolute system file path for the ODBC trace file.
- `[DuckDB Driver]`: the section for the installed DuckDB driver.
- `Driver`: the absolute system file path of the DuckDB driver. Change this to match your configuration.
layout: docu title: Dart Client github_repository: https://github.com/TigerEyeLabs/duckdb-dart redirect_from:
- /docs/api/dart
- /docs/api/dart/
DuckDB.Dart is the native Dart API for DuckDB.
DuckDB.Dart can be installed from pub.dev. Please see the API Reference for details.
Add the dependency with Flutter:
flutter pub add dart_duckdb
This will add a line like this to your package's `pubspec.yaml` (and run an implicit `flutter pub get`):
dependencies:
dart_duckdb: ^1.1.3
Alternatively, your editor might support flutter pub get
. Check the docs for your editor to learn more.
Now in your Dart code, you can import it:
import 'package:dart_duckdb/dart_duckdb.dart';
See the example projects in the `duckdb-dart` repository:

- `cli`: a command-line application
- `duckdbexplorer`: a GUI application which builds for desktop operating systems as well as Android and iOS.
Here are some common code snippets for DuckDB.Dart:
import 'package:dart_duckdb/dart_duckdb.dart';
void main() {
final db = duckdb.open(":memory:");
final connection = duckdb.connect(db);
connection.execute('''
CREATE TABLE users (id INTEGER, name VARCHAR, age INTEGER);
INSERT INTO users VALUES (1, 'Alice', 30), (2, 'Bob', 25);
''');
final result = connection.query("SELECT * FROM users WHERE age > 28").fetchAll();
for (final row in result) {
print(row);
}
connection.dispose();
db.dispose();
}
import 'dart:isolate';

import 'package:dart_duckdb/dart_duckdb.dart';

void main() async {
  final db = duckdb.open(":memory:");
  final connection = duckdb.connect(db);

  await Isolate.spawn(backgroundTask, db.transferrable);

  connection.dispose();
  db.dispose();
}

void backgroundTask(TransferableDatabase transferableDb) {
  final connection = duckdb.connectWithTransferred(transferableDb);
  // Access database ...
  // fetch is needed to send the data back to the main isolate
}
layout: docu title: C++ API redirect_from:
- /docs/api/cpp
- /docs/api/cpp/
Warning DuckDB's C++ API is internal. It is not guaranteed to be stable and can change without notice. If you would like to build an application on DuckDB, we recommend using the [C API]({% link docs/clients/c/overview.md %}).
The DuckDB C++ API can be installed as part of the libduckdb
packages. Please see the [installation page]({% link docs/installation/index.html %}?environment=cplusplus) for details.
DuckDB implements a custom C++ API. This is built around the abstractions of a database instance (`DuckDB` class), multiple `Connection`s to the database instance, and `QueryResult` instances as the result of queries. The header file for the C++ API is `duckdb.hpp`.
To use DuckDB, you must first initialize a `DuckDB` instance using its constructor. `DuckDB()` takes as parameter the database file to read and write from. The special value `nullptr` can be used to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process). The second parameter to the `DuckDB` constructor is an optional `DBConfig` object. In `DBConfig`, you can set various database parameters, for example the read/write mode or memory limits. The `DuckDB` constructor may throw exceptions, for example if the database file is not usable.

With the `DuckDB` instance, you can create one or many `Connection` instances using the `Connection()` constructor. While connections should be thread-safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection if you are in a multithreaded environment.
DuckDB db(nullptr);
Connection con(db);
Connections expose the `Query()` method to send a SQL query string to DuckDB from C++. `Query()` fully materializes the query result as a `MaterializedQueryResult` in memory before returning, at which point the query result can be consumed. There is also a streaming API for queries, see further below.
// create a table
con.Query("CREATE TABLE integers (i INTEGER, j INTEGER)");
// insert three rows into the table
con.Query("INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL)");
auto result = con.Query("SELECT * FROM integers");
if (result->HasError()) {
cerr << result->GetError() << endl;
} else {
cout << result->ToString() << endl;
}
The `MaterializedQueryResult` instance first contains two fields that indicate whether the query was successful. `Query` will not throw exceptions under normal circumstances. Instead, invalid queries or other issues will lead to the `success` Boolean field in the query result instance being set to `false`. In this case an error message may be available in `error` as a string. If successful, other fields are set: the type of statement that was just executed (e.g., `StatementType::INSERT_STATEMENT`) is contained in `statement_type`. The high-level (“Logical type”/“SQL type”) types of the result set columns are in `types`. The names of the result columns are in the `names` string vector. In case multiple result sets are returned, for example because the result set contained multiple statements, the result sets can be chained using the `next` field.
DuckDB also supports prepared statements in the C++ API with the `Prepare()` method. This returns an instance of `PreparedStatement`. This instance can be used to execute the prepared statement with parameters. Below is an example:
std::unique_ptr<PreparedStatement> prepare = con.Prepare("SELECT count(*) FROM a WHERE i = $1");
std::unique_ptr<QueryResult> result = prepare->Execute(12);
Warning Do not use prepared statements to insert large amounts of data into DuckDB. See the [data import documentation]({% link docs/data/overview.md %}) for better options.
The UDF API allows the definition of user-defined functions. It is exposed in `duckdb::Connection` through the methods `CreateScalarFunction()`, `CreateVectorizedFunction()`, and variants.
These methods create UDFs in the temporary schema (`TEMP_SCHEMA`) of the owner connection, which is the only connection allowed to use and change them.
The user can implement an ordinary scalar function, register it with `CreateScalarFunction()`, and afterwards use the UDF in a `SELECT` statement, for instance:
bool bigger_than_four(int value) {
return value > 4;
}
connection.CreateScalarFunction<bool, int>("bigger_than_four", &bigger_than_four);
connection.Query("SELECT bigger_than_four(i) FROM (VALUES(3), (5)) tbl(i)")->Print();
The `CreateScalarFunction()` methods automatically create vectorized scalar UDFs, so they are as efficient as built-in functions. There are two variants of this method's interface:
1.
template<typename TR, typename... Args>
void CreateScalarFunction(string name, TR (*udf_func)(Args…))
- template parameters:
- TR is the return type of the UDF function;
- Args are the argument types of the UDF function, up to 3 (this method only supports functions up to ternary);
- name: is the name to register the UDF function;
- udf_func: is a pointer to the UDF function.
This method automatically discovers from the template typenames the corresponding LogicalTypes:

- bool → LogicalType::BOOLEAN
- int8_t → LogicalType::TINYINT
- int16_t → LogicalType::SMALLINT
- int32_t → LogicalType::INTEGER
- int64_t → LogicalType::BIGINT
- float → LogicalType::FLOAT
- double → LogicalType::DOUBLE
- string_t → LogicalType::VARCHAR
In DuckDB, some primitive types, e.g., int32_t, are mapped to multiple LogicalTypes (INTEGER, TIME and DATE). For disambiguation, users can use the following overloaded method.
2.
template<typename TR, typename... Args>
void CreateScalarFunction(string name, vector<LogicalType> args, LogicalType ret_type, TR (*udf_func)(Args…))
An example of use would be:
int32_t udf_date(int32_t a) {
return a;
}
con.Query("CREATE TABLE dates (d DATE)");
con.Query("INSERT INTO dates VALUES ('1992-01-01')");
con.CreateScalarFunction<int32_t, int32_t>("udf_date", {LogicalType::DATE}, LogicalType::DATE, &udf_date);
con.Query("SELECT udf_date(d) FROM dates")->Print();
- template parameters:
- TR is the return type of the UDF function;
- Args are the argument types of the UDF function, up to 3 (this method only supports functions up to ternary);
- name: is the name to register the UDF function;
- args: are the LogicalType arguments that the function uses, which should match the template Args types;
- ret_type: is the LogicalType returned by the function, which should match the template TR type;
- udf_func: is a pointer to the UDF function.
This function checks the template types against the LogicalTypes passed as arguments, and they must match as follows:
- LogicalTypeId::BOOLEAN → bool
- LogicalTypeId::TINYINT → int8_t
- LogicalTypeId::SMALLINT → int16_t
- LogicalTypeId::DATE, LogicalTypeId::TIME, LogicalTypeId::INTEGER → int32_t
- LogicalTypeId::BIGINT, LogicalTypeId::TIMESTAMP → int64_t
- LogicalTypeId::FLOAT, LogicalTypeId::DOUBLE, LogicalTypeId::DECIMAL → double
- LogicalTypeId::VARCHAR, LogicalTypeId::CHAR, LogicalTypeId::BLOB → string_t
- LogicalTypeId::VARBINARY → blob_t
The CreateVectorizedFunction()
methods register a vectorized UDF such as:
/*
* This vectorized function copies the input values to the result vector
*/
template<typename TYPE>
static void udf_vectorized(DataChunk &args, ExpressionState &state, Vector &result) {
// set the result vector type
result.vector_type = VectorType::FLAT_VECTOR;
// get a raw array from the result
auto result_data = FlatVector::GetData<TYPE>(result);
// get the solely input vector
auto &input = args.data[0];
// now get an orrified vector
VectorData vdata;
input.Orrify(args.size(), vdata);
// get a raw array from the orrified input
auto input_data = (TYPE *)vdata.data;
// handling the data
for (idx_t i = 0; i < args.size(); i++) {
auto idx = vdata.sel->get_index(i);
if ((*vdata.nullmask)[idx]) {
continue;
}
result_data[i] = input_data[idx];
}
}
con.Query("CREATE TABLE integers (i INTEGER)");
con.Query("INSERT INTO integers VALUES (1), (2), (3), (999)");
con.CreateVectorizedFunction<int, int>("udf_vectorized_int", &udf_vectorized<int>);
con.Query("SELECT udf_vectorized_int(i) FROM integers")->Print();
The Vectorized UDF is a pointer of the type scalar_function_t:
typedef std::function<void(DataChunk &args, ExpressionState &expr, Vector &result)> scalar_function_t;
- args: a DataChunk that holds a set of input vectors for the UDF, all of the same length;
- expr: an ExpressionState that provides information about the query's expression state;
- result: a Vector to store the result values.
There are different vector types to handle in a Vectorized UDF:
- ConstantVector;
- DictionaryVector;
- FlatVector;
- ListVector;
- StringVector;
- StructVector;
- SequenceVector.
The general API of the CreateVectorizedFunction()
method is as follows:
1.
template<typename TR, typename... Args>
void CreateVectorizedFunction(string name, scalar_function_t udf_func, LogicalType varargs = LogicalType::INVALID)
- template parameters:
- TR is the return type of the UDF function;
- Args are the argument types of the UDF function, up to 3.
- name: the name to register the UDF function;
- udf_func: a vectorized UDF function;
- varargs: the type of varargs to support, or LogicalTypeId::INVALID (default value) if the function does not accept variable-length arguments.
This method automatically discovers from the template typenames the corresponding LogicalTypes:
- bool → LogicalType::BOOLEAN;
- int8_t → LogicalType::TINYINT;
- int16_t → LogicalType::SMALLINT
- int32_t → LogicalType::INTEGER
- int64_t → LogicalType::BIGINT
- float → LogicalType::FLOAT
- double → LogicalType::DOUBLE
- string_t → LogicalType::VARCHAR
2.
template<typename TR, typename... Args>
void CreateVectorizedFunction(string name, vector<LogicalType> args, LogicalType ret_type, scalar_function_t udf_func, LogicalType varargs = LogicalType::INVALID)
layout: docu title: Overview redirect_from:
- /docs/api/c
- /docs/api/c/
- /docs/api/c/overview
- /docs/api/c/overview/
DuckDB implements a custom C API modelled somewhat following the SQLite C API. The API is contained in the duckdb.h
header. Continue to [Startup & Shutdown]({% link docs/clients/c/connect.md %}) to get started, or check out the [Full API overview]({% link docs/clients/c/api.md %}).
We also provide a SQLite API wrapper, which means that if your application is programmed against the SQLite C API, you can re-link to DuckDB and it should continue working. See the `sqlite_api_wrapper` folder in our source repository for more information.
The DuckDB C API can be installed as part of the libduckdb
packages. Please see the installation page for details.
layout: docu title: Configuration redirect_from:
- /docs/api/c/config
- /docs/api/c/config/
Configuration options can be provided to change different settings of the database system. Note that many of these settings can be changed later on using `PRAGMA` statements as well. The configuration object should be created, filled with values, and passed to `duckdb_open_ext`.
duckdb_database db;
duckdb_config config;
// create the configuration object
if (duckdb_create_config(&config) == DuckDBError) {
// handle error
}
// set some configuration options
duckdb_set_config(config, "access_mode", "READ_WRITE"); // or READ_ONLY
duckdb_set_config(config, "threads", "8");
duckdb_set_config(config, "max_memory", "8GB");
duckdb_set_config(config, "default_order", "DESC");
// open the database using the configuration
if (duckdb_open_ext(NULL, &db, config, NULL) == DuckDBError) {
// handle error
}
// cleanup the configuration object
duckdb_destroy_config(&config);
// run queries...
// cleanup
duckdb_close(&db);
duckdb_state duckdb_create_config(duckdb_config *out_config);
size_t duckdb_config_count();
duckdb_state duckdb_get_config_flag(size_t index, const char **out_name, const char **out_description);
duckdb_state duckdb_set_config(duckdb_config config, const char *name, const char *option);
void duckdb_destroy_config(duckdb_config *config);
Initializes an empty configuration object that can be used to provide start-up options for the DuckDB instance
through duckdb_open_ext
.
The duckdb_config must be destroyed using `duckdb_destroy_config`.
This will always succeed unless there is a malloc failure.
Note that `duckdb_destroy_config` should always be called on the resulting config, even if the function returns `DuckDBError`.
duckdb_state duckdb_create_config(
duckdb_config *out_config
);
out_config
: The result configuration object.
DuckDBSuccess
on success or DuckDBError
on failure.
This returns the total amount of configuration options available for usage with duckdb_get_config_flag
.
This should not be called in a loop as it internally loops over all the options.
The amount of config options available.
size_t duckdb_config_count(
);
Obtains a human-readable name and description of a specific configuration option. This can be used to e.g.
display configuration options. This will succeed unless index
is out of range (i.e., >= duckdb_config_count
).
The result name or description MUST NOT be freed.
duckdb_state duckdb_get_config_flag(
size_t index,
const char **out_name,
const char **out_description
);
index
: The index of the configuration option (between 0 andduckdb_config_count
)out_name
: A name of the configuration flag.out_description
: A description of the configuration flag.
DuckDBSuccess
on success or DuckDBError
on failure.
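For example, a small loop over `duckdb_config_count` and `duckdb_get_config_flag` can be used to list every available option together with its description (a minimal sketch; error handling trimmed):

```c
// enumerate all available configuration options and print name + description
size_t count = duckdb_config_count();
for (size_t i = 0; i < count; i++) {
    const char *name;
    const char *description;
    if (duckdb_get_config_flag(i, &name, &description) == DuckDBSuccess) {
        printf("%s: %s\n", name, description);
    }
}
```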
Sets the specified option for the specified configuration. The configuration option is indicated by name.
To obtain a list of config options, see duckdb_get_config_flag
.
In the source code, configuration options are defined in config.cpp
.
This can fail if either the name is invalid, or if the value provided for the option is invalid.
duckdb_state duckdb_set_config(
duckdb_config config,
const char *name,
const char *option
);
config
: The configuration object to set the option on.name
: The name of the configuration flag to set.option
: The value to set the configuration flag to.
DuckDBSuccess
on success or DuckDBError
on failure.
Destroys the specified configuration object and de-allocates all memory allocated for the object.
void duckdb_destroy_config(
duckdb_config *config
);
config
: The configuration object to destroy.
--- layout: docu title: Table Functions redirect_from: - /docs/api/c/table_functions - /docs/api/c/table_functions/ ---
The table function API can be used to define a table function that can then be called from within DuckDB in the `FROM` clause of a query.
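To give an idea of how the pieces documented below fit together, here is a minimal end-to-end sketch that registers a hypothetical `my_range(count)` table function producing a single `BIGINT` column. The function and variable names are illustrative only; the sketch exercises the bind, init, and main callbacks described on this page.

```c
#include "duckdb.h"
#include <stdlib.h>

// bind: declare the single BIGINT output column and stash the requested row count
static void my_range_bind(duckdb_bind_info info) {
    duckdb_logical_type bigint = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_bind_add_result_column(info, "i", bigint);
    duckdb_destroy_logical_type(&bigint);

    duckdb_value param = duckdb_bind_get_parameter(info, 0);
    int64_t *count = malloc(sizeof(int64_t));
    *count = duckdb_get_int64(param);
    duckdb_destroy_value(&param);
    duckdb_bind_set_bind_data(info, count, free);
}

// init: track how many rows have been emitted so far
static void my_range_init(duckdb_init_info info) {
    int64_t *pos = malloc(sizeof(int64_t));
    *pos = 0;
    duckdb_init_set_init_data(info, pos, free);
}

// function: fill the output chunk until the requested count is reached
static void my_range_function(duckdb_function_info info, duckdb_data_chunk output) {
    int64_t *count = (int64_t *) duckdb_function_get_bind_data(info);
    int64_t *pos = (int64_t *) duckdb_function_get_init_data(info);
    duckdb_vector col = duckdb_data_chunk_get_vector(output, 0);
    int64_t *data = (int64_t *) duckdb_vector_get_data(col);
    idx_t rows = 0;
    while (*pos < *count && rows < duckdb_vector_size()) {
        data[rows++] = (*pos)++;
    }
    duckdb_data_chunk_set_size(output, rows);
}

int main() {
    duckdb_database db;
    duckdb_connection con;
    duckdb_open(NULL, &db);
    duckdb_connect(db, &con);

    // assemble and register the table function
    duckdb_table_function tf = duckdb_create_table_function();
    duckdb_table_function_set_name(tf, "my_range");
    duckdb_logical_type bigint = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_table_function_add_parameter(tf, bigint);
    duckdb_destroy_logical_type(&bigint);
    duckdb_table_function_set_bind(tf, my_range_bind);
    duckdb_table_function_set_init(tf, my_range_init);
    duckdb_table_function_set_function(tf, my_range_function);
    duckdb_register_table_function(con, tf);
    duckdb_destroy_table_function(&tf);

    // the function can now be used in the FROM clause
    duckdb_result res;
    duckdb_query(con, "SELECT * FROM my_range(5)", &res);
    duckdb_destroy_result(&res);

    duckdb_disconnect(&con);
    duckdb_close(&db);
    return 0;
}
```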
duckdb_table_function duckdb_create_table_function();
void duckdb_destroy_table_function(duckdb_table_function *table_function);
void duckdb_table_function_set_name(duckdb_table_function table_function, const char *name);
void duckdb_table_function_add_parameter(duckdb_table_function table_function, duckdb_logical_type type);
void duckdb_table_function_add_named_parameter(duckdb_table_function table_function, const char *name, duckdb_logical_type type);
void duckdb_table_function_set_extra_info(duckdb_table_function table_function, void *extra_info, duckdb_delete_callback_t destroy);
void duckdb_table_function_set_bind(duckdb_table_function table_function, duckdb_table_function_bind_t bind);
void duckdb_table_function_set_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_local_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_function(duckdb_table_function table_function, duckdb_table_function_t function);
void duckdb_table_function_supports_projection_pushdown(duckdb_table_function table_function, bool pushdown);
duckdb_state duckdb_register_table_function(duckdb_connection con, duckdb_table_function function);
void *duckdb_bind_get_extra_info(duckdb_bind_info info);
void duckdb_bind_add_result_column(duckdb_bind_info info, const char *name, duckdb_logical_type type);
idx_t duckdb_bind_get_parameter_count(duckdb_bind_info info);
duckdb_value duckdb_bind_get_parameter(duckdb_bind_info info, idx_t index);
duckdb_value duckdb_bind_get_named_parameter(duckdb_bind_info info, const char *name);
void duckdb_bind_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t destroy);
void duckdb_bind_set_cardinality(duckdb_bind_info info, idx_t cardinality, bool is_exact);
void duckdb_bind_set_error(duckdb_bind_info info, const char *error);
void *duckdb_init_get_extra_info(duckdb_init_info info);
void *duckdb_init_get_bind_data(duckdb_init_info info);
void duckdb_init_set_init_data(duckdb_init_info info, void *init_data, duckdb_delete_callback_t destroy);
idx_t duckdb_init_get_column_count(duckdb_init_info info);
idx_t duckdb_init_get_column_index(duckdb_init_info info, idx_t column_index);
void duckdb_init_set_max_threads(duckdb_init_info info, idx_t max_threads);
void duckdb_init_set_error(duckdb_init_info info, const char *error);
void *duckdb_function_get_extra_info(duckdb_function_info info);
void *duckdb_function_get_bind_data(duckdb_function_info info);
void *duckdb_function_get_init_data(duckdb_function_info info);
void *duckdb_function_get_local_init_data(duckdb_function_info info);
void duckdb_function_set_error(duckdb_function_info info, const char *error);
Creates a new empty table function.
The return value should be destroyed with duckdb_destroy_table_function
.
The table function object.
duckdb_table_function duckdb_create_table_function(
);
Destroys the given table function object.
void duckdb_destroy_table_function(
duckdb_table_function *table_function
);
table_function
: The table function to destroy
Sets the name of the given table function.
void duckdb_table_function_set_name(
duckdb_table_function table_function,
const char *name
);
table_function
: The table functionname
: The name of the table function
Adds a parameter to the table function.
void duckdb_table_function_add_parameter(
duckdb_table_function table_function,
duckdb_logical_type type
);
table_function
: The table function.type
: The parameter type. Cannot contain INVALID.
Adds a named parameter to the table function.
void duckdb_table_function_add_named_parameter(
duckdb_table_function table_function,
const char *name,
duckdb_logical_type type
);
table_function
: The table function.name
: The parameter name.type
: The parameter type. Cannot contain INVALID.
Assigns extra information to the table function that can be fetched during binding, etc.
void duckdb_table_function_set_extra_info(
duckdb_table_function table_function,
void *extra_info,
duckdb_delete_callback_t destroy
);
table_function
: The table functionextra_info
: The extra informationdestroy
: The callback that will be called to destroy the bind data (if any)
Sets the bind function of the table function.
void duckdb_table_function_set_bind(
duckdb_table_function table_function,
duckdb_table_function_bind_t bind
);
table_function
: The table functionbind
: The bind function
Sets the init function of the table function.
void duckdb_table_function_set_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);
table_function
: The table functioninit
: The init function
Sets the thread-local init function of the table function.
void duckdb_table_function_set_local_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);
table_function
: The table functioninit
: The init function
Sets the main function of the table function.
void duckdb_table_function_set_function(
duckdb_table_function table_function,
duckdb_table_function_t function
);
table_function
: The table functionfunction
: The function
Sets whether or not the given table function supports projection pushdown.
If this is set to true, the system will provide a list of all required columns in the init
stage through
the duckdb_init_get_column_count
and duckdb_init_get_column_index
functions.
If this is set to false (the default), the system will expect all columns to be projected.
void duckdb_table_function_supports_projection_pushdown(
duckdb_table_function table_function,
bool pushdown
);
table_function
: The table functionpushdown
: True if the table function supports projection pushdown, false otherwise.
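As a rough sketch (assuming projection pushdown was enabled via `duckdb_table_function_supports_projection_pushdown`), an init callback could record which of the declared columns were actually requested; `column_map` here is illustration-only state:

```c
#include <stdlib.h>

// hypothetical init callback that remembers the projected column mapping
static void my_init(duckdb_init_info info) {
    idx_t projected = duckdb_init_get_column_count(info);
    idx_t *column_map = malloc(projected * sizeof(idx_t));
    for (idx_t i = 0; i < projected; i++) {
        // output position i corresponds to this column index declared during bind
        column_map[i] = duckdb_init_get_column_index(info, i);
    }
    // keep the mapping around so the main function only fills projected columns
    duckdb_init_set_init_data(info, column_map, free);
}
```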
Registers the table function object within the given connection.
The function requires at least a name, a bind function, an init function, and a main function.
If the function is incomplete or a function with this name already exists, DuckDBError is returned.
duckdb_state duckdb_register_table_function(
duckdb_connection con,
duckdb_table_function function
);
con
: The connection to register it in.function
: The function pointer
Whether or not the registration was successful.
Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info
.
void *duckdb_bind_get_extra_info(
duckdb_bind_info info
);
info
: The info object
The extra info
Adds a result column to the output of the table function.
void duckdb_bind_add_result_column(
duckdb_bind_info info,
const char *name,
duckdb_logical_type type
);
info
: The table function's bind info.name
: The column name.type
: The logical column type.
Retrieves the number of regular (non-named) parameters to the function.
idx_t duckdb_bind_get_parameter_count(
duckdb_bind_info info
);
info
: The info object
The number of parameters
Retrieves the parameter at the given index.
The result must be destroyed with duckdb_destroy_value
.
duckdb_value duckdb_bind_get_parameter(
duckdb_bind_info info,
idx_t index
);
info
: The info objectindex
: The index of the parameter to get
The value of the parameter. Must be destroyed with duckdb_destroy_value
.
Retrieves a named parameter with the given name.
The result must be destroyed with duckdb_destroy_value
.
duckdb_value duckdb_bind_get_named_parameter(
duckdb_bind_info info,
const char *name
);
info
: The info objectname
: The name of the parameter
The value of the parameter. Must be destroyed with duckdb_destroy_value
.
Sets the user-provided bind data in the bind object. This object can be retrieved again during execution.
void duckdb_bind_set_bind_data(
duckdb_bind_info info,
void *bind_data,
duckdb_delete_callback_t destroy
);
info
: The info objectbind_data
: The bind data object.destroy
: The callback that will be called to destroy the bind data (if any)
Sets the cardinality estimate for the table function, used for optimization.
void duckdb_bind_set_cardinality(
duckdb_bind_info info,
idx_t cardinality,
bool is_exact
);
info
: The bind data object.is_exact
: Whether or not the cardinality estimate is exact, or an approximation
Report that an error has occurred while calling bind.
void duckdb_bind_set_error(
duckdb_bind_info info,
const char *error
);
info
: The info objecterror
: The error message
Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info
.
void *duckdb_init_get_extra_info(
duckdb_init_info info
);
info
: The info object
The extra info
Gets the bind data set by duckdb_bind_set_bind_data
during the bind.
Note that the bind data should be considered as read-only. For tracking state, use the init data instead.
void *duckdb_init_get_bind_data(
duckdb_init_info info
);
info
: The info object
The bind data object
Sets the user-provided init data in the init object. This object can be retrieved again during execution.
void duckdb_init_set_init_data(
duckdb_init_info info,
void *init_data,
duckdb_delete_callback_t destroy
);
info
: The info objectinit_data
: The init data object.destroy
: The callback that will be called to destroy the init data (if any)
Returns the number of projected columns.
This function must be used if projection pushdown is enabled to figure out which columns to emit.
idx_t duckdb_init_get_column_count(
duckdb_init_info info
);
info
: The info object
The number of projected columns.
Returns the column index of the projected column at the specified position.
This function must be used if projection pushdown is enabled to figure out which columns to emit.
idx_t duckdb_init_get_column_index(
duckdb_init_info info,
idx_t column_index
);
info
: The info objectcolumn_index
: The index at which to get the projected column index, from 0..duckdb_init_get_column_count(info)
The column index of the projected column.
Sets how many threads can process this table function in parallel (default: 1)
void duckdb_init_set_max_threads(
duckdb_init_info info,
idx_t max_threads
);
info
: The info objectmax_threads
: The maximum amount of threads that can process this table function
Report that an error has occurred while calling init.
void duckdb_init_set_error(
duckdb_init_info info,
const char *error
);
info
: The info objecterror
: The error message
Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info
.
void *duckdb_function_get_extra_info(
duckdb_function_info info
);
info
: The info object
The extra info
Gets the bind data set by duckdb_bind_set_bind_data
during the bind.
Note that the bind data should be considered as read-only. For tracking state, use the init data instead.
void *duckdb_function_get_bind_data(
duckdb_function_info info
);
info
: The info object
The bind data object
Gets the init data set by duckdb_init_set_init_data
during the init.
void *duckdb_function_get_init_data(
duckdb_function_info info
);
info
: The info object
The init data object
Gets the thread-local init data set by duckdb_init_set_init_data
during the local_init.
void *duckdb_function_get_local_init_data(
duckdb_function_info info
);
info
: The info object
The init data object
Report that an error has occurred while executing the function.
void duckdb_function_set_error(
duckdb_function_info info,
const char *error
);
info
: The info objecterror
: The error message
--- layout: docu title: Startup & Shutdown redirect_from: - /docs/api/c/connect - /docs/api/c/connect/ ---
To use DuckDB, you must first initialize a `duckdb_database` handle using `duckdb_open()`. `duckdb_open()` takes as parameter the database file to read and write from. The special value `NULL` (`nullptr`) can be used to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process).

With the `duckdb_database` handle, you can create one or many `duckdb_connection`s using `duckdb_connect()`. While individual connections are thread-safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection to allow for the best parallel performance.

All `duckdb_connection`s have to be explicitly disconnected with `duckdb_disconnect()` and the `duckdb_database` has to be explicitly closed with `duckdb_close()` to avoid memory and file handle leaking.
duckdb_database db;
duckdb_connection con;
if (duckdb_open(NULL, &db) == DuckDBError) {
// handle error
}
if (duckdb_connect(db, &con) == DuckDBError) {
// handle error
}
// run queries...
// cleanup
duckdb_disconnect(&con);
duckdb_close(&db);
duckdb_instance_cache duckdb_create_instance_cache();
duckdb_state duckdb_get_or_create_from_cache(duckdb_instance_cache instance_cache, const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_destroy_instance_cache(duckdb_instance_cache *instance_cache);
duckdb_state duckdb_open(const char *path, duckdb_database *out_database);
duckdb_state duckdb_open_ext(const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_close(duckdb_database *database);
duckdb_state duckdb_connect(duckdb_database database, duckdb_connection *out_connection);
void duckdb_interrupt(duckdb_connection connection);
duckdb_query_progress_type duckdb_query_progress(duckdb_connection connection);
void duckdb_disconnect(duckdb_connection *connection);
const char *duckdb_library_version();
Creates a new database instance cache. The instance cache is necessary if a client/program (re)opens multiple databases to the same file within the same process. Must be destroyed with 'duckdb_destroy_instance_cache'.
The database instance cache.
duckdb_instance_cache duckdb_create_instance_cache(
);
Creates a new database instance in the instance cache, or retrieves an existing database instance. Must be closed with 'duckdb_close'.
duckdb_state duckdb_get_or_create_from_cache(
duckdb_instance_cache instance_cache,
const char *path,
duckdb_database *out_database,
duckdb_config config,
char **out_error
);
instance_cache
: The instance cache in which to create the database, or from which to take the database.path
: Path to the database file on disk. Bothnullptr
and:memory:
open or retrieve an in-memory database.out_database
: The resulting cached database.config
: (Optional) configuration used to create the database.out_error
: If set and the function returnsDuckDBError
, this contains the error message. Note that the error message must be freed usingduckdb_free
.
DuckDBSuccess
on success or DuckDBError
on failure.
Destroys an existing database instance cache and de-allocates its memory.
void duckdb_destroy_instance_cache(
duckdb_instance_cache *instance_cache
);
instance_cache
: The instance cache to destroy.
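For illustration, the typical lifecycle looks roughly like this (the `my_database.db` path is just a placeholder):

```c
duckdb_instance_cache cache = duckdb_create_instance_cache();
duckdb_database db;
char *error = NULL;
// repeated calls with the same path within this process reuse one instance
if (duckdb_get_or_create_from_cache(cache, "my_database.db", &db, NULL, &error) == DuckDBError) {
    // handle error, then free the message
    duckdb_free(error);
}
// ... connect and run queries as usual ...
duckdb_close(&db);
duckdb_destroy_instance_cache(&cache);
```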
Creates a new database or opens an existing database file stored at the given path. If no path is given a new in-memory database is created instead. The database must be closed with 'duckdb_close'.
duckdb_state duckdb_open(
const char *path,
duckdb_database *out_database
);
path
: Path to the database file on disk. Bothnullptr
and:memory:
open an in-memory database.out_database
: The result database object.
DuckDBSuccess
on success or DuckDBError
on failure.
Extended version of duckdb_open. Creates a new database or opens an existing database file stored at the given path. The database must be closed with 'duckdb_close'.
duckdb_state duckdb_open_ext(
const char *path,
duckdb_database *out_database,
duckdb_config config,
char **out_error
);
path
: Path to the database file on disk. Bothnullptr
and:memory:
open an in-memory database.out_database
: The result database object.config
: (Optional) configuration used to start up the database.out_error
: If set and the function returnsDuckDBError
, this contains the error message. Note that the error message must be freed usingduckdb_free
.
DuckDBSuccess
on success or DuckDBError
on failure.
Closes the specified database and de-allocates all memory allocated for that database.
This should be called after you are done with any database allocated through duckdb_open
or duckdb_open_ext
.
Note that failing to call duckdb_close
(in case of e.g., a program crash) will not cause data corruption.
Still, it is recommended to always correctly close a database object after you are done with it.
void duckdb_close(
duckdb_database *database
);
database
: The database object to shut down.
Opens a connection to a database. Connections are required to query the database, and store transactional state associated with the connection. The instantiated connection should be closed using 'duckdb_disconnect'.
duckdb_state duckdb_connect(
duckdb_database database,
duckdb_connection *out_connection
);
database
: The database file to connect to.out_connection
: The result connection object.
DuckDBSuccess
on success or DuckDBError
on failure.
Interrupt running query
void duckdb_interrupt(
duckdb_connection connection
);
connection
: The connection to interrupt
Get progress of the running query
duckdb_query_progress_type duckdb_query_progress(
duckdb_connection connection
);
connection
: The working connection
-1 if no progress or a percentage of the progress
Closes the specified connection and de-allocates all memory allocated for that connection.
void duckdb_disconnect(
duckdb_connection *connection
);
connection
: The connection to close.
Returns the version of the linked DuckDB, with a version postfix for dev versions.
Usually used for developing C extensions that must return this for a compatibility check.
const char *duckdb_library_version(
);
--- layout: docu title: Vectors redirect_from: - /docs/api/c/vector - /docs/api/c/vector/ ---
Vectors represent a horizontal slice of a column. They hold a number of values of a specific type, similar to an array. Vectors are the core data representation used in DuckDB. Vectors are typically stored within [data chunks]({% link docs/clients/c/data_chunk.md %}).
The vector and data chunk interfaces are the most efficient way of interacting with DuckDB, allowing for the highest performance. However, the interfaces are also difficult to use and care must be taken when using them.
Vectors are arrays of a specific data type. The logical type of a vector can be obtained using `duckdb_vector_get_column_type`. The type id of the logical type can then be obtained using `duckdb_get_type_id`.

Vectors themselves do not have sizes. Instead, the parent data chunk has a size (that can be obtained through `duckdb_data_chunk_get_size`). All vectors that belong to a data chunk have the same size.

For primitive types, the underlying array can be obtained using the `duckdb_vector_get_data` method. The array can then be accessed using the correct native type. Below is a table that contains a mapping of the `duckdb_type` to the native type of the array.
duckdb_type | NativeType |
---|---|
DUCKDB_TYPE_BOOLEAN | bool |
DUCKDB_TYPE_TINYINT | int8_t |
DUCKDB_TYPE_SMALLINT | int16_t |
DUCKDB_TYPE_INTEGER | int32_t |
DUCKDB_TYPE_BIGINT | int64_t |
DUCKDB_TYPE_UTINYINT | uint8_t |
DUCKDB_TYPE_USMALLINT | uint16_t |
DUCKDB_TYPE_UINTEGER | uint32_t |
DUCKDB_TYPE_UBIGINT | uint64_t |
DUCKDB_TYPE_FLOAT | float |
DUCKDB_TYPE_DOUBLE | double |
DUCKDB_TYPE_TIMESTAMP | duckdb_timestamp |
DUCKDB_TYPE_DATE | duckdb_date |
DUCKDB_TYPE_TIME | duckdb_time |
DUCKDB_TYPE_INTERVAL | duckdb_interval |
DUCKDB_TYPE_HUGEINT | duckdb_hugeint |
DUCKDB_TYPE_UHUGEINT | duckdb_uhugeint |
DUCKDB_TYPE_VARCHAR | duckdb_string_t |
DUCKDB_TYPE_BLOB | duckdb_string_t |
DUCKDB_TYPE_TIMESTAMP_S | duckdb_timestamp |
DUCKDB_TYPE_TIMESTAMP_MS | duckdb_timestamp |
DUCKDB_TYPE_TIMESTAMP_NS | duckdb_timestamp |
DUCKDB_TYPE_UUID | duckdb_hugeint |
DUCKDB_TYPE_TIME_TZ | duckdb_time_tz |
DUCKDB_TYPE_TIMESTAMP_TZ | duckdb_timestamp |
Any value in a vector can be `NULL`. When a value is `NULL`, the value contained within the primary array at that index is undefined (and can be uninitialized). The validity mask is a bitmask consisting of `uint64_t` elements. For every 64 values in the vector, one `uint64_t` element exists (rounded up). The validity mask has its bit set to 1 if the value is valid, or set to 0 if the value is invalid (i.e., `NULL`).

The bits of the bitmask can be read directly, or the slower helper method `duckdb_validity_row_is_valid` can be used to check whether or not a value is `NULL`.
The `duckdb_vector_get_validity` function returns a pointer to the validity mask. Note that if all values in a vector are valid, this function might return `nullptr`, in which case the validity mask does not need to be checked.
String values are stored as a `duckdb_string_t`. This is a special struct that stores the string inline (if it is short, i.e., <= 12 bytes) or a pointer to the string data if it is longer than 12 bytes.
typedef struct {
union {
struct {
uint32_t length;
char prefix[4];
char *ptr;
} pointer;
struct {
uint32_t length;
char inlined[12];
} inlined;
} value;
} duckdb_string_t;
The length can either be accessed directly, or the `duckdb_string_is_inlined` function can be used to check if a string is inlined.
Decimals are stored as integer values internally. The exact native type depends on the width
of the decimal type, as shown in the following table:
Width | NativeType |
---|---|
<= 4 | int16_t |
<= 9 | int32_t |
<= 18 | int64_t |
<= 38 | duckdb_hugeint |
The `duckdb_decimal_internal_type` function can be used to obtain the internal type of the decimal.

Decimals are stored as integer values multiplied by `10^scale`. The scale of a decimal can be obtained using `duckdb_decimal_scale`. For example, a decimal value of `10.5` with type `DECIMAL(8, 3)` is stored internally as an `int32_t` value of `10500`. In order to obtain the correct decimal value, the value should be divided by the appropriate power of ten.
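As a brief sketch (assuming a vector `vector` holding the `DECIMAL(8, 3)` column from the example above and a row index `row`), the conversion back to a floating-point value looks like this:

```c
#include <math.h>

duckdb_logical_type decimal_type = duckdb_vector_get_column_type(vector);
uint8_t scale = duckdb_decimal_scale(decimal_type);
// width <= 9, so the internal representation is int32_t
if (duckdb_decimal_internal_type(decimal_type) == DUCKDB_TYPE_INTEGER) {
    int32_t *data = (int32_t *) duckdb_vector_get_data(vector);
    double value = data[row] / pow(10, scale);   // 10500 / 10^3 = 10.5
}
duckdb_destroy_logical_type(&decimal_type);
```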
Enums are stored as unsigned integer values internally. The exact native type depends on the size of the enum dictionary, as shown in the following table:
Dictionary size | NativeType |
---|---|
<= 255 | uint8_t |
<= 65535 | uint16_t |
<= 4294967295 | uint32_t |
The `duckdb_enum_internal_type` function can be used to obtain the internal type of the enum.

In order to obtain the actual string value of the enum, the `duckdb_enum_dictionary_value` function must be used to obtain the enum value that corresponds to the given dictionary entry. Note that the enum dictionary is the same for the entire column – and so only needs to be constructed once.
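A short sketch of reading an enum column (assuming a vector `vector`, a row index `row`, and a dictionary small enough to use `uint8_t` indices):

```c
duckdb_logical_type enum_type = duckdb_vector_get_column_type(vector);
if (duckdb_enum_internal_type(enum_type) == DUCKDB_TYPE_UTINYINT) {
    uint8_t *indices = (uint8_t *) duckdb_vector_get_data(vector);
    // translate the dictionary index back into its string value
    char *text = duckdb_enum_dictionary_value(enum_type, indices[row]);
    printf("%s\n", text);
    duckdb_free(text);   // the returned string must be freed with duckdb_free
}
duckdb_destroy_logical_type(&enum_type);
```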
Structs are nested types that contain any number of child types. Think of them like a `struct` in C. The way to access struct data using vectors is to access the child vectors recursively using the `duckdb_struct_vector_get_child` method.

The struct vector itself does not have any data (i.e., you should not use the `duckdb_vector_get_data` method on the struct). However, the struct vector itself does have a validity mask. The reason for this is that the child elements of a struct can be `NULL`, but the struct itself can also be `NULL`.
Lists are nested types that contain a single child type, repeated `x` times per row. Think of them like a variable-length array in C. The way to access list data using vectors is to access the child vector using the `duckdb_list_vector_get_child` method.

The `duckdb_vector_get_data` method must be used to get the offsets and lengths of the lists, stored as `duckdb_list_entry`, which can then be applied to the child vector.
typedef struct {
uint64_t offset;
uint64_t length;
} duckdb_list_entry;
Note that both the list entries themselves and any children stored in the lists can also be `NULL`. This must be checked using the validity mask again.
Arrays are nested types that contain a single child type, repeated exactly `array_size` times per row. Think of them like a fixed-size array in C. Arrays work exactly the same as lists, except the length and offset of each entry is fixed. The fixed array size can be obtained by using `duckdb_array_type_array_size`. The data for entry `n` then resides at `offset = n * array_size`, and always has `length = array_size`.

Note that much like lists, arrays can still be `NULL`, which must be checked using the validity mask.
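A short sketch of reading one row of a fixed-size BIGINT array column (assuming a vector `vector` and a row index `row`, mirroring the list example below):

```c
duckdb_logical_type array_type = duckdb_vector_get_column_type(vector);
idx_t array_size = duckdb_array_type_array_size(array_type);
duckdb_destroy_logical_type(&array_type);

// the child vector stores array_size values per row, back to back
duckdb_vector child = duckdb_array_vector_get_child(vector);
int64_t *child_data = (int64_t *) duckdb_vector_get_data(child);
uint64_t *child_validity = duckdb_vector_get_validity(child);
for (idx_t i = 0; i < array_size; i++) {
    idx_t child_idx = row * array_size + i;
    if (duckdb_validity_row_is_valid(child_validity, child_idx)) {
        printf("%lld ", (long long) child_data[child_idx]);
    } else {
        printf("NULL ");
    }
}
```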
Below are several full end-to-end examples of how to interact with vectors.
duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);
duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%2=0 THEN NULL ELSE i END res_col FROM range(10) t(i)", &res);
// iterate until result is exhausted
while (true) {
duckdb_data_chunk result = duckdb_fetch_chunk(res);
if (!result) {
// result is exhausted
break;
}
// get the number of rows from the data chunk
idx_t row_count = duckdb_data_chunk_get_size(result);
// get the first column
duckdb_vector res_col = duckdb_data_chunk_get_vector(result, 0);
// get the native array and the validity mask of the vector
int64_t *vector_data = (int64_t *) duckdb_vector_get_data(res_col);
uint64_t *vector_validity = duckdb_vector_get_validity(res_col);
// iterate over the rows
for (idx_t row = 0; row < row_count; row++) {
if (duckdb_validity_row_is_valid(vector_validity, row)) {
printf("%lld\n", vector_data[row]);
} else {
printf("NULL\n");
}
}
duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);
duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%2=0 THEN CONCAT('short_', i) ELSE CONCAT('longstringprefix', i) END FROM range(10) t(i)", &res);
// iterate until result is exhausted
while (true) {
duckdb_data_chunk result = duckdb_fetch_chunk(res);
if (!result) {
// result is exhausted
break;
}
// get the number of rows from the data chunk
idx_t row_count = duckdb_data_chunk_get_size(result);
// get the first column
duckdb_vector res_col = duckdb_data_chunk_get_vector(result, 0);
// get the native array and the validity mask of the vector
duckdb_string_t *vector_data = (duckdb_string_t *) duckdb_vector_get_data(res_col);
uint64_t *vector_validity = duckdb_vector_get_validity(res_col);
// iterate over the rows
for (idx_t row = 0; row < row_count; row++) {
if (duckdb_validity_row_is_valid(vector_validity, row)) {
duckdb_string_t str = vector_data[row];
if (duckdb_string_is_inlined(str)) {
// use inlined string
printf("%.*s\n", str.value.inlined.length, str.value.inlined.inlined);
} else {
// follow string pointer
printf("%.*s\n", str.value.pointer.length, str.value.pointer.ptr);
}
} else {
printf("NULL\n");
}
}
duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);
duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%5=0 THEN NULL ELSE {'col1': i, 'col2': CASE WHEN i%2=0 THEN NULL ELSE 100 + i * 42 END} END FROM range(10) t(i)", &res);
// iterate until result is exhausted
while (true) {
duckdb_data_chunk result = duckdb_fetch_chunk(res);
if (!result) {
// result is exhausted
break;
}
// get the number of rows from the data chunk
idx_t row_count = duckdb_data_chunk_get_size(result);
// get the struct column
duckdb_vector struct_col = duckdb_data_chunk_get_vector(result, 0);
uint64_t *struct_validity = duckdb_vector_get_validity(struct_col);
// get the child columns of the struct
duckdb_vector col1_vector = duckdb_struct_vector_get_child(struct_col, 0);
int64_t *col1_data = (int64_t *) duckdb_vector_get_data(col1_vector);
uint64_t *col1_validity = duckdb_vector_get_validity(col1_vector);
duckdb_vector col2_vector = duckdb_struct_vector_get_child(struct_col, 1);
int64_t *col2_data = (int64_t *) duckdb_vector_get_data(col2_vector);
uint64_t *col2_validity = duckdb_vector_get_validity(col2_vector);
// iterate over the rows
for (idx_t row = 0; row < row_count; row++) {
if (!duckdb_validity_row_is_valid(struct_validity, row)) {
// entire struct is NULL
printf("NULL\n");
continue;
}
// read col1
printf("{'col1': ");
if (!duckdb_validity_row_is_valid(col1_validity, row)) {
// col1 is NULL
printf("NULL");
} else {
printf("%lld", col1_data[row]);
}
printf(", 'col2': ");
if (!duckdb_validity_row_is_valid(col2_validity, row)) {
// col2 is NULL
printf("NULL");
} else {
printf("%lld", col2_data[row]);
}
printf("}\n");
}
duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);
duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i % 5 = 0 THEN NULL WHEN i % 2 = 0 THEN [i, i + 1] ELSE [i * 42, NULL, i * 84] END FROM range(10) t(i)", &res);
// iterate until result is exhausted
while (true) {
duckdb_data_chunk result = duckdb_fetch_chunk(res);
if (!result) {
// result is exhausted
break;
}
// get the number of rows from the data chunk
idx_t row_count = duckdb_data_chunk_get_size(result);
// get the list column
duckdb_vector list_col = duckdb_data_chunk_get_vector(result, 0);
duckdb_list_entry *list_data = (duckdb_list_entry *) duckdb_vector_get_data(list_col);
uint64_t *list_validity = duckdb_vector_get_validity(list_col);
// get the child column of the list
duckdb_vector list_child = duckdb_list_vector_get_child(list_col);
int64_t *child_data = (int64_t *) duckdb_vector_get_data(list_child);
uint64_t *child_validity = duckdb_vector_get_validity(list_child);
// iterate over the rows
for (idx_t row = 0; row < row_count; row++) {
if (!duckdb_validity_row_is_valid(list_validity, row)) {
// entire list is NULL
printf("NULL\n");
continue;
}
// read the list offsets for this row
duckdb_list_entry list = list_data[row];
printf("[");
for (idx_t child_idx = list.offset; child_idx < list.offset + list.length; child_idx++) {
if (child_idx > list.offset) {
printf(", ");
}
if (!duckdb_validity_row_is_valid(child_validity, child_idx)) {
// col1 is NULL
printf("NULL");
} else {
printf("%lld", child_data[child_idx]);
}
}
printf("]\n");
}
duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
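The examples above cover primitive, string, struct and list vectors. As an additional sketch (not part of the original example set), the snippet below reads a fixed-size ARRAY column in the same fashion; it assumes the array size can be obtained from the column's logical type via duckdb_array_type_array_size, and that the child vector holds row_count * array_size consecutive entries.
duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);
duckdb_result res;
duckdb_query(con, "SELECT array_value(i, i + 1, i + 2) FROM range(5) t(i)", &res);
// iterate until result is exhausted
while (true) {
duckdb_data_chunk result = duckdb_fetch_chunk(res);
if (!result) {
// result is exhausted
break;
}
idx_t row_count = duckdb_data_chunk_get_size(result);
// get the array column and read its fixed size from the column type
duckdb_vector array_col = duckdb_data_chunk_get_vector(result, 0);
uint64_t *array_validity = duckdb_vector_get_validity(array_col);
duckdb_logical_type array_type = duckdb_vector_get_column_type(array_col);
idx_t array_size = duckdb_array_type_array_size(array_type);
duckdb_destroy_logical_type(&array_type);
// the child vector is a flat vector of length row_count * array_size
duckdb_vector array_child = duckdb_array_vector_get_child(array_col);
int64_t *child_data = (int64_t *) duckdb_vector_get_data(array_child);
uint64_t *child_validity = duckdb_vector_get_validity(array_child);
// iterate over the rows
for (idx_t row = 0; row < row_count; row++) {
if (!duckdb_validity_row_is_valid(array_validity, row)) {
// entire array is NULL
printf("NULL\n");
continue;
}
printf("[");
for (idx_t i = 0; i < array_size; i++) {
idx_t child_idx = row * array_size + i;
if (i > 0) {
printf(", ");
}
if (!duckdb_validity_row_is_valid(child_validity, child_idx)) {
printf("NULL");
} else {
printf("%lld", child_data[child_idx]);
}
}
printf("]\n");
}
duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);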
duckdb_logical_type duckdb_vector_get_column_type(duckdb_vector vector);
void *duckdb_vector_get_data(duckdb_vector vector);
uint64_t *duckdb_vector_get_validity(duckdb_vector vector);
void duckdb_vector_ensure_validity_writable(duckdb_vector vector);
void duckdb_vector_assign_string_element(duckdb_vector vector, idx_t index, const char *str);
void duckdb_vector_assign_string_element_len(duckdb_vector vector, idx_t index, const char *str, idx_t str_len);
duckdb_vector duckdb_list_vector_get_child(duckdb_vector vector);
idx_t duckdb_list_vector_get_size(duckdb_vector vector);
duckdb_state duckdb_list_vector_set_size(duckdb_vector vector, idx_t size);
duckdb_state duckdb_list_vector_reserve(duckdb_vector vector, idx_t required_capacity);
duckdb_vector duckdb_struct_vector_get_child(duckdb_vector vector, idx_t index);
duckdb_vector duckdb_array_vector_get_child(duckdb_vector vector);
bool duckdb_validity_row_is_valid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_validity(uint64_t *validity, idx_t row, bool valid);
void duckdb_validity_set_row_invalid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_valid(uint64_t *validity, idx_t row);
Retrieves the column type of the specified vector.
The result must be destroyed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_vector_get_column_type(
duckdb_vector vector
);
vector
: The vector to get the data from
The type of the vector
Retrieves the data pointer of the vector.
The data pointer can be used to read or write values from the vector. How to read or write values depends on the type of the vector.
void *duckdb_vector_get_data(
duckdb_vector vector
);
vector
: The vector to get the data from
The data pointer
Retrieves the validity mask pointer of the specified vector.
If all values are valid, this function MIGHT return NULL!
The validity mask is a bitset that signifies null-ness within the data chunk. It is a series of uint64_t values, where each uint64_t value contains validity for 64 tuples. The bit is set to 1 if the value is valid (i.e., not NULL) or 0 if the value is invalid (i.e., NULL).
Validity of a specific value can be obtained like this:
idx_t entry_idx = row_idx / 64;
idx_t idx_in_entry = row_idx % 64;
bool is_valid = validity_mask[entry_idx] & (1ULL << idx_in_entry);
Alternatively, the (slower) duckdb_validity_row_is_valid function can be used.
uint64_t *duckdb_vector_get_validity(
duckdb_vector vector
);
vector
: The vector to get the data from
The pointer to the validity mask, or NULL if no validity mask is present
Ensures the validity mask is writable by allocating it.
After this function is called, duckdb_vector_get_validity
will ALWAYS return non-NULL.
This allows NULL values to be written to the vector, regardless of whether a validity mask was present before.
void duckdb_vector_ensure_validity_writable(
duckdb_vector vector
);
vector
: The vector to alter
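As a brief illustrative sketch (not taken from the examples above), the snippet below writes an INTEGER vector of a freshly created data chunk and marks one row as NULL. The chunk setup uses duckdb_create_data_chunk, which is described in the Data Chunks section of the C API docs.
// create a data chunk with a single INTEGER column (hypothetical setup)
duckdb_logical_type types[1] = {duckdb_create_logical_type(DUCKDB_TYPE_INTEGER)};
duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 1);
duckdb_destroy_logical_type(&types[0]);
duckdb_vector col = duckdb_data_chunk_get_vector(chunk, 0);
int32_t *col_data = (int32_t *) duckdb_vector_get_data(col);
// make sure a validity mask exists before writing NULL values
duckdb_vector_ensure_validity_writable(col);
uint64_t *col_validity = duckdb_vector_get_validity(col);
// write three rows: 1, NULL, 3
col_data[0] = 1;
duckdb_validity_set_row_invalid(col_validity, 1);
col_data[2] = 3;
duckdb_data_chunk_set_size(chunk, 3);
// ... use the chunk, e.g., with duckdb_append_data_chunk ...
duckdb_destroy_data_chunk(&chunk);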
Assigns a string element in the vector at the specified location.
void duckdb_vector_assign_string_element(
duckdb_vector vector,
idx_t index,
const char *str
);
vector
: The vector to alter
index
: The row position in the vector to assign the string to
str
: The null-terminated string
Assigns a string element in the vector at the specified location. You may also use this function to assign BLOBs.
void duckdb_vector_assign_string_element_len(
duckdb_vector vector,
idx_t index,
const char *str,
idx_t str_len
);
vector
: The vector to alter
index
: The row position in the vector to assign the string to
str
: The string
str_len
: The length of the string (in bytes)
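To illustrate both variants, here is a small hedged sketch that fills a VARCHAR vector of a newly created data chunk; the chunk setup mirrors the previous sketch and is an assumption rather than part of the original documentation.
// create a data chunk with a single VARCHAR column (hypothetical setup)
duckdb_logical_type types[1] = {duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR)};
duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 1);
duckdb_destroy_logical_type(&types[0]);
duckdb_vector col = duckdb_data_chunk_get_vector(chunk, 0);
// assign a null-terminated string
duckdb_vector_assign_string_element(col, 0, "hello");
// the _len variant does not require null termination
duckdb_vector_assign_string_element_len(col, 1, "world!!", 5);
duckdb_data_chunk_set_size(chunk, 2);
// ... use the chunk ...
duckdb_destroy_data_chunk(&chunk);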
Retrieves the child vector of a list vector.
The resulting vector is valid as long as the parent vector is valid.
duckdb_vector duckdb_list_vector_get_child(
duckdb_vector vector
);
vector
: The vector
The child vector
Returns the size of the child vector of the list.
idx_t duckdb_list_vector_get_size(
duckdb_vector vector
);
vector
: The vector
The size of the child list
Sets the total size of the underlying child-vector of a list vector.
duckdb_state duckdb_list_vector_set_size(
duckdb_vector vector,
idx_t size
);
vector
: The list vector.
size
: The size of the child list.
The duckdb state. Returns DuckDBError if the vector is nullptr.
Sets the total capacity of the underlying child-vector of a list.
After calling this method, you must call duckdb_vector_get_validity
and duckdb_vector_get_data
to obtain current data and validity pointers.
duckdb_state duckdb_list_vector_reserve(
duckdb_vector vector,
idx_t required_capacity
);
vector
: The list vector.
required_capacity
: The total capacity to reserve.
The duckdb state. Returns DuckDBError if the vector is nullptr.
Retrieves the child vector of a struct vector.
The resulting vector is valid as long as the parent vector is valid.
duckdb_vector duckdb_struct_vector_get_child(
duckdb_vector vector,
idx_t index
);
vector
: The vector
index
: The child index
The child vector
Retrieves the child vector of an array vector.
The resulting vector is valid as long as the parent vector is valid. The resulting vector has the size of the parent vector multiplied by the array size.
duckdb_vector duckdb_array_vector_get_child(
duckdb_vector vector
);
vector
: The vector
The child vector
Returns whether or not a row is valid (i.e., not NULL) in the given validity mask.
bool duckdb_validity_row_is_valid(
uint64_t *validity,
idx_t row
);
validity
: The validity mask, as obtained through duckdb_vector_get_validity
row
: The row index
true if the row is valid, false otherwise
In a validity mask, sets a specific row to either valid or invalid.
Note that duckdb_vector_ensure_validity_writable
should be called before calling duckdb_vector_get_validity
,
to ensure that there is a validity mask to write to.
void duckdb_validity_set_row_validity(
uint64_t *validity,
idx_t row,
bool valid
);
validity
: The validity mask, as obtained through duckdb_vector_get_validity
.
row
: The row index
valid
: Whether or not to set the row to valid, or invalid
In a validity mask, sets a specific row to invalid.
Equivalent to duckdb_validity_set_row_validity
with valid set to false.
void duckdb_validity_set_row_invalid(
uint64_t *validity,
idx_t row
);
validity
: The validity mask
row
: The row index
In a validity mask, sets a specific row to valid.
Equivalent to duckdb_validity_set_row_validity
with valid set to true.
void duckdb_validity_set_row_valid(
uint64_t *validity,
idx_t row
);
validity
: The validity mask
row
: The row index
--- layout: docu title: Appender redirect_from: - /docs/api/c/appender - /docs/api/c/appender/ ---
Appenders are the most efficient way of loading data into DuckDB from within the C interface, and are recommended for
fast data loading. The appender is much faster than using prepared statements or individual INSERT INTO
statements.
Appends are made in row-wise format. For every column, a duckdb_append_[type]
call should be made, after which
the row should be finished by calling duckdb_appender_end_row
. After all rows have been appended,
duckdb_appender_destroy
should be used to finalize the appender and clean up the resulting memory.
Note that duckdb_appender_destroy
should always be called on the resulting appender, even if the function returns
DuckDBError
.
duckdb_query(con, "CREATE TABLE people (id INTEGER, name VARCHAR)", NULL);
duckdb_appender appender;
if (duckdb_appender_create(con, NULL, "people", &appender) == DuckDBError) {
// handle error
}
// append the first row (1, Mark)
duckdb_append_int32(appender, 1);
duckdb_append_varchar(appender, "Mark");
duckdb_appender_end_row(appender);
// append the second row (2, Hannes)
duckdb_append_int32(appender, 2);
duckdb_append_varchar(appender, "Hannes");
duckdb_appender_end_row(appender);
// finish appending and flush all the rows to the table
duckdb_appender_destroy(&appender);
duckdb_state duckdb_appender_create(duckdb_connection connection, const char *schema, const char *table, duckdb_appender *out_appender);
duckdb_state duckdb_appender_create_ext(duckdb_connection connection, const char *catalog, const char *schema, const char *table, duckdb_appender *out_appender);
idx_t duckdb_appender_column_count(duckdb_appender appender);
duckdb_logical_type duckdb_appender_column_type(duckdb_appender appender, idx_t col_idx);
const char *duckdb_appender_error(duckdb_appender appender);
duckdb_state duckdb_appender_flush(duckdb_appender appender);
duckdb_state duckdb_appender_close(duckdb_appender appender);
duckdb_state duckdb_appender_destroy(duckdb_appender *appender);
duckdb_state duckdb_appender_add_column(duckdb_appender appender, const char *name);
duckdb_state duckdb_appender_clear_columns(duckdb_appender appender);
duckdb_state duckdb_appender_begin_row(duckdb_appender appender);
duckdb_state duckdb_appender_end_row(duckdb_appender appender);
duckdb_state duckdb_append_default(duckdb_appender appender);
duckdb_state duckdb_append_default_to_chunk(duckdb_appender appender, duckdb_data_chunk chunk, idx_t col, idx_t row);
duckdb_state duckdb_append_bool(duckdb_appender appender, bool value);
duckdb_state duckdb_append_int8(duckdb_appender appender, int8_t value);
duckdb_state duckdb_append_int16(duckdb_appender appender, int16_t value);
duckdb_state duckdb_append_int32(duckdb_appender appender, int32_t value);
duckdb_state duckdb_append_int64(duckdb_appender appender, int64_t value);
duckdb_state duckdb_append_hugeint(duckdb_appender appender, duckdb_hugeint value);
duckdb_state duckdb_append_uint8(duckdb_appender appender, uint8_t value);
duckdb_state duckdb_append_uint16(duckdb_appender appender, uint16_t value);
duckdb_state duckdb_append_uint32(duckdb_appender appender, uint32_t value);
duckdb_state duckdb_append_uint64(duckdb_appender appender, uint64_t value);
duckdb_state duckdb_append_uhugeint(duckdb_appender appender, duckdb_uhugeint value);
duckdb_state duckdb_append_float(duckdb_appender appender, float value);
duckdb_state duckdb_append_double(duckdb_appender appender, double value);
duckdb_state duckdb_append_date(duckdb_appender appender, duckdb_date value);
duckdb_state duckdb_append_time(duckdb_appender appender, duckdb_time value);
duckdb_state duckdb_append_timestamp(duckdb_appender appender, duckdb_timestamp value);
duckdb_state duckdb_append_interval(duckdb_appender appender, duckdb_interval value);
duckdb_state duckdb_append_varchar(duckdb_appender appender, const char *val);
duckdb_state duckdb_append_varchar_length(duckdb_appender appender, const char *val, idx_t length);
duckdb_state duckdb_append_blob(duckdb_appender appender, const void *data, idx_t length);
duckdb_state duckdb_append_null(duckdb_appender appender);
duckdb_state duckdb_append_value(duckdb_appender appender, duckdb_value value);
duckdb_state duckdb_append_data_chunk(duckdb_appender appender, duckdb_data_chunk chunk);
Creates an appender object.
Note that the object must be destroyed with duckdb_appender_destroy
.
duckdb_state duckdb_appender_create(
duckdb_connection connection,
const char *schema,
const char *table,
duckdb_appender *out_appender
);
connection
: The connection context to create the appender in.
schema
: The schema of the table to append to, or nullptr
for the default schema.
table
: The table name to append to.
out_appender
: The resulting appender object.
DuckDBSuccess
on success or DuckDBError
on failure.
Creates an appender object.
Note that the object must be destroyed with duckdb_appender_destroy
.
duckdb_state duckdb_appender_create_ext(
duckdb_connection connection,
const char *catalog,
const char *schema,
const char *table,
duckdb_appender *out_appender
);
connection
: The connection context to create the appender in.
catalog
: The catalog of the table to append to, or nullptr
for the default catalog.
schema
: The schema of the table to append to, or nullptr
for the default schema.
table
: The table name to append to.
out_appender
: The resulting appender object.
DuckDBSuccess
on success or DuckDBError
on failure.
Returns the number of columns that belong to the appender. If there is no active column list, then this equals the table's physical columns.
idx_t duckdb_appender_column_count(
duckdb_appender appender
);
appender
: The appender to get the column count from.
The number of columns in the data chunks.
Returns the type of the column at the specified index. This is either a type in the active column list, or the same type as a column in the receiving table.
Note: The resulting type must be destroyed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_appender_column_type(
duckdb_appender appender,
idx_t col_idx
);
appender
: The appender to get the column type from.
col_idx
: The index of the column to get the type of.
: The index of the column to get the type of.
The duckdb_logical_type
of the column.
Returns the error message associated with the given appender.
If the appender has no error message, this returns nullptr
instead.
The error message should not be freed. It will be de-allocated when duckdb_appender_destroy
is called.
const char *duckdb_appender_error(
duckdb_appender appender
);
appender
: The appender to get the error from.
The error message, or nullptr
if there is none.
Flush the appender to the table, forcing the cache of the appender to be cleared. If flushing the data triggers a constraint violation or any other error, then all data is invalidated, and this function returns DuckDBError. It is not possible to append more values. Call duckdb_appender_error to obtain the error message followed by duckdb_appender_destroy to destroy the invalidated appender.
duckdb_state duckdb_appender_flush(
duckdb_appender appender
);
appender
: The appender to flush.
DuckDBSuccess
on success or DuckDBError
on failure.
Closes the appender by flushing all intermediate states and closing it for further appends. If flushing the data triggers a constraint violation or any other error, then all data is invalidated, and this function returns DuckDBError. Call duckdb_appender_error to obtain the error message followed by duckdb_appender_destroy to destroy the invalidated appender.
duckdb_state duckdb_appender_close(
duckdb_appender appender
);
appender
: The appender to flush and close.
DuckDBSuccess
on success or DuckDBError
on failure.
Closes the appender by flushing all intermediate states to the table and destroying it. By destroying it, this function de-allocates all memory associated with the appender. If flushing the data triggers a constraint violation, then all data is invalidated, and this function returns DuckDBError. Due to the destruction of the appender, it is no longer possible to obtain the specific error message with duckdb_appender_error. Therefore, call duckdb_appender_close before destroying the appender, if you need insights into the specific error.
duckdb_state duckdb_appender_destroy(
duckdb_appender *appender
);
appender
: The appender to flush, close and destroy.
DuckDBSuccess
on success or DuckDBError
on failure.
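Because duckdb_appender_destroy discards the error message, a common pattern is to close first and only then destroy; the sketch below assumes the appender variable from the example at the top of this page.
if (duckdb_appender_close(appender) == DuckDBError) {
// the appender still exists at this point, so the error message is available
printf("append failed: %s\n", duckdb_appender_error(appender));
}
// always destroy the appender afterwards, even if an error occurred
duckdb_appender_destroy(&appender);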
Appends a column to the active column list of the appender. Immediately flushes all previous data.
The active column list specifies all columns that are expected when flushing the data. Any non-active columns are filled with their default values, or NULL.
duckdb_state duckdb_appender_add_column(
duckdb_appender appender,
const char *name
);
appender
: The appender to add the column to.
name
: The name of the column to add.
DuckDBSuccess
on success or DuckDBError
on failure.
Removes all columns from the active column list of the appender, resetting the appender to treat all columns as active. Immediately flushes all previous data.
duckdb_state duckdb_appender_clear_columns(
duckdb_appender appender
);
appender
: The appender to clear the columns from.
DuckDBSuccess
on success or DuckDBError
on failure.
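As an illustration of the active column list, the sketch below assumes a hypothetical table people (id INTEGER, name VARCHAR, comment VARCHAR DEFAULT 'n/a') and appends only to id and name, letting comment receive its default value.
duckdb_query(con, "CREATE TABLE people (id INTEGER, name VARCHAR, comment VARCHAR DEFAULT 'n/a')", NULL);
duckdb_appender appender;
duckdb_appender_create(con, NULL, "people", &appender);
// restrict the appender to the id and name columns
duckdb_appender_add_column(appender, "id");
duckdb_appender_add_column(appender, "name");
duckdb_append_int32(appender, 1);
duckdb_append_varchar(appender, "Mark");
duckdb_appender_end_row(appender);
// switch back to appending all columns
duckdb_appender_clear_columns(appender);
duckdb_appender_destroy(&appender);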
A nop function, provided for backwards compatibility reasons. Does nothing. Only duckdb_appender_end_row
is required.
duckdb_state duckdb_appender_begin_row(
duckdb_appender appender
);
Finish the current row of appends. After end_row is called, the next row can be appended.
duckdb_state duckdb_appender_end_row(
duckdb_appender appender
);
appender
: The appender.
DuckDBSuccess
on success or DuckDBError
on failure.
Append a DEFAULT value (NULL if DEFAULT not available for column) to the appender.
duckdb_state duckdb_append_default(
duckdb_appender appender
);
Append a DEFAULT value, at the specified row and column, (NULL if DEFAULT not available for column) to the chunk created from the specified appender. The default value of the column must be a constant value. Non-deterministic expressions like nextval('seq') or random() are not supported.
duckdb_state duckdb_append_default_to_chunk(
duckdb_appender appender,
duckdb_data_chunk chunk,
idx_t col,
idx_t row
);
appender
: The appender to get the default value from.
chunk
: The data chunk to append the default value to.
col
: The chunk column index to append the default value to.
row
: The chunk row index to append the default value to.
DuckDBSuccess
on success or DuckDBError
on failure.
Append a bool value to the appender.
duckdb_state duckdb_append_bool(
duckdb_appender appender,
bool value
);
Append an int8_t value to the appender.
duckdb_state duckdb_append_int8(
duckdb_appender appender,
int8_t value
);
Append an int16_t value to the appender.
duckdb_state duckdb_append_int16(
duckdb_appender appender,
int16_t value
);
Append an int32_t value to the appender.
duckdb_state duckdb_append_int32(
duckdb_appender appender,
int32_t value
);
Append an int64_t value to the appender.
duckdb_state duckdb_append_int64(
duckdb_appender appender,
int64_t value
);
Append a duckdb_hugeint value to the appender.
duckdb_state duckdb_append_hugeint(
duckdb_appender appender,
duckdb_hugeint value
);
Append a uint8_t value to the appender.
duckdb_state duckdb_append_uint8(
duckdb_appender appender,
uint8_t value
);
Append a uint16_t value to the appender.
duckdb_state duckdb_append_uint16(
duckdb_appender appender,
uint16_t value
);
Append a uint32_t value to the appender.
duckdb_state duckdb_append_uint32(
duckdb_appender appender,
uint32_t value
);
Append a uint64_t value to the appender.
duckdb_state duckdb_append_uint64(
duckdb_appender appender,
uint64_t value
);
Append a duckdb_uhugeint value to the appender.
duckdb_state duckdb_append_uhugeint(
duckdb_appender appender,
duckdb_uhugeint value
);
Append a float value to the appender.
duckdb_state duckdb_append_float(
duckdb_appender appender,
float value
);
Append a double value to the appender.
duckdb_state duckdb_append_double(
duckdb_appender appender,
double value
);
Append a duckdb_date value to the appender.
duckdb_state duckdb_append_date(
duckdb_appender appender,
duckdb_date value
);
Append a duckdb_time value to the appender.
duckdb_state duckdb_append_time(
duckdb_appender appender,
duckdb_time value
);
Append a duckdb_timestamp value to the appender.
duckdb_state duckdb_append_timestamp(
duckdb_appender appender,
duckdb_timestamp value
);
Append a duckdb_interval value to the appender.
duckdb_state duckdb_append_interval(
duckdb_appender appender,
duckdb_interval value
);
Append a varchar value to the appender.
duckdb_state duckdb_append_varchar(
duckdb_appender appender,
const char *val
);
Append a varchar value to the appender.
duckdb_state duckdb_append_varchar_length(
duckdb_appender appender,
const char *val,
idx_t length
);
Append a blob value to the appender.
duckdb_state duckdb_append_blob(
duckdb_appender appender,
const void *data,
idx_t length
);
Append a NULL value to the appender (of any type).
duckdb_state duckdb_append_null(
duckdb_appender appender
);
Append a duckdb_value to the appender.
duckdb_state duckdb_append_value(
duckdb_appender appender,
duckdb_value value
);
Appends a pre-filled data chunk to the specified appender. Attempts casting, if the data chunk types do not match the active appender types.
duckdb_state duckdb_append_data_chunk(
duckdb_appender appender,
duckdb_data_chunk chunk
);
appender
: The appender to append to.
chunk
: The data chunk to append.
DuckDBSuccess
on success or DuckDBError
on failure.
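For completeness, here is a minimal sketch of filling a data chunk and appending it in one call; it assumes the people (id INTEGER, name VARCHAR) table from the example at the top of this page.
// create a data chunk matching the table layout: INTEGER, VARCHAR
duckdb_logical_type types[2] = {
duckdb_create_logical_type(DUCKDB_TYPE_INTEGER),
duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR)
};
duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 2);
duckdb_destroy_logical_type(&types[0]);
duckdb_destroy_logical_type(&types[1]);
// fill a single row
int32_t *id_data = (int32_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(chunk, 0));
id_data[0] = 3;
duckdb_vector_assign_string_element(duckdb_data_chunk_get_vector(chunk, 1), 0, "Pedro");
duckdb_data_chunk_set_size(chunk, 1);
// append the chunk in one call
duckdb_appender appender;
duckdb_appender_create(con, NULL, "people", &appender);
if (duckdb_append_data_chunk(appender, chunk) == DuckDBError) {
// handle error
}
duckdb_appender_destroy(&appender);
duckdb_destroy_data_chunk(&chunk);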
--- layout: docu title: Replacement Scans redirect_from: - /docs/api/c/replacement_scans - /docs/api/c/replacement_scans/ ---
The replacement scan API can be used to register a callback that is called when a table is read that does not exist in the catalog. For example, when a query such as SELECT * FROM my_table
is executed and my_table
does not exist, the replacement scan callback will be called with my_table
as parameter. The replacement scan can then insert a table function with a specific parameter to replace the read of the table.
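As a hedged sketch of what such a callback can look like, the hypothetical function below replaces every unknown table with a call to the range table function; the callback signature follows duckdb_replacement_callback_t.
void my_replacement(duckdb_replacement_scan_info info, const char *table_name, void *data) {
(void) table_name;
(void) data;
// replace the unknown table with `range(100)`
duckdb_replacement_scan_set_function_name(info, "range");
duckdb_value param = duckdb_create_int64(100);
duckdb_replacement_scan_add_parameter(info, param);
// destroy our local handle to the parameter value
duckdb_destroy_value(&param);
}
// register the callback on the database; no extra data, no delete callback
duckdb_add_replacement_scan(db, my_replacement, NULL, NULL);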
void duckdb_add_replacement_scan(duckdb_database db, duckdb_replacement_callback_t replacement, void *extra_data, duckdb_delete_callback_t delete_callback);
void duckdb_replacement_scan_set_function_name(duckdb_replacement_scan_info info, const char *function_name);
void duckdb_replacement_scan_add_parameter(duckdb_replacement_scan_info info, duckdb_value parameter);
void duckdb_replacement_scan_set_error(duckdb_replacement_scan_info info, const char *error);
Add a replacement scan definition to the specified database.
void duckdb_add_replacement_scan(
duckdb_database db,
duckdb_replacement_callback_t replacement,
void *extra_data,
duckdb_delete_callback_t delete_callback
);
db
: The database object to add the replacement scan to
replacement
: The replacement scan callback
extra_data
: Extra data that is passed back into the specified callback
delete_callback
: The delete callback to call on the extra data, if any
Sets the replacement function name. If this function is called in the replacement callback, the replacement scan is performed. If it is not called, the replacement is not performed.
void duckdb_replacement_scan_set_function_name(
duckdb_replacement_scan_info info,
const char *function_name
);
info
: The info object
function_name
: The function name to substitute.
Adds a parameter to the replacement scan function.
void duckdb_replacement_scan_add_parameter(
duckdb_replacement_scan_info info,
duckdb_value parameter
);
info
: The info object
parameter
: The parameter to add.
Report that an error has occurred while executing the replacement scan.
void duckdb_replacement_scan_set_error(
duckdb_replacement_scan_info info,
const char *error
);
info
: The info object
error
: The error message
--- layout: docu title: Query redirect_from: - /docs/api/c/query - /docs/api/c/query/ ---
The duckdb_query
method allows SQL queries to be run in DuckDB from C. This method takes two parameters, a (null-terminated) SQL query string and a duckdb_result
result pointer. The result pointer may be NULL
if the application is not interested in the result set or if the query produces no result. After the result is consumed, the duckdb_destroy_result
method should be used to clean up the result.
Elements can be extracted from the duckdb_result
object using a variety of methods. The duckdb_column_count
can be used to extract the number of columns. duckdb_column_name
and duckdb_column_type
can be used to extract the names and types of individual columns.
duckdb_state state;
duckdb_result result;
// create a table
state = duckdb_query(con, "CREATE TABLE integers (i INTEGER, j INTEGER);", NULL);
if (state == DuckDBError) {
// handle error
}
// insert three rows into the table
state = duckdb_query(con, "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);", NULL);
if (state == DuckDBError) {
// handle error
}
// query rows again
state = duckdb_query(con, "SELECT * FROM integers", &result);
if (state == DuckDBError) {
// handle error
}
// handle the result
// ...
// destroy the result after we are done with it
duckdb_destroy_result(&result);
Values can be extracted using either the duckdb_fetch_chunk
function, or using the duckdb_value
convenience functions. The duckdb_fetch_chunk
function directly hands you data chunks in DuckDB's native array format and can therefore be very fast. The duckdb_value
functions perform bounds- and type-checking, and will automatically cast values to the desired type. This makes them more convenient and easier to use, at the expense of being slower.
See the [Types]({% link docs/clients/c/types.md %}) page for more information.
For optimal performance, use
duckdb_fetch_chunk
to extract data from the query result. The duckdb_value
functions perform internal type-checking, bounds-checking and casting which makes them slower.
Below is an end-to-end example that prints the above result to CSV format using the duckdb_fetch_chunk
function.
Note that the function is NOT generic: we do need to know exactly what the types of the result columns are.
duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);
duckdb_result res;
duckdb_query(con, "CREATE TABLE integers (i INTEGER, j INTEGER);", NULL);
duckdb_query(con, "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);", NULL);
duckdb_query(con, "SELECT * FROM integers;", &res);
// iterate until result is exhausted
while (true) {
duckdb_data_chunk result = duckdb_fetch_chunk(res);
if (!result) {
// result is exhausted
break;
}
// get the number of rows from the data chunk
idx_t row_count = duckdb_data_chunk_get_size(result);
// get the first column
duckdb_vector col1 = duckdb_data_chunk_get_vector(result, 0);
int32_t *col1_data = (int32_t *) duckdb_vector_get_data(col1);
uint64_t *col1_validity = duckdb_vector_get_validity(col1);
// get the second column
duckdb_vector col2 = duckdb_data_chunk_get_vector(result, 1);
int32_t *col2_data = (int32_t *) duckdb_vector_get_data(col2);
uint64_t *col2_validity = duckdb_vector_get_validity(col2);
// iterate over the rows
for (idx_t row = 0; row < row_count; row++) {
if (duckdb_validity_row_is_valid(col1_validity, row)) {
printf("%d", col1_data[row]);
} else {
printf("NULL");
}
printf(",");
if (duckdb_validity_row_is_valid(col2_validity, row)) {
printf("%d", col2_data[row]);
} else {
printf("NULL");
}
printf("\n");
}
duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
This prints the following result:
3,4
5,6
7,NULL
Deprecated The
duckdb_value
functions are deprecated and are scheduled for removal in a future release.
Below is an example that prints the above result to CSV format using the duckdb_value_varchar
function.
Note that the function is generic: we do not need to know about the types of the individual result columns.
// print the above result to CSV format using `duckdb_value_varchar`
idx_t row_count = duckdb_row_count(&result);
idx_t column_count = duckdb_column_count(&result);
for (idx_t row = 0; row < row_count; row++) {
for (idx_t col = 0; col < column_count; col++) {
if (col > 0) printf(",");
char *str_val = duckdb_value_varchar(&result, col, row);
printf("%s", str_val);
duckdb_free(str_val);
}
printf("\n");
}
duckdb_state duckdb_query(duckdb_connection connection, const char *query, duckdb_result *out_result);
void duckdb_destroy_result(duckdb_result *result);
const char *duckdb_column_name(duckdb_result *result, idx_t col);
duckdb_type duckdb_column_type(duckdb_result *result, idx_t col);
duckdb_statement_type duckdb_result_statement_type(duckdb_result result);
duckdb_logical_type duckdb_column_logical_type(duckdb_result *result, idx_t col);
idx_t duckdb_column_count(duckdb_result *result);
idx_t duckdb_row_count(duckdb_result *result);
idx_t duckdb_rows_changed(duckdb_result *result);
void *duckdb_column_data(duckdb_result *result, idx_t col);
bool *duckdb_nullmask_data(duckdb_result *result, idx_t col);
const char *duckdb_result_error(duckdb_result *result);
duckdb_error_type duckdb_result_error_type(duckdb_result *result);
Executes a SQL query within a connection and stores the full (materialized) result in the out_result pointer.
If the query fails to execute, DuckDBError is returned and the error message can be retrieved by calling
duckdb_result_error
.
Note that after running duckdb_query
, duckdb_destroy_result
must be called on the result object even if the
query fails, otherwise the error stored within the result will not be freed correctly.
duckdb_state duckdb_query(
duckdb_connection connection,
const char *query,
duckdb_result *out_result
);
connection
: The connection to perform the query in.
query
: The SQL query to run.
out_result
: The query result.
DuckDBSuccess
on success or DuckDBError
on failure.
Closes the result and de-allocates all memory allocated for that result.
void duckdb_destroy_result(
duckdb_result *result
);
result
: The result to destroy.
Returns the column name of the specified column. The result should not need to be freed; the column names will automatically be destroyed when the result is destroyed.
Returns NULL
if the column is out of range.
const char *duckdb_column_name(
duckdb_result *result,
idx_t col
);
result
: The result object to fetch the column name from.
col
: The column index.
The column name of the specified column.
Returns the column type of the specified column.
Returns DUCKDB_TYPE_INVALID
if the column is out of range.
duckdb_type duckdb_column_type(
duckdb_result *result,
idx_t col
);
result
: The result object to fetch the column type from.
col
: The column index.
The column type of the specified column.
Returns the statement type of the statement that was executed
duckdb_statement_type duckdb_result_statement_type(
duckdb_result result
);
result
: The result object to fetch the statement type from.
duckdb_statement_type value or DUCKDB_STATEMENT_TYPE_INVALID
Returns the logical column type of the specified column.
The return type of this call should be destroyed with duckdb_destroy_logical_type
.
Returns NULL
if the column is out of range.
duckdb_logical_type duckdb_column_logical_type(
duckdb_result *result,
idx_t col
);
result
: The result object to fetch the column type from.
col
: The column index.
The logical column type of the specified column.
Returns the number of columns present in the result object.
idx_t duckdb_column_count(
duckdb_result *result
);
result
: The result object.
The number of columns present in the result object.
Warning Deprecation notice. This method is scheduled for removal in a future release.
Returns the number of rows present in the result object.
idx_t duckdb_row_count(
duckdb_result *result
);
result
: The result object.
The number of rows present in the result object.
Returns the number of rows changed by the query stored in the result. This is relevant only for INSERT/UPDATE/DELETE queries. For other queries the rows_changed will be 0.
idx_t duckdb_rows_changed(
duckdb_result *result
);
result
: The result object.
The number of rows changed.
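For example (a small sketch reusing the integers table created above):
duckdb_result res;
duckdb_query(con, "UPDATE integers SET j = j + 1 WHERE j IS NOT NULL;", &res);
// two of the three rows have a non-NULL j, so this prints 2
printf("%llu\n", (unsigned long long) duckdb_rows_changed(&res));
duckdb_destroy_result(&res);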
Deprecated This method has been deprecated. Prefer using
duckdb_result_get_chunk
instead.
Returns the data of a specific column of a result in columnar format.
The function returns a dense array which contains the result data. The exact type stored in the array depends on the
corresponding duckdb_type (as provided by duckdb_column_type
). For the exact type by which the data should be
accessed, see the comments in the types section or the DUCKDB_TYPE
enum.
For example, for a column of type DUCKDB_TYPE_INTEGER
, rows can be accessed in the following manner:
int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
printf("Data for row %d: %d\n", row, data[row]);
void *duckdb_column_data(
duckdb_result *result,
idx_t col
);
result
: The result object to fetch the column data from.
col
: The column index.
The column data of the specified column.
Deprecated This method has been deprecated. Prefer using
duckdb_result_get_chunk
instead.
Returns the nullmask of a specific column of a result in columnar format. The nullmask indicates for every row
whether or not the corresponding row is NULL
. If a row is NULL
, the values present in the array provided
by duckdb_column_data
are undefined.
int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
bool *nullmask = duckdb_nullmask_data(&result, 0);
if (nullmask[row]) {
printf("Data for row %d: NULL\n", row);
} else {
printf("Data for row %d: %d\n", row, data[row]);
}
bool *duckdb_nullmask_data(
duckdb_result *result,
idx_t col
);
result
: The result object to fetch the nullmask from.
col
: The column index.
The nullmask of the specified column.
Returns the error message contained within the result. The error is only set if duckdb_query
returns DuckDBError
.
The result of this function must not be freed. It will be cleaned up when duckdb_destroy_result
is called.
const char *duckdb_result_error(
duckdb_result *result
);
result
: The result object to fetch the error from.
The error of the result.
Returns the result error type contained within the result. The error is only set if duckdb_query
returns
DuckDBError
.
duckdb_error_type duckdb_result_error_type(
duckdb_result *result
);
result
: The result object to fetch the error from.
The error type of the result.
--- layout: docu title: Values redirect_from: - /docs/api/c/value - /docs/api/c/value/ ---
The value class represents a single value of any type.
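As a quick illustrative sketch of the lifecycle, the snippet below creates a value, obtains its string representation and destroys both again:
duckdb_value val = duckdb_create_int64(42);
// the string representation must be freed with duckdb_free
char *str = duckdb_get_varchar(val);
printf("%s\n", str);
duckdb_free(str);
duckdb_destroy_value(&val);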
void duckdb_destroy_value(duckdb_value *value);
duckdb_value duckdb_create_varchar(const char *text);
duckdb_value duckdb_create_varchar_length(const char *text, idx_t length);
duckdb_value duckdb_create_bool(bool input);
duckdb_value duckdb_create_int8(int8_t input);
duckdb_value duckdb_create_uint8(uint8_t input);
duckdb_value duckdb_create_int16(int16_t input);
duckdb_value duckdb_create_uint16(uint16_t input);
duckdb_value duckdb_create_int32(int32_t input);
duckdb_value duckdb_create_uint32(uint32_t input);
duckdb_value duckdb_create_uint64(uint64_t input);
duckdb_value duckdb_create_int64(int64_t val);
duckdb_value duckdb_create_hugeint(duckdb_hugeint input);
duckdb_value duckdb_create_uhugeint(duckdb_uhugeint input);
duckdb_value duckdb_create_varint(duckdb_varint input);
duckdb_value duckdb_create_decimal(duckdb_decimal input);
duckdb_value duckdb_create_float(float input);
duckdb_value duckdb_create_double(double input);
duckdb_value duckdb_create_date(duckdb_date input);
duckdb_value duckdb_create_time(duckdb_time input);
duckdb_value duckdb_create_time_tz_value(duckdb_time_tz value);
duckdb_value duckdb_create_timestamp(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_tz(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_s(duckdb_timestamp_s input);
duckdb_value duckdb_create_timestamp_ms(duckdb_timestamp_ms input);
duckdb_value duckdb_create_timestamp_ns(duckdb_timestamp_ns input);
duckdb_value duckdb_create_interval(duckdb_interval input);
duckdb_value duckdb_create_blob(const uint8_t *data, idx_t length);
duckdb_value duckdb_create_bit(duckdb_bit input);
duckdb_value duckdb_create_uuid(duckdb_uhugeint input);
bool duckdb_get_bool(duckdb_value val);
int8_t duckdb_get_int8(duckdb_value val);
uint8_t duckdb_get_uint8(duckdb_value val);
int16_t duckdb_get_int16(duckdb_value val);
uint16_t duckdb_get_uint16(duckdb_value val);
int32_t duckdb_get_int32(duckdb_value val);
uint32_t duckdb_get_uint32(duckdb_value val);
int64_t duckdb_get_int64(duckdb_value val);
uint64_t duckdb_get_uint64(duckdb_value val);
duckdb_hugeint duckdb_get_hugeint(duckdb_value val);
duckdb_uhugeint duckdb_get_uhugeint(duckdb_value val);
duckdb_varint duckdb_get_varint(duckdb_value val);
duckdb_decimal duckdb_get_decimal(duckdb_value val);
float duckdb_get_float(duckdb_value val);
double duckdb_get_double(duckdb_value val);
duckdb_date duckdb_get_date(duckdb_value val);
duckdb_time duckdb_get_time(duckdb_value val);
duckdb_time_tz duckdb_get_time_tz(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp_tz(duckdb_value val);
duckdb_timestamp_s duckdb_get_timestamp_s(duckdb_value val);
duckdb_timestamp_ms duckdb_get_timestamp_ms(duckdb_value val);
duckdb_timestamp_ns duckdb_get_timestamp_ns(duckdb_value val);
duckdb_interval duckdb_get_interval(duckdb_value val);
duckdb_logical_type duckdb_get_value_type(duckdb_value val);
duckdb_blob duckdb_get_blob(duckdb_value val);
duckdb_bit duckdb_get_bit(duckdb_value val);
duckdb_uhugeint duckdb_get_uuid(duckdb_value val);
char *duckdb_get_varchar(duckdb_value value);
duckdb_value duckdb_create_struct_value(duckdb_logical_type type, duckdb_value *values);
duckdb_value duckdb_create_list_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_array_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
idx_t duckdb_get_map_size(duckdb_value value);
duckdb_value duckdb_get_map_key(duckdb_value value, idx_t index);
duckdb_value duckdb_get_map_value(duckdb_value value, idx_t index);
bool duckdb_is_null_value(duckdb_value value);
duckdb_value duckdb_create_null_value();
idx_t duckdb_get_list_size(duckdb_value value);
duckdb_value duckdb_get_list_child(duckdb_value value, idx_t index);
duckdb_value duckdb_create_enum_value(duckdb_logical_type type, uint64_t value);
uint64_t duckdb_get_enum_value(duckdb_value value);
duckdb_value duckdb_get_struct_child(duckdb_value value, idx_t index);
Destroys the value and de-allocates all memory allocated for that type.
void duckdb_destroy_value(
duckdb_value *value
);
value
: The value to destroy.
Creates a value from a null-terminated string
duckdb_value duckdb_create_varchar(
const char *text
);
text
: The null-terminated string
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a string
duckdb_value duckdb_create_varchar_length(
const char *text,
idx_t length
);
text
: The text
length
: The length of the text
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a boolean
duckdb_value duckdb_create_bool(
bool input
);
input
: The boolean value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from an int8_t (a tinyint)
duckdb_value duckdb_create_int8(
int8_t input
);
input
: The tinyint value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a uint8_t (a utinyint)
duckdb_value duckdb_create_uint8(
uint8_t input
);
input
: The utinyint value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from an int16_t (a smallint)
duckdb_value duckdb_create_int16(
int16_t input
);
input
: The smallint value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a uint16_t (a usmallint)
duckdb_value duckdb_create_uint16(
uint16_t input
);
input
: The usmallint value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from an int32_t (an integer)
duckdb_value duckdb_create_int32(
int32_t input
);
input
: The integer value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a uint32_t (a uinteger)
duckdb_value duckdb_create_uint32(
uint32_t input
);
input
: The uinteger value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a uint64_t (a ubigint)
duckdb_value duckdb_create_uint64(
uint64_t input
);
input
: The ubigint value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from an int64
The value. This must be destroyed with duckdb_destroy_value
.
duckdb_value duckdb_create_int64(
int64_t val
);
Creates a value from a hugeint
duckdb_value duckdb_create_hugeint(
duckdb_hugeint input
);
input
: The hugeint value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a uhugeint
duckdb_value duckdb_create_uhugeint(
duckdb_uhugeint input
);
input
: The uhugeint value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a VARINT value from a duckdb_varint
duckdb_value duckdb_create_varint(
duckdb_varint input
);
input
: The duckdb_varint value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a DECIMAL value from a duckdb_decimal
duckdb_value duckdb_create_decimal(
duckdb_decimal input
);
input
: The duckdb_decimal value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a float
duckdb_value duckdb_create_float(
float input
);
input
: The float value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a double
duckdb_value duckdb_create_double(
double input
);
input
: The double value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a date
duckdb_value duckdb_create_date(
duckdb_date input
);
input
: The date value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a time
duckdb_value duckdb_create_time(
duckdb_time input
);
input
: The time value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a time_tz.
Not to be confused with duckdb_create_time_tz
, which creates a duckdb_time_tz_t.
duckdb_value duckdb_create_time_tz_value(
duckdb_time_tz value
);
value
: The time_tz value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a TIMESTAMP value from a duckdb_timestamp
duckdb_value duckdb_create_timestamp(
duckdb_timestamp input
);
input
: The duckdb_timestamp value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a TIMESTAMP_TZ value from a duckdb_timestamp
duckdb_value duckdb_create_timestamp_tz(
duckdb_timestamp input
);
input
: The duckdb_timestamp value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a TIMESTAMP_S value from a duckdb_timestamp_s
duckdb_value duckdb_create_timestamp_s(
duckdb_timestamp_s input
);
input
: The duckdb_timestamp_s value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a TIMESTAMP_MS value from a duckdb_timestamp_ms
duckdb_value duckdb_create_timestamp_ms(
duckdb_timestamp_ms input
);
input
: The duckdb_timestamp_ms value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a TIMESTAMP_NS value from a duckdb_timestamp_ns
duckdb_value duckdb_create_timestamp_ns(
duckdb_timestamp_ns input
);
input
: The duckdb_timestamp_ns value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from an interval
duckdb_value duckdb_create_interval(
duckdb_interval input
);
input
: The interval value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a value from a blob
duckdb_value duckdb_create_blob(
const uint8_t *data,
idx_t length
);
data
: The blob data
length
: The length of the blob data
The value. This must be destroyed with duckdb_destroy_value
.
Creates a BIT value from a duckdb_bit
duckdb_value duckdb_create_bit(
duckdb_bit input
);
input
: The duckdb_bit value
The value. This must be destroyed with duckdb_destroy_value
.
Creates a UUID value from a uhugeint
duckdb_value duckdb_create_uuid(
duckdb_uhugeint input
);
input
: The duckdb_uhugeint containing the UUID
The value. This must be destroyed with duckdb_destroy_value
.
Returns the boolean value of the given value.
bool duckdb_get_bool(
duckdb_value val
);
val
: A duckdb_value containing a boolean
A boolean, or false if the value cannot be converted
Returns the int8_t value of the given value.
int8_t duckdb_get_int8(
duckdb_value val
);
val
: A duckdb_value containing a tinyint
An int8_t, or MinValue if the value cannot be converted
Returns the uint8_t value of the given value.
uint8_t duckdb_get_uint8(
duckdb_value val
);
val
: A duckdb_value containing a utinyint
A uint8_t, or MinValue if the value cannot be converted
Returns the int16_t value of the given value.
int16_t duckdb_get_int16(
duckdb_value val
);
val
: A duckdb_value containing a smallint
An int16_t, or MinValue if the value cannot be converted
Returns the uint16_t value of the given value.
uint16_t duckdb_get_uint16(
duckdb_value val
);
val
: A duckdb_value containing a usmallint
A uint16_t, or MinValue if the value cannot be converted
Returns the int32_t value of the given value.
int32_t duckdb_get_int32(
duckdb_value val
);
val
: A duckdb_value containing an integer
An int32_t, or MinValue if the value cannot be converted
Returns the uint32_t value of the given value.
uint32_t duckdb_get_uint32(
duckdb_value val
);
val
: A duckdb_value containing a uinteger
A uint32_t, or MinValue if the value cannot be converted
Returns the int64_t value of the given value.
int64_t duckdb_get_int64(
duckdb_value val
);
val
: A duckdb_value containing a bigint
An int64_t, or MinValue if the value cannot be converted
Returns the uint64_t value of the given value.
uint64_t duckdb_get_uint64(
duckdb_value val
);
val
: A duckdb_value containing a ubigint
A uint64_t, or MinValue if the value cannot be converted
Returns the hugeint value of the given value.
duckdb_hugeint duckdb_get_hugeint(
duckdb_value val
);
val
: A duckdb_value containing a hugeint
A duckdb_hugeint, or MinValue if the value cannot be converted
Returns the uhugeint value of the given value.
duckdb_uhugeint duckdb_get_uhugeint(
duckdb_value val
);
val
: A duckdb_value containing a uhugeint
A duckdb_uhugeint, or MinValue if the value cannot be converted
Returns the duckdb_varint value of the given value.
The data
field must be destroyed with duckdb_free
.
duckdb_varint duckdb_get_varint(
duckdb_value val
);
val
: A duckdb_value containing a VARINT
A duckdb_varint. The data
field must be destroyed with duckdb_free
.
Returns the duckdb_decimal value of the given value.
duckdb_decimal duckdb_get_decimal(
duckdb_value val
);
val
: A duckdb_value containing a DECIMAL
A duckdb_decimal, or MinValue if the value cannot be converted
Returns the float value of the given value.
float duckdb_get_float(
duckdb_value val
);
val
: A duckdb_value containing a float
A float, or NAN if the value cannot be converted
Returns the double value of the given value.
double duckdb_get_double(
duckdb_value val
);
val
: A duckdb_value containing a double
A double, or NAN if the value cannot be converted
Returns the date value of the given value.
duckdb_date duckdb_get_date(
duckdb_value val
);
val
: A duckdb_value containing a date
A duckdb_date, or MinValue if the value cannot be converted
Returns the time value of the given value.
duckdb_time duckdb_get_time(
duckdb_value val
);
val
: A duckdb_value containing a time
A duckdb_time, or MinValue if the value cannot be converted
Returns the time_tz value of the given value.
duckdb_time_tz duckdb_get_time_tz(
duckdb_value val
);
val
: A duckdb_value containing a time_tz
A duckdb_time_tz, or MinValue<time_tz> if the value cannot be converted
Returns the TIMESTAMP value of the given value.
duckdb_timestamp duckdb_get_timestamp(
duckdb_value val
);
val
: A duckdb_value containing a TIMESTAMP
A duckdb_timestamp, or MinValue if the value cannot be converted
Returns the TIMESTAMP_TZ value of the given value.
duckdb_timestamp duckdb_get_timestamp_tz(
duckdb_value val
);
val
: A duckdb_value containing a TIMESTAMP_TZ
A duckdb_timestamp, or MinValue<timestamp_tz> if the value cannot be converted
Returns the duckdb_timestamp_s value of the given value.
duckdb_timestamp_s duckdb_get_timestamp_s(
duckdb_value val
);
val
: A duckdb_value containing a TIMESTAMP_S
A duckdb_timestamp_s, or MinValue<timestamp_s> if the value cannot be converted
Returns the duckdb_timestamp_ms value of the given value.
duckdb_timestamp_ms duckdb_get_timestamp_ms(
duckdb_value val
);
val
: A duckdb_value containing a TIMESTAMP_MS
A duckdb_timestamp_ms, or MinValue<timestamp_ms> if the value cannot be converted
Returns the duckdb_timestamp_ns value of the given value.
duckdb_timestamp_ns duckdb_get_timestamp_ns(
duckdb_value val
);
val
: A duckdb_value containing a TIMESTAMP_NS
A duckdb_timestamp_ns, or MinValue<timestamp_ns> if the value cannot be converted
Returns the interval value of the given value.
duckdb_interval duckdb_get_interval(
duckdb_value val
);
val
: A duckdb_value containing an interval
A duckdb_interval, or MinValue if the value cannot be converted
Returns the type of the given value. The type is valid as long as the value is not destroyed. The type itself must not be destroyed.
duckdb_logical_type duckdb_get_value_type(
duckdb_value val
);
val
: A duckdb_value
A duckdb_logical_type.
Returns the blob value of the given value.
duckdb_blob duckdb_get_blob(
duckdb_value val
);
val
: A duckdb_value containing a blob
A duckdb_blob
Returns the duckdb_bit value of the given value.
The data
field must be destroyed with duckdb_free
.
duckdb_bit duckdb_get_bit(
duckdb_value val
);
val
: A duckdb_value containing a BIT
A duckdb_bit
Returns a duckdb_uhugeint representing the UUID value of the given value.
duckdb_uhugeint duckdb_get_uuid(
duckdb_value val
);
val
: A duckdb_value containing a UUID
A duckdb_uhugeint representing the UUID value
Obtains a string representation of the given value.
The result must be destroyed with duckdb_free
.
char *duckdb_get_varchar(
duckdb_value value
);
value
: The value
The string value. This must be destroyed with duckdb_free
.
Creates a struct value from a type and an array of values. Must be destroyed with duckdb_destroy_value
.
duckdb_value duckdb_create_struct_value(
duckdb_logical_type type,
duckdb_value *values
);
type
: The type of the struct
values
: The values for the struct fields
The struct value, or nullptr, if any child type is DUCKDB_TYPE_ANY
or DUCKDB_TYPE_INVALID
.
Creates a list value from a child (element) type and an array of values of length value_count
.
Must be destroyed with duckdb_destroy_value
.
duckdb_value duckdb_create_list_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);
type
: The type of the list
values
: The values for the list
value_count
: The number of values in the list
The list value, or nullptr, if the child type is DUCKDB_TYPE_ANY
or DUCKDB_TYPE_INVALID
.
Creates an array value from a child (element) type and an array of values of length value_count
.
Must be destroyed with duckdb_destroy_value
.
duckdb_value duckdb_create_array_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);
type
: The type of the array
values
: The values for the array
value_count
: The number of values in the array
The array value, or nullptr, if the child type is DUCKDB_TYPE_ANY
or DUCKDB_TYPE_INVALID
.
Returns the number of elements in a MAP value.
idx_t duckdb_get_map_size(
duckdb_value value
);
value
: The MAP value.
The number of elements in the map.
Returns the MAP key at index as a duckdb_value.
duckdb_value duckdb_get_map_key(
duckdb_value value,
idx_t index
);
value
: The MAP value.
index
: The index of the key.
The key as a duckdb_value.
Returns the MAP value at index as a duckdb_value.
duckdb_value duckdb_get_map_value(
duckdb_value value,
idx_t index
);
value
: The MAP value.
index
: The index of the value.
The value as a duckdb_value.
Returns whether the value's type is SQLNULL or not.
bool duckdb_is_null_value(
duckdb_value value
);
value
: The value to check.
True, if the value's type is SQLNULL, otherwise false.
Creates a value of type SQLNULL.
The duckdb_value representing SQLNULL. This must be destroyed with duckdb_destroy_value
.
duckdb_value duckdb_create_null_value(
);
Returns the number of elements in a LIST value.
idx_t duckdb_get_list_size(
duckdb_value value
);
value
: The LIST value.
The number of elements in the list.
Returns the LIST child at index as a duckdb_value.
duckdb_value duckdb_get_list_child(
duckdb_value value,
idx_t index
);
value
: The LIST value.
index
: The index of the child.
The child as a duckdb_value.
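Putting the LIST helpers together, here is a brief sketch that builds a LIST value from two INTEGER values and reads it back; the element type and contents are illustrative, and the child values returned by duckdb_get_list_child are assumed to require their own duckdb_destroy_value call.
// build a LIST of two INTEGER values
duckdb_logical_type int_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_value elements[2] = {duckdb_create_int32(1), duckdb_create_int32(2)};
duckdb_value list_val = duckdb_create_list_value(int_type, elements, 2);
// read the list back
idx_t count = duckdb_get_list_size(list_val);
for (idx_t i = 0; i < count; i++) {
duckdb_value child = duckdb_get_list_child(list_val, i);
char *str = duckdb_get_varchar(child);
printf("element %llu: %s\n", (unsigned long long) i, str);
duckdb_free(str);
duckdb_destroy_value(&child);
}
// clean up
duckdb_destroy_value(&list_val);
duckdb_destroy_value(&elements[0]);
duckdb_destroy_value(&elements[1]);
duckdb_destroy_logical_type(&int_type);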
Creates an enum value from a type and a value. Must be destroyed with duckdb_destroy_value
.
duckdb_value duckdb_create_enum_value(
duckdb_logical_type type,
uint64_t value
);
type
: The type of the enum
value
: The value for the enum
The enum value, or nullptr.
Returns the enum value of the given value.
uint64_t duckdb_get_enum_value(
duckdb_value value
);
value
: A duckdb_value containing an enum
A uint64_t, or MinValue if the value cannot be converted
Returns the STRUCT child at index as a duckdb_value.
duckdb_value duckdb_get_struct_child(
duckdb_value value,
idx_t index
);
value
: The STRUCT value.
index
: The index of the child.
The child as a duckdb_value.
--- layout: docu title: Data Chunks redirect_from: - /docs/api/c/data_chunk - /docs/api/c/data_chunk/ ---
Data chunks represent a horizontal slice of a table. They hold a number of [vectors]({% link docs/clients/c/vector.md %}), each of which can hold up to VECTOR_SIZE
rows. The vector size can be obtained through the duckdb_vector_size
function and is configurable, but is usually set to 2048
.
Data chunks and vectors are what DuckDB uses natively to store and represent data. For this reason, the data chunk interface is the most efficient way of interfacing with DuckDB. Be aware, however, that correctly interfacing with DuckDB using the data chunk API does require knowledge of DuckDB's internal vector format.
Data chunks can be used in two manners:
- Reading Data: Data chunks can be obtained from query results using the
duckdb_fetch_chunk
method, or as input to a user-defined function. In this case, the [vector methods]({% link docs/clients/c/vector.md %}) can be used to read individual values. - Writing Data: Data chunks can be created using
duckdb_create_data_chunk
. The data chunk can then be filled with values and used induckdb_append_data_chunk
to write data to the database.
The primary manner of interfacing with data chunks is by obtaining the internal vectors of the data chunk using the duckdb_data_chunk_get_vector
method. Afterwards, the [vector methods]({% link docs/clients/c/vector.md %}) can be used to read from or write to the individual vectors.
duckdb_data_chunk duckdb_create_data_chunk(duckdb_logical_type *types, idx_t column_count);
void duckdb_destroy_data_chunk(duckdb_data_chunk *chunk);
void duckdb_data_chunk_reset(duckdb_data_chunk chunk);
idx_t duckdb_data_chunk_get_column_count(duckdb_data_chunk chunk);
duckdb_vector duckdb_data_chunk_get_vector(duckdb_data_chunk chunk, idx_t col_idx);
idx_t duckdb_data_chunk_get_size(duckdb_data_chunk chunk);
void duckdb_data_chunk_set_size(duckdb_data_chunk chunk, idx_t size);
Creates an empty data chunk with the specified column types.
The result must be destroyed with duckdb_destroy_data_chunk
.
duckdb_data_chunk duckdb_create_data_chunk(
duckdb_logical_type *types,
idx_t column_count
);
types
: An array of column types. Column types cannot contain ANY and INVALID types.
column_count
: The number of columns.
The data chunk.
Destroys the data chunk and de-allocates all memory allocated for that chunk.
void duckdb_destroy_data_chunk(
duckdb_data_chunk *chunk
);
chunk
: The data chunk to destroy.
Resets a data chunk, clearing the validity masks and setting the cardinality of the data chunk to 0.
After calling this method, you must call duckdb_vector_get_validity and duckdb_vector_get_data to obtain current data and validity pointers.
void duckdb_data_chunk_reset(
duckdb_data_chunk chunk
);
chunk
: The data chunk to reset.
Retrieves the number of columns in a data chunk.
idx_t duckdb_data_chunk_get_column_count(
duckdb_data_chunk chunk
);
chunk
: The data chunk to get the data from
The number of columns in the data chunk
Retrieves the vector at the specified column index in the data chunk.
The pointer to the vector is valid for as long as the chunk is alive. It does NOT need to be destroyed.
duckdb_vector duckdb_data_chunk_get_vector(
duckdb_data_chunk chunk,
idx_t col_idx
);
chunk
: The data chunk to get the data from
The vector
Retrieves the current number of tuples in a data chunk.
idx_t duckdb_data_chunk_get_size(
duckdb_data_chunk chunk
);
chunk
: The data chunk to get the data from
The number of tuples in the data chunk
Sets the current number of tuples in a data chunk.
void duckdb_data_chunk_set_size(
duckdb_data_chunk chunk,
idx_t size
);
chunk
: The data chunk to set the size in
size
: The number of tuples in the data chunk
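The writing path described earlier can be sketched as follows: create a chunk for a single INTEGER column, fill its vector, set the cardinality, and hand the chunk to an appender with duckdb_append_data_chunk. The table name integers and the row values are illustrative assumptions; the appender functions are documented on the [Appender page]({% link docs/clients/c/appender.md %}).

#include "duckdb.h"

// Append 100 example rows to an existing single-INTEGER-column table named "integers".
duckdb_state append_one_chunk(duckdb_connection con) {
    duckdb_appender appender;
    if (duckdb_appender_create(con, NULL, "integers", &appender) == DuckDBError) {
        return DuckDBError;
    }

    // Create an empty chunk with one INTEGER column.
    duckdb_logical_type types[1];
    types[0] = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
    duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 1);
    duckdb_destroy_logical_type(&types[0]);

    // Fill the vector directly (100 rows fit comfortably within the vector size).
    duckdb_vector col = duckdb_data_chunk_get_vector(chunk, 0);
    int32_t *data = (int32_t *) duckdb_vector_get_data(col);
    for (idx_t row = 0; row < 100; row++) {
        data[row] = (int32_t) row;
    }
    duckdb_data_chunk_set_size(chunk, 100);

    // Append the chunk and clean up.
    duckdb_state state = duckdb_append_data_chunk(appender, chunk);
    duckdb_destroy_data_chunk(&chunk);
    duckdb_appender_destroy(&appender);
    return state;
}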
--- layout: docu title: Prepared Statements redirect_from: - /docs/api/c/prepared - /docs/api/c/prepared/ ---
A prepared statement is a parameterized query. The query is prepared with question marks (?
) or dollar symbols ($1
) indicating the parameters of the query. Values can then be bound to these parameters, after which the prepared statement can be executed using those parameters. A single query can be prepared once and executed many times.
Prepared statements are useful to:
- Easily supply parameters to functions while avoiding string concatenation and SQL injection attacks.
- Speed up queries that will be executed many times with different parameters.
DuckDB supports prepared statements in the C API with the duckdb_prepare
method. The duckdb_bind
family of functions is used to supply values for subsequent execution of the prepared statement using duckdb_execute_prepared
. After we are done with the prepared statement it can be cleaned up using the duckdb_destroy_prepare
method.
duckdb_prepared_statement stmt;
duckdb_result result;
if (duckdb_prepare(con, "INSERT INTO integers VALUES ($1, $2)", &stmt) == DuckDBError) {
// handle error
}
duckdb_bind_int32(stmt, 1, 42); // the parameter index starts counting at 1!
duckdb_bind_int32(stmt, 2, 43);
// NULL as second parameter means no result set is requested
duckdb_execute_prepared(stmt, NULL);
duckdb_destroy_prepare(&stmt);
// we can also query result sets using prepared statements
if (duckdb_prepare(con, "SELECT * FROM integers WHERE i = ?", &stmt) == DuckDBError) {
// handle error
}
duckdb_bind_int32(stmt, 1, 42);
duckdb_execute_prepared(stmt, &result);
// do something with result
// clean up
duckdb_destroy_result(&result);
duckdb_destroy_prepare(&stmt);
After calling duckdb_prepare
, the prepared statement parameters can be inspected using duckdb_nparams
and duckdb_param_type
. In case the prepare fails, the error can be obtained through duckdb_prepare_error
.
It is not required that the duckdb_bind
family of functions matches the prepared statement parameter type exactly. The values will be auto-cast to the required type as needed. For example, calling duckdb_bind_int8
on a parameter type of DUCKDB_TYPE_INTEGER
will work as expected.
Warning Do not use prepared statements to insert large amounts of data into DuckDB. Instead it is recommended to use the [Appender]({% link docs/clients/c/appender.md %}).
duckdb_state duckdb_prepare(duckdb_connection connection, const char *query, duckdb_prepared_statement *out_prepared_statement);
void duckdb_destroy_prepare(duckdb_prepared_statement *prepared_statement);
const char *duckdb_prepare_error(duckdb_prepared_statement prepared_statement);
idx_t duckdb_nparams(duckdb_prepared_statement prepared_statement);
const char *duckdb_parameter_name(duckdb_prepared_statement prepared_statement, idx_t index);
duckdb_type duckdb_param_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_logical_type duckdb_param_logical_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_state duckdb_clear_bindings(duckdb_prepared_statement prepared_statement);
duckdb_statement_type duckdb_prepared_statement_type(duckdb_prepared_statement statement);
Create a prepared statement object from a query.
Note that after calling duckdb_prepare
, the prepared statement should always be destroyed using
duckdb_destroy_prepare
, even if the prepare fails.
If the prepare fails, duckdb_prepare_error
can be called to obtain the reason why the prepare failed.
duckdb_state duckdb_prepare(
duckdb_connection connection,
const char *query,
duckdb_prepared_statement *out_prepared_statement
);
connection
: The connection object
query
: The SQL query to prepare
out_prepared_statement
: The resulting prepared statement object
DuckDBSuccess
on success or DuckDBError
on failure.
Closes the prepared statement and de-allocates all memory allocated for the statement.
void duckdb_destroy_prepare(
duckdb_prepared_statement *prepared_statement
);
prepared_statement
: The prepared statement to destroy.
Returns the error message associated with the given prepared statement.
If the prepared statement has no error message, this returns nullptr
instead.
The error message should not be freed. It will be de-allocated when duckdb_destroy_prepare
is called.
const char *duckdb_prepare_error(
duckdb_prepared_statement prepared_statement
);
prepared_statement
: The prepared statement to obtain the error from.
The error message, or nullptr
if there is none.
Returns the number of parameters that can be provided to the given prepared statement.
Returns 0 if the query was not successfully prepared.
idx_t duckdb_nparams(
duckdb_prepared_statement prepared_statement
);
prepared_statement
: The prepared statement to obtain the number of parameters for.
Returns the name used to identify the parameter
The returned string should be freed using duckdb_free
.
Returns NULL if the index is out of range for the provided prepared statement.
const char *duckdb_parameter_name(
duckdb_prepared_statement prepared_statement,
idx_t index
);
prepared_statement
: The prepared statement to get the parameter name from.
Returns the parameter type for the parameter at the given index.
Returns DUCKDB_TYPE_INVALID
if the parameter index is out of range or the statement was not successfully prepared.
duckdb_type duckdb_param_type(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);
prepared_statement
: The prepared statement.
param_idx
: The parameter index.
The parameter type
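For example, the parameters of a successfully prepared statement could be inspected with a small helper like the following (a minimal sketch; parameter indexes start at 1, matching the duckdb_bind family of functions):

#include <stdio.h>
#include "duckdb.h"

// Print the type id of every parameter of an already-prepared statement.
void print_parameter_types(duckdb_prepared_statement stmt) {
    idx_t n = duckdb_nparams(stmt);
    for (idx_t i = 1; i <= n; i++) {
        duckdb_type t = duckdb_param_type(stmt, i);
        printf("parameter %llu has type id %d\n", (unsigned long long) i, (int) t);
    }
}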
Returns the logical type for the parameter at the given index.
Returns nullptr
if the parameter index is out of range or the statement was not successfully prepared.
The return type of this call should be destroyed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_param_logical_type(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);
prepared_statement
: The prepared statement.
param_idx
: The parameter index.
The logical type of the parameter
Clears the parameters bound to the prepared statement.
duckdb_state duckdb_clear_bindings(
duckdb_prepared_statement prepared_statement
);
Returns the statement type of the statement to be executed
duckdb_statement_type duckdb_prepared_statement_type(
duckdb_prepared_statement statement
);
statement
: The prepared statement.
duckdb_statement_type value or DUCKDB_STATEMENT_TYPE_INVALID
--- layout: docu title: Search ---
The [Command Line Interface (CLI) client]({% link docs/clients/cli/overview.md %}) is intended for interactive use cases and not for embedding.
As a result, it has more features that could be abused by a malicious actor.
For example, the CLI client has the .sh
feature that allows executing arbitrary shell commands.
This feature is only present in the CLI client and not in any other DuckDB clients.
.sh ls
Tip Calling DuckDB's CLI client via shell commands is not recommended for embedding DuckDB. It is recommended to use one of the client libraries, e.g., [Python]({% link docs/clients/python/overview.md %}), [R]({% link docs/clients/r.md %}), [Java]({% link docs/clients/java.md %}), etc.
The Secrets manager provides a unified user interface for secrets across all backends that use them. Secrets can be scoped, so different storage prefixes can have different secrets, allowing for example to join data across organizations in a single query. Secrets can also be persisted, so that they do not need to be specified every time DuckDB is launched.
Warning Persistent secrets are stored in unencrypted binary format on the disk.
Secrets are typed; their type identifies which service they are for. Most secrets are not included in DuckDB by default; instead, they are registered by extensions. Currently, the following secret types are available:
| Secret type | Service / protocol | Extension |
|-------------|--------------------|-----------|
| `AZURE` | Azure Blob Storage | [`azure`]({% link docs/extensions/azure.md %}) |
| `GCS` | Google Cloud Storage | [`httpfs`]({% link docs/extensions/httpfs/s3api.md %}) |
| `HTTP` | HTTP and HTTPS | [`httpfs`]({% link docs/extensions/httpfs/https.md %}) |
| `HUGGINGFACE` | Hugging Face | [`httpfs`]({% link docs/extensions/httpfs/hugging_face.md %}) |
| `MYSQL` | MySQL | [`mysql`]({% link docs/extensions/mysql.md %}) |
| `POSTGRES` | PostgreSQL | [`postgres`]({% link docs/extensions/postgres.md %}) |
| `R2` | Cloudflare R2 | [`httpfs`]({% link docs/extensions/httpfs/s3api.md %}) |
| `S3` | AWS S3 | [`httpfs`]({% link docs/extensions/httpfs/s3api.md %}) |
For each type, there are one or more “secret providers” that specify how the secret is created. Secrets can also have an optional scope, which is a file path prefix that the secret applies to. When fetching a secret for a path, the secret scopes are compared to the path, returning the matching secret for the path. In the case of multiple matching secrets, the longest prefix is chosen.
Secrets can be created using the [CREATE SECRET
SQL statement]({% link docs/sql/statements/create_secret.md %}).
Secrets can be temporary or persistent. Temporary secrets are used by default – and are stored in-memory for the life span of the DuckDB instance similar to how settings worked previously. Persistent secrets are stored in unencrypted binary format in the ~/.duckdb/stored_secrets
directory. On startup of DuckDB, persistent secrets are read from this directory and automatically loaded.
To create a secret, a Secret Provider needs to be used. A Secret Provider is a mechanism through which a secret is generated. To illustrate this, for the S3
, GCS
, R2
, and AZURE
secret types, DuckDB currently supports two providers: CONFIG
and CREDENTIAL_CHAIN
. The CONFIG
provider requires the user to pass all configuration information into the CREATE SECRET
, whereas the CREDENTIAL_CHAIN
provider will automatically try to fetch credentials. When no Secret Provider is specified, the CONFIG
provider is used. For more details on how to create secrets using different providers check out the respective pages on [httpfs]({% link docs/extensions/httpfs/overview.md %}#configuration-and-authentication-using-secrets) and [azure]({% link docs/extensions/azure.md %}#authentication-with-secret).
To create a temporary unscoped secret to access S3, we can now use the following:
CREATE SECRET my_secret (
TYPE S3,
KEY_ID 'my_secret_key',
SECRET 'my_secret_value',
REGION 'my_region'
);
Note that we implicitly use the default CONFIG
secret provider here.
In order to persist secrets between DuckDB database instances, we can now use the CREATE PERSISTENT SECRET
command, e.g.:
CREATE PERSISTENT SECRET my_persistent_secret (
TYPE S3,
KEY_ID 'my_secret_key',
SECRET 'my_secret_value'
);
By default, this will write the secret (unencrypted) to the ~/.duckdb/stored_secrets
directory. To change the secrets directory, issue:
SET secret_directory = 'path/to/my_secrets_dir';
Note that setting the value of the home_directory
configuration option has no effect on the location of the secrets.
Secrets can be deleted using the [DROP SECRET
statement]({% link docs/sql/statements/create_secret.md %}#syntax-for-drop-secret), e.g.:
DROP PERSISTENT SECRET my_persistent_secret;
If two secrets exist for a service type, the scope can be used to decide which one should be used. For example:
CREATE SECRET secret1 (
TYPE S3,
KEY_ID 'my_secret_key1',
SECRET 'my_secret_value1',
SCOPE 's3://my-bucket'
);
CREATE SECRET secret2 (
TYPE S3,
KEY_ID 'my_secret_key2',
SECRET 'my_secret_value2',
SCOPE 's3://my-other-bucket'
);
Now, if the user queries something from s3://my-other-bucket/something
, secret secret2
will be chosen automatically for that request. To see which secret is being used, the which_secret
scalar function can be used, which takes a path and a secret type as parameters:
FROM which_secret('s3://my-other-bucket/file.parquet', 's3');
Secrets can be listed using the built-in table-producing function, e.g., by using the [duckdb_secrets()
table function]({% link docs/sql/meta/duckdb_table_functions.md %}#duckdb_secrets):
FROM duckdb_secrets();
layout: docu title: Pragmas redirect_from:
- /docs/sql/pragmas
- /docs/sql/pragmas/
The PRAGMA
statement is a SQL extension adopted by DuckDB from SQLite. PRAGMA
statements can be issued in a similar manner to regular SQL statements. PRAGMA
commands may alter the internal state of the database engine, and can influence the subsequent execution or behavior of the engine.
PRAGMA
statements that assign a value to an option can also be issued using the [SET
statement]({% link docs/sql/statements/set.md %}) and the value of an option can be retrieved using SELECT current_setting(option_name)
.
For DuckDB's built in configuration options, see the [Configuration Reference]({% link docs/configuration/overview.md %}#configuration-reference). DuckDB [extensions]({% link docs/extensions/overview.md %}) may register additional configuration options. These are documented in the respective extensions' documentation pages.
This page contains the supported PRAGMA
settings.
List all databases:
PRAGMA database_list;
List all tables:
PRAGMA show_tables;
List all tables, with extra information, similarly to [DESCRIBE
]({% link docs/guides/meta/describe.md %}):
PRAGMA show_tables_expanded;
To list all functions:
PRAGMA functions;
Get info for a specific table:
PRAGMA table_info('table_name');
CALL pragma_table_info('table_name');
table_info
returns information about the columns of the table with name table_name
. The exact format of the table returned is given below:
cid INTEGER, -- cid of the column
name VARCHAR, -- name of the column
type VARCHAR, -- type of the column
notnull BOOLEAN, -- if the column is marked as NOT NULL
dflt_value VARCHAR, -- default value of the column, or NULL if not specified
pk BOOLEAN -- part of the primary key or not
Get the file and memory size of each database:
PRAGMA database_size;
CALL pragma_database_size();
database_size
returns information about the file and memory size of each database. The column types of the returned results are given below:
database_name VARCHAR, -- database name
database_size VARCHAR, -- total block count times the block size
block_size BIGINT, -- database block size
total_blocks BIGINT, -- total blocks in the database
used_blocks BIGINT, -- used blocks in the database
free_blocks BIGINT, -- free blocks in the database
wal_size VARCHAR, -- write ahead log size
memory_usage VARCHAR, -- memory used by the database buffer manager
memory_limit VARCHAR -- maximum memory allowed for the database
To get storage information:
PRAGMA storage_info('table_name');
CALL pragma_storage_info('table_name');
This call returns the following information for the given table:
| Name | Type | Description |
|------|------|-------------|
| `row_group_id` | `BIGINT` | |
| `column_name` | `VARCHAR` | |
| `column_id` | `BIGINT` | |
| `column_path` | `VARCHAR` | |
| `segment_id` | `BIGINT` | |
| `segment_type` | `VARCHAR` | |
| `start` | `BIGINT` | The start row id of this chunk |
| `count` | `BIGINT` | The amount of entries in this storage chunk |
| `compression` | `VARCHAR` | Compression type used for this column – see the [“Lightweight Compression in DuckDB” blog post]({% post_url 2022-10-28-lightweight-compression %}) |
| `stats` | `VARCHAR` | |
| `has_updates` | `BOOLEAN` | |
| `persistent` | `BOOLEAN` | `false` if temporary table |
| `block_id` | `BIGINT` | Empty unless persistent |
| `block_offset` | `BIGINT` | Empty unless persistent |
See [Storage]({% link docs/internals/storage.md %}) for more information.
The following statement is equivalent to the [SHOW DATABASES
statement]({% link docs/sql/statements/attach.md %}):
PRAGMA show_databases;
Set the memory limit for the buffer manager:
SET memory_limit = '1GB';
Warning The specified memory limit is only applied to the buffer manager. For most queries, the buffer manager handles the majority of the data processed. However, certain in-memory data structures such as [vectors]({% link docs/internals/vector.md %}) and query results are allocated outside of the buffer manager. Additionally, [aggregate functions]({% link docs/sql/functions/aggregates.md %}) with complex state (e.g., list, mode, quantile, string_agg, and approx functions) use memory outside of the buffer manager. Therefore, the actual memory consumption can be higher than the specified memory limit.
Set the number of threads for parallel query execution:
SET threads = 4;
List all available collations:
PRAGMA collations;
Set the default collation to one of the available ones:
SET default_collation = 'nocase';
Set the default ordering for NULLs to be either NULLS_FIRST, NULLS_LAST, NULLS_FIRST_ON_ASC_LAST_ON_DESC, or NULLS_LAST_ON_ASC_FIRST_ON_DESC:
SET default_null_order = 'NULLS_FIRST';
SET default_null_order = 'NULLS_LAST_ON_ASC_FIRST_ON_DESC';
Set the default result set ordering direction to ASCENDING
or DESCENDING
:
SET default_order = 'ASCENDING';
SET default_order = 'DESCENDING';
By default, ordering by non-integer literals is not allowed:
SELECT 42 ORDER BY 'hello world';
-- Binder Error: ORDER BY non-integer literal has no effect.
To allow this behavior, use the order_by_non_integer_literal
option:
SET order_by_non_integer_literal = true;
Prior to version 0.10.0, DuckDB would automatically allow any type to be implicitly cast to VARCHAR
during function binding. As a result, it was possible to, e.g., compute the substring of an integer without using an explicit cast. For version 0.10.0 and later, an explicit cast is needed instead. To revert to the old behavior that performs implicit casting, set the old_implicit_casting
variable to true
:
SET old_implicit_casting = true;
Prior to version 1.1.0, DuckDB's [replacement scan mechanism]({% link docs/clients/c/replacement_scans.md %}) in Python scanned the global Python namespace. To revert to this old behavior, use the following setting:
SET python_scan_all_frames = true;
Show DuckDB version:
PRAGMA version;
CALL pragma_version();
platform
returns an identifier for the platform the current DuckDB executable has been compiled for, e.g., osx_arm64
.
The format of this identifier matches the platform name as described in the [extension loading explainer]({% link docs/extensions/working_with_extensions.md %}#platforms):
PRAGMA platform;
CALL pragma_platform();
The following statement returns the user agent information, e.g., duckdb/v0.10.0(osx_arm64)
:
PRAGMA user_agent;
The following statement returns information on the metadata store (block_id, total_blocks, free_blocks, and free_list):
PRAGMA metadata_info;
Show progress bar when running queries:
PRAGMA enable_progress_bar;
Or:
PRAGMA enable_print_progress_bar;
Don't show a progress bar for running queries:
PRAGMA disable_progress_bar;
Or:
PRAGMA disable_print_progress_bar;
The output of [EXPLAIN
]({% link docs/sql/statements/profiling.md %}) can be configured to show only the physical plan.
The default configuration of EXPLAIN
:
SET explain_output = 'physical_only';
To only show the optimized query plan:
SET explain_output = 'optimized_only';
To show all query plans:
SET explain_output = 'all';
The following query enables profiling with the default format, query_tree
.
Independent of the format, enable_profiling
is mandatory to enable profiling.
PRAGMA enable_profiling;
PRAGMA enable_profile;
The format of enable_profiling
can be specified as query_tree
, json
, query_tree_optimizer
, or no_output
.
Each format prints its output to the configured output, except no_output
.
The default format is query_tree
.
It prints the physical query plan and the metrics of each operator in the tree.
SET enable_profiling = 'query_tree';
Alternatively, json
returns the physical query plan as JSON:
SET enable_profiling = 'json';
To return the physical query plan, including optimizer and planner metrics:
SET enable_profiling = 'query_tree_optimizer';
Database drivers and other applications can also access profiling information through API calls, in which case users can disable any other output.
Even though the parameter reads no_output
, it is essential to note that this only affects printing to the configurable output.
When accessing profiling information through API calls, it is still crucial to enable profiling:
SET enable_profiling = 'no_output';
By default, DuckDB prints profiling information to the standard output.
However, if you prefer to write the profiling information to a file, you can use PRAGMA
profiling_output
to specify a filepath.
Warning The file contents will be overwritten for every newly issued query. Hence, the file will only contain the profiling information of the last run query:
SET profiling_output = '/path/to/file.json';
SET profile_output = '/path/to/file.json';
By default, a limited amount of profiling information is provided (standard
).
SET profiling_mode = 'standard';
For more details, use the detailed profiling mode by setting profiling_mode
to detailed
.
The output of this mode includes profiling of the planner and optimizer stages.
SET profiling_mode = 'detailed';
By default, profiling enables all metrics except those activated by detailed profiling.
Using the custom_profiling_settings
PRAGMA
, each metric, including those from detailed profiling, can be individually enabled or disabled.
This PRAGMA
accepts a JSON object with metric names as keys and Boolean values to toggle them on or off.
Settings specified by this PRAGMA
override the default behavior.
Note This only affects the metrics when enable_profiling is set to json or no_output. The query_tree and query_tree_optimizer formats always use a default set of metrics.
In the following example, the CPU_TIME
metric is disabled.
The EXTRA_INFO
, OPERATOR_CARDINALITY
, and OPERATOR_TIMING
metrics are enabled.
SET custom_profiling_settings = '{"CPU_TIME": "false", "EXTRA_INFO": "true", "OPERATOR_CARDINALITY": "true", "OPERATOR_TIMING": "true"}';
The profiling documentation contains an overview of the available [metrics]({% link docs/dev/profiling.md %}#metrics).
To disable profiling:
PRAGMA disable_profiling;
PRAGMA disable_profile;
To disable the query optimizer:
PRAGMA disable_optimizer;
To enable the query optimizer:
PRAGMA enable_optimizer;
The disabled_optimizers
option allows selectively disabling optimization steps.
For example, to disable filter_pushdown
and statistics_propagation
, run:
SET disabled_optimizers = 'filter_pushdown,statistics_propagation';
The available optimizations can be queried using the [duckdb_optimizers()
table function]({% link docs/sql/meta/duckdb_table_functions.md %}#duckdb_optimizers).
To re-enable the optimizers, run:
SET disabled_optimizers = '';
Warning The
disabled_optimizers
option should only be used for debugging performance issues and should be avoided in production.
Set a path for query logging:
SET log_query_path = '/tmp/duckdb_log/';
Disable query logging:
SET log_query_path = '';
The create_fts_index
and drop_fts_index
options are only available when the [fts
extension]({% link docs/extensions/full_text_search.md %}) is loaded. Their usage is documented on the [Full-Text Search extension page]({% link docs/extensions/full_text_search.md %}).
Enable verification of external operators:
PRAGMA verify_external;
Disable verification of external operators:
PRAGMA disable_verify_external;
Enable verification of round-trip capabilities for supported logical plans:
PRAGMA verify_serializer;
Disable verification of round-trip capabilities:
PRAGMA disable_verify_serializer;
Enable caching of objects for e.g., Parquet metadata:
PRAGMA enable_object_cache;
Disable caching of objects:
PRAGMA disable_object_cache;
During checkpointing, the existing column data and any new changes are compressed. A couple of pragmas influence which compression functions are considered.
Prefer using this compression method over any other method if possible:
PRAGMA force_compression = 'bitpacking';
Avoid using any of the compression methods in the given comma-separated list:
PRAGMA disabled_compression_methods = 'fsst,rle';
When [CHECKPOINT
]({% link docs/sql/statements/checkpoint.md %}) is called and no changes have been made, force a checkpoint regardless:
PRAGMA force_checkpoint;
Run a CHECKPOINT
on successful shutdown and delete the WAL, to leave only a single database file behind:
PRAGMA enable_checkpoint_on_shutdown;
Don't run a CHECKPOINT
on shutdown:
PRAGMA disable_checkpoint_on_shutdown;
By default, DuckDB uses a temporary directory named ⟨database_file_name⟩.tmp
to spill to disk, located in the same directory as the database file. To change this, use:
SET temp_directory = '/path/to/temp_dir.tmp/';
The errors_as_json
option can be set to obtain error information in raw JSON format. For certain errors, extra information or decomposed information is provided for easier machine processing. For example:
SET errors_as_json = true;
Then, running a query that results in an error produces a JSON output:
SELECT * FROM nonexistent_tbl;
{
"exception_type":"Catalog",
"exception_message":"Table with name nonexistent_tbl does not exist!\nDid you mean \"temp.information_schema.tables\"?",
"name":"nonexistent_tbl",
"candidates":"temp.information_schema.tables",
"position":"14",
"type":"Table",
"error_subtype":"MISSING_ENTRY"
}
DuckDB follows IEEE floating-point operation semantics. If you would like to turn this off, run:
SET ieee_floating_point_ops = false;
In this case, floating point division by zero (e.g., 1.0 / 0.0
, 0.0 / 0.0
and -1.0 / 0.0
) will all return NULL
.
The following PRAGMA
s are mostly used for development and internal testing.
Enable query verification:
PRAGMA enable_verification;
Disable query verification:
PRAGMA disable_verification;
Enable force parallel query processing:
PRAGMA verify_parallelism;
Disable force parallel query processing:
PRAGMA disable_verify_parallelism;
When persisting a database to disk, DuckDB writes to a dedicated file containing a list of blocks holding the data. In the case of a file that only holds very little data, e.g., a small table, the default block size of 256KB might not be ideal. Therefore, DuckDB's storage format supports different block sizes.
There are a few constraints on possible block size values.
- Must be a power of two.
- Must be greater than or equal to 16384 (16 KB).
- Must be less than or equal to 262144 (256 KB).
You can set the default block size for all new DuckDB files created by an instance like so:
SET default_block_size = '16384';
It is also possible to set the block size on a per-file basis, see [ATTACH
]({% link docs/sql/statements/attach.md %}) for details.
We designed DuckDB to be easy to deploy and operate. We believe that most users do not need to consult the pages of the operations manual. However, there are certain setups – e.g., when DuckDB is running in mission-critical infrastructure – where we would like to offer advice on how to configure DuckDB. The operations manual contains advice for these cases and also offers convenient configuration snippets such as Gitignore files.
For advice on getting the best performance from DuckDB, see also the [Performance Guide]({% link docs/guides/performance/overview.md %}).
Several operators in DuckDB exhibit non-deterministic behavior. Most notably, SQL uses set semantics, which allows results to be returned in a different order. DuckDB exploits this to improve performance, particularly when performing multi-threaded query execution. Other factors, such as using different compilers, operating systems, and hardware architectures, can also cause changes in ordering. This page documents the cases where non-determinism is an expected behavior. If you would like to make your queries deterministic, see the “Working Around Non-Determinism” section.
One of the most common sources of non-determinism is the set semantics used by SQL. E.g., if you run the following query repeatedly, you may get two different results:
SELECT *
FROM (
SELECT 'A' AS x
UNION
SELECT 'B' AS x
);
Both results A
, B
and B
, A
are correct.
The array_distinct
function may return results in a different order on different platforms:
SELECT array_distinct(['A', 'A', 'B', NULL, NULL]) AS arr;
For this query, both [A, B]
and [B, A]
are valid results.
Floating-point inaccuracies may produce different results when run in a multi-threaded configuration.
For example, stddev
and corr
may produce non-deterministic results:
CREATE TABLE tbl AS
SELECT 'ABCDEFG'[floor(random() * 7 + 1)::INT] AS s, 3.7 AS x, i AS y
FROM range(1, 1_000_000) r(i);
SELECT s, stddev(x) AS standard_deviation, corr(x, y) AS correlation
FROM tbl
GROUP BY s
ORDER BY s;
The expected standard deviations and correlations from this query are 0 for all values of s
.
However, when executed on multiple threads, the query may return small numbers (0 <= z < 10e-16
) due to floating-point inaccuracies.
For the majority of use cases, non-determinism does not cause any issues. However, there are some cases where deterministic results are desirable. In these cases, try the following workarounds:
-
Limit the number of threads to prevent non-determinism introduced by multi-threading.
SET threads = 1;
-
Enforce ordering. For example, you can use the [ORDER BY ALL clause]({% link docs/sql/query_syntax/orderby.md %}#order-by-all):

SELECT *
FROM (
    SELECT 'A' AS x
    UNION
    SELECT 'B' AS x
)
ORDER BY ALL;

You can also sort lists using [list_sort]({% link docs/sql/functions/list.md %}#list_sortlist):

SELECT list_sort(array_distinct(['A', 'A', 'B', NULL, NULL])) AS i ORDER BY i;
It's also possible to introduce a [deterministic shuffling]({% post_url 2024-08-19-duckdb-tricks-part-1 %}#shuffling-data).
layout: docu title: Types redirect_from:
- /docs/api/c/types
- /docs/api/c/types/
DuckDB is a strongly typed database system. As such, every column has a single type specified. This type is constant
over the entire column. That is to say, a column that is labeled as an INTEGER
column will only contain INTEGER
values.
DuckDB also supports columns of composite types. For example, it is possible to define an array of integers (INTEGER[]
). It is also possible to define types as arbitrary structs (ROW(i INTEGER, j VARCHAR)
). For that reason, native DuckDB type objects are not mere enums, but a class that can potentially be nested.
Types in the C API are modeled using an enum (duckdb_type
) and a complex class (duckdb_logical_type
). For most primitive types, e.g., integers or varchars, the enum is sufficient. For more complex types, such as lists, structs or decimals, the logical type must be used.
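As a small example of the logical type interface, the following sketch constructs the nested type LIST(INTEGER) from a primitive type and releases both type handles afterwards; the creation, inspection, and destruction functions used here are documented in the API reference further down this page.

#include "duckdb.h"

void create_list_of_integers(void) {
    // Build the nested type LIST(INTEGER) from its child type.
    duckdb_logical_type child = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
    duckdb_logical_type list = duckdb_create_list_type(child);

    // The enum id of the outer type is DUCKDB_TYPE_LIST.
    duckdb_type id = duckdb_get_type_id(list);
    (void) id;

    // Both logical types must be destroyed.
    duckdb_destroy_logical_type(&list);
    duckdb_destroy_logical_type(&child);
}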
typedef enum DUCKDB_TYPE {
DUCKDB_TYPE_INVALID = 0,
DUCKDB_TYPE_BOOLEAN = 1,
DUCKDB_TYPE_TINYINT = 2,
DUCKDB_TYPE_SMALLINT = 3,
DUCKDB_TYPE_INTEGER = 4,
DUCKDB_TYPE_BIGINT = 5,
DUCKDB_TYPE_UTINYINT = 6,
DUCKDB_TYPE_USMALLINT = 7,
DUCKDB_TYPE_UINTEGER = 8,
DUCKDB_TYPE_UBIGINT = 9,
DUCKDB_TYPE_FLOAT = 10,
DUCKDB_TYPE_DOUBLE = 11,
DUCKDB_TYPE_TIMESTAMP = 12,
DUCKDB_TYPE_DATE = 13,
DUCKDB_TYPE_TIME = 14,
DUCKDB_TYPE_INTERVAL = 15,
DUCKDB_TYPE_HUGEINT = 16,
DUCKDB_TYPE_UHUGEINT = 32,
DUCKDB_TYPE_VARCHAR = 17,
DUCKDB_TYPE_BLOB = 18,
DUCKDB_TYPE_DECIMAL = 19,
DUCKDB_TYPE_TIMESTAMP_S = 20,
DUCKDB_TYPE_TIMESTAMP_MS = 21,
DUCKDB_TYPE_TIMESTAMP_NS = 22,
DUCKDB_TYPE_ENUM = 23,
DUCKDB_TYPE_LIST = 24,
DUCKDB_TYPE_STRUCT = 25,
DUCKDB_TYPE_MAP = 26,
DUCKDB_TYPE_ARRAY = 33,
DUCKDB_TYPE_UUID = 27,
DUCKDB_TYPE_UNION = 28,
DUCKDB_TYPE_BIT = 29,
DUCKDB_TYPE_TIME_TZ = 30,
DUCKDB_TYPE_TIMESTAMP_TZ = 31,
} duckdb_type;
The enum type of a column in the result can be obtained using the duckdb_column_type
function. The logical type of a column can be obtained using the duckdb_column_logical_type
function.
The duckdb_value
functions will auto-cast values as required. For example, it is no problem to use
duckdb_value_double
on a column of type DUCKDB_TYPE_INTEGER
. The value will be auto-cast and returned as a double.
Note that in certain cases the cast may fail. For example, this can happen if we request a duckdb_value_int8
and the value does not fit within an int8
value. In this case, a default value will be returned (usually 0
or nullptr
). The same default value will also be returned if the corresponding value is NULL
.
The duckdb_value_is_null
function can be used to check if a specific value is NULL
or not.
The exception to the auto-cast rule is the duckdb_value_varchar_internal
function. This function does not auto-cast and only works for VARCHAR
columns. The reason this function exists is that the result does not need to be freed.
duckdb_value_varchar and duckdb_value_blob require the result to be de-allocated using duckdb_free.
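For instance, a VARCHAR cell could be read and released like this (a minimal sketch; duckdb_value_varchar takes a result pointer, a column index, and a row index):

#include <stdio.h>
#include "duckdb.h"

// Read the VARCHAR cell at column 0, row 0 and release the returned allocation.
void print_first_cell(duckdb_result *result) {
    char *text = duckdb_value_varchar(result, 0, 0);
    if (text) {
        printf("%s\n", text);
        duckdb_free(text);
    }
}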
The duckdb_fetch_chunk
function can be used to read data chunks from a DuckDB result set, and is the most efficient way of reading data from a DuckDB result using the C API. It is also the only way of reading data of certain types from a DuckDB result. For example, the duckdb_value
functions do not support structural reading of composite types (lists or structs) or more complex types like enums and decimals.
For more information about data chunks, see the [documentation on data chunks]({% link docs/clients/c/data_chunk.md %}).
duckdb_data_chunk duckdb_result_get_chunk(duckdb_result result, idx_t chunk_index);
bool duckdb_result_is_streaming(duckdb_result result);
idx_t duckdb_result_chunk_count(duckdb_result result);
duckdb_result_type duckdb_result_return_type(duckdb_result result);
duckdb_date_struct duckdb_from_date(duckdb_date date);
duckdb_date duckdb_to_date(duckdb_date_struct date);
bool duckdb_is_finite_date(duckdb_date date);
duckdb_time_struct duckdb_from_time(duckdb_time time);
duckdb_time_tz duckdb_create_time_tz(int64_t micros, int32_t offset);
duckdb_time_tz_struct duckdb_from_time_tz(duckdb_time_tz micros);
duckdb_time duckdb_to_time(duckdb_time_struct time);
duckdb_timestamp_struct duckdb_from_timestamp(duckdb_timestamp ts);
duckdb_timestamp duckdb_to_timestamp(duckdb_timestamp_struct ts);
bool duckdb_is_finite_timestamp(duckdb_timestamp ts);
bool duckdb_is_finite_timestamp_s(duckdb_timestamp_s ts);
bool duckdb_is_finite_timestamp_ms(duckdb_timestamp_ms ts);
bool duckdb_is_finite_timestamp_ns(duckdb_timestamp_ns ts);
double duckdb_hugeint_to_double(duckdb_hugeint val);
duckdb_hugeint duckdb_double_to_hugeint(double val);
duckdb_decimal duckdb_double_to_decimal(double val, uint8_t width, uint8_t scale);
double duckdb_decimal_to_double(duckdb_decimal val);
duckdb_logical_type duckdb_create_logical_type(duckdb_type type);
char *duckdb_logical_type_get_alias(duckdb_logical_type type);
void duckdb_logical_type_set_alias(duckdb_logical_type type, const char *alias);
duckdb_logical_type duckdb_create_list_type(duckdb_logical_type type);
duckdb_logical_type duckdb_create_array_type(duckdb_logical_type type, idx_t array_size);
duckdb_logical_type duckdb_create_map_type(duckdb_logical_type key_type, duckdb_logical_type value_type);
duckdb_logical_type duckdb_create_union_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_struct_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_enum_type(const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_decimal_type(uint8_t width, uint8_t scale);
duckdb_type duckdb_get_type_id(duckdb_logical_type type);
uint8_t duckdb_decimal_width(duckdb_logical_type type);
uint8_t duckdb_decimal_scale(duckdb_logical_type type);
duckdb_type duckdb_decimal_internal_type(duckdb_logical_type type);
duckdb_type duckdb_enum_internal_type(duckdb_logical_type type);
uint32_t duckdb_enum_dictionary_size(duckdb_logical_type type);
char *duckdb_enum_dictionary_value(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_list_type_child_type(duckdb_logical_type type);
duckdb_logical_type duckdb_array_type_child_type(duckdb_logical_type type);
idx_t duckdb_array_type_array_size(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_key_type(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_value_type(duckdb_logical_type type);
idx_t duckdb_struct_type_child_count(duckdb_logical_type type);
char *duckdb_struct_type_child_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_struct_type_child_type(duckdb_logical_type type, idx_t index);
idx_t duckdb_union_type_member_count(duckdb_logical_type type);
char *duckdb_union_type_member_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_union_type_member_type(duckdb_logical_type type, idx_t index);
void duckdb_destroy_logical_type(duckdb_logical_type *type);
duckdb_state duckdb_register_logical_type(duckdb_connection con, duckdb_logical_type type, duckdb_create_type_info info);
Warning Deprecation notice. This method is scheduled for removal in a future release.
Fetches a data chunk from the duckdb_result. This function should be called repeatedly until the result is exhausted.
The result must be destroyed with duckdb_destroy_data_chunk
.
This function supersedes all duckdb_value
functions, as well as the duckdb_column_data
and duckdb_nullmask_data
functions. It results in significantly better performance, and should be preferred in newer code-bases.
If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be mixed with the legacy result functions).
Use duckdb_result_chunk_count
to figure out how many chunks there are in the result.
duckdb_data_chunk duckdb_result_get_chunk(
duckdb_result result,
idx_t chunk_index
);
result
: The result object to fetch the data chunk from.
chunk_index
: The chunk index to fetch from.
The resulting data chunk. Returns NULL
if the chunk index is out of bounds.
Warning Deprecation notice. This method is scheduled for removal in a future release.
Checks if the type of the internal result is StreamQueryResult.
bool duckdb_result_is_streaming(
duckdb_result result
);
result
: The result object to check.
Whether or not the result object is of the type StreamQueryResult
Warning Deprecation notice. This method is scheduled for removal in a future release.
Returns the number of data chunks present in the result.
idx_t duckdb_result_chunk_count(
duckdb_result result
);
result
: The result object
Number of data chunks present in the result.
Returns the return_type of the given result, or DUCKDB_RETURN_TYPE_INVALID on error
duckdb_result_type duckdb_result_return_type(
duckdb_result result
);
result
: The result object
The return_type
Decompose a duckdb_date
object into year, month and date (stored as duckdb_date_struct
).
duckdb_date_struct duckdb_from_date(
duckdb_date date
);
date
: The date object, as obtained from a DUCKDB_TYPE_DATE
column.
The duckdb_date_struct
with the decomposed elements.
Re-compose a duckdb_date
from year, month and date (duckdb_date_struct
).
duckdb_date duckdb_to_date(
duckdb_date_struct date
);
date
: The year, month and date stored in a duckdb_date_struct
.
The duckdb_date
element.
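A small sketch of the two functions above, decomposing an arbitrary example date (stored as days since 1970-01-01) and recomposing it:

#include <stdio.h>
#include "duckdb.h"

void date_roundtrip(void) {
    // 19000 days since 1970-01-01 is an arbitrary example value.
    duckdb_date d;
    d.days = 19000;

    // Decompose into year, month, and day.
    duckdb_date_struct parts = duckdb_from_date(d);
    printf("%d-%02d-%02d\n", parts.year, parts.month, parts.day);

    // Recompose back into a duckdb_date.
    duckdb_date back = duckdb_to_date(parts);
    (void) back;
}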
Test a duckdb_date
to see if it is a finite value.
bool duckdb_is_finite_date(
duckdb_date date
);
date
: The date object, as obtained from a DUCKDB_TYPE_DATE
column.
True if the date is finite, false if it is ±infinity.
Decompose a duckdb_time
object into hour, minute, second and microsecond (stored as duckdb_time_struct
).
duckdb_time_struct duckdb_from_time(
duckdb_time time
);
time
: The time object, as obtained from a DUCKDB_TYPE_TIME
column.
The duckdb_time_struct
with the decomposed elements.
Create a duckdb_time_tz
object from micros and a timezone offset.
duckdb_time_tz duckdb_create_time_tz(
int64_t micros,
int32_t offset
);
micros
: The microsecond component of the time.
offset
: The timezone offset component of the time.
The duckdb_time_tz
element.
Decompose a TIME_TZ object into micros and a timezone offset.
Use duckdb_from_time
to further decompose the micros into hour, minute, second and microsecond.
duckdb_time_tz_struct duckdb_from_time_tz(
duckdb_time_tz micros
);
micros
: The time object, as obtained from a DUCKDB_TYPE_TIME_TZ
column.
Re-compose a duckdb_time
from hour, minute, second and microsecond (duckdb_time_struct
).
duckdb_time duckdb_to_time(
duckdb_time_struct time
);
time
: The hour, minute, second and microsecond in a duckdb_time_struct
.
The duckdb_time
element.
Decompose a duckdb_timestamp
object into a duckdb_timestamp_struct
.
duckdb_timestamp_struct duckdb_from_timestamp(
duckdb_timestamp ts
);
ts
: The ts object, as obtained from a DUCKDB_TYPE_TIMESTAMP
column.
The duckdb_timestamp_struct
with the decomposed elements.
Re-compose a duckdb_timestamp
from a duckdb_timestamp_struct.
duckdb_timestamp duckdb_to_timestamp(
duckdb_timestamp_struct ts
);
ts
: The decomposed elements in a duckdb_timestamp_struct
.
The duckdb_timestamp
element.
Test a duckdb_timestamp
to see if it is a finite value.
bool duckdb_is_finite_timestamp(
duckdb_timestamp ts
);
ts
: The duckdb_timestamp object, as obtained from a DUCKDB_TYPE_TIMESTAMP
column.
True if the timestamp is finite, false if it is ±infinity.
Test a duckdb_timestamp_s
to see if it is a finite value.
bool duckdb_is_finite_timestamp_s(
duckdb_timestamp_s ts
);
ts
: The duckdb_timestamp_s object, as obtained from a DUCKDB_TYPE_TIMESTAMP_S
column.
True if the timestamp is finite, false if it is ±infinity.
Test a duckdb_timestamp_ms
to see if it is a finite value.
bool duckdb_is_finite_timestamp_ms(
duckdb_timestamp_ms ts
);
ts
: The duckdb_timestamp_ms object, as obtained from a DUCKDB_TYPE_TIMESTAMP_MS
column.
True if the timestamp is finite, false if it is ±infinity.
Test a duckdb_timestamp_ns
to see if it is a finite value.
bool duckdb_is_finite_timestamp_ns(
duckdb_timestamp_ns ts
);
ts
: The duckdb_timestamp_ns object, as obtained from a DUCKDB_TYPE_TIMESTAMP_NS
column.
True if the timestamp is finite, false if it is ±infinity.
Converts a duckdb_hugeint object (as obtained from a DUCKDB_TYPE_HUGEINT
column) into a double.
double duckdb_hugeint_to_double(
duckdb_hugeint val
);
val
: The hugeint value.
The converted double
element.
Converts a double value to a duckdb_hugeint object.
If the conversion fails because the double value is too big, the result will be 0.
duckdb_hugeint duckdb_double_to_hugeint(
double val
);
val
: The double value.
The converted duckdb_hugeint
element.
Converts a double value to a duckdb_decimal object.
If the conversion fails because the double value is too big, or the width/scale are invalid, the result will be 0.
duckdb_decimal duckdb_double_to_decimal(
double val,
uint8_t width,
uint8_t scale
);
val
: The double value.
The converted duckdb_decimal
element.
Converts a duckdb_decimal object (as obtained from a DUCKDB_TYPE_DECIMAL
column) into a double.
double duckdb_decimal_to_double(
duckdb_decimal val
);
val
: The decimal value.
The converted double
element.
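A short sketch combining the two decimal conversions above, using an arbitrary example value and a DECIMAL(18, 3) target:

#include <stdio.h>
#include "duckdb.h"

void decimal_roundtrip(void) {
    // Convert a double to a DECIMAL(18, 3) representation and back again.
    duckdb_decimal dec = duckdb_double_to_decimal(123.456, 18, 3);
    double back = duckdb_decimal_to_double(dec);
    printf("%f\n", back);
}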
Creates a duckdb_logical_type
from a primitive type.
The resulting logical type must be destroyed with duckdb_destroy_logical_type
.
Returns an invalid logical type, if type is: DUCKDB_TYPE_INVALID
, DUCKDB_TYPE_DECIMAL
, DUCKDB_TYPE_ENUM
,
DUCKDB_TYPE_LIST
, DUCKDB_TYPE_STRUCT
, DUCKDB_TYPE_MAP
, DUCKDB_TYPE_ARRAY
, or DUCKDB_TYPE_UNION
.
duckdb_logical_type duckdb_create_logical_type(
duckdb_type type
);
type
: The primitive type to create.
The logical type.
Returns the alias of a duckdb_logical_type, if set, else nullptr
.
The result must be destroyed with duckdb_free
.
char *duckdb_logical_type_get_alias(
duckdb_logical_type type
);
type
: The logical type
The alias or nullptr
Sets the alias of a duckdb_logical_type.
void duckdb_logical_type_set_alias(
duckdb_logical_type type,
const char *alias
);
type
: The logical type
alias
: The alias to set
Creates a LIST type from its child type.
The return type must be destroyed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_create_list_type(
duckdb_logical_type type
);
type
: The child type of the list
The logical type.
Creates an ARRAY type from its child type.
The return type must be destroyed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_create_array_type(
duckdb_logical_type type,
idx_t array_size
);
type
: The child type of the array.
array_size
: The number of elements in the array.
The logical type.
Creates a MAP type from its key type and value type.
The return type must be destroyed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_create_map_type(
duckdb_logical_type key_type,
duckdb_logical_type value_type
);
key_type
: The map's key type.
value_type
: The map's value type.
The logical type.
Creates a UNION type from the passed arrays.
The return type must be destroyed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_create_union_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);
member_types
: The array of union member types.
member_names
: The union member names.
member_count
: The number of union members.
The logical type.
Creates a STRUCT type based on the member types and names.
The resulting type must be destroyed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_create_struct_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);
member_types
: The array of types of the struct members.
member_names
: The array of names of the struct members.
member_count
: The number of members of the struct.
The logical type.
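To illustrate, the following sketch builds a STRUCT(i INTEGER, s VARCHAR) type and then inspects its members with the struct accessors documented further down this page; the member names are arbitrary example data.

#include <stdio.h>
#include "duckdb.h"

void struct_type_example(void) {
    // Build a STRUCT(i INTEGER, s VARCHAR) type.
    duckdb_logical_type members[2];
    members[0] = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
    members[1] = duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR);
    const char *names[2] = {"i", "s"};
    duckdb_logical_type s = duckdb_create_struct_type(members, names, 2);

    // Inspect the members; names are freed with duckdb_free,
    // child types are destroyed with duckdb_destroy_logical_type.
    idx_t count = duckdb_struct_type_child_count(s);
    for (idx_t i = 0; i < count; i++) {
        char *name = duckdb_struct_type_child_name(s, i);
        duckdb_logical_type child = duckdb_struct_type_child_type(s, i);
        printf("%s: type id %d\n", name, (int) duckdb_get_type_id(child));
        duckdb_free(name);
        duckdb_destroy_logical_type(&child);
    }

    duckdb_destroy_logical_type(&s);
    duckdb_destroy_logical_type(&members[0]);
    duckdb_destroy_logical_type(&members[1]);
}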
Creates an ENUM type from the passed member name array.
The resulting type should be destroyed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_create_enum_type(
const char **member_names,
idx_t member_count
);
member_names
: The array of names that the enum should consist of.
member_count
: The number of elements that were specified in the array.
The logical type.
Creates a DECIMAL type with the specified width and scale.
The resulting type should be destroyed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_create_decimal_type(
uint8_t width,
uint8_t scale
);
width
: The width of the decimal type
scale
: The scale of the decimal type
The logical type.
Retrieves the enum duckdb_type
of a duckdb_logical_type
.
duckdb_type duckdb_get_type_id(
duckdb_logical_type type
);
type
: The logical type.
The duckdb_type
id.
Retrieves the width of a decimal type.
uint8_t duckdb_decimal_width(
duckdb_logical_type type
);
type
: The logical type object
The width of the decimal type
Retrieves the scale of a decimal type.
uint8_t duckdb_decimal_scale(
duckdb_logical_type type
);
type
: The logical type object
The scale of the decimal type
Retrieves the internal storage type of a decimal type.
duckdb_type duckdb_decimal_internal_type(
duckdb_logical_type type
);
type
: The logical type object
The internal type of the decimal type
Retrieves the internal storage type of an enum type.
duckdb_type duckdb_enum_internal_type(
duckdb_logical_type type
);
type
: The logical type object
The internal type of the enum type
Retrieves the dictionary size of the enum type.
uint32_t duckdb_enum_dictionary_size(
duckdb_logical_type type
);
type
: The logical type object
The dictionary size of the enum type
Retrieves the dictionary value at the specified position from the enum.
The result must be freed with duckdb_free
.
char *duckdb_enum_dictionary_value(
duckdb_logical_type type,
idx_t index
);
type
: The logical type object
index
: The index in the dictionary
The string value of the enum type. Must be freed with duckdb_free
.
Retrieves the child type of the given LIST type. Also accepts MAP types.
The result must be freed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_list_type_child_type(
duckdb_logical_type type
);
type
: The logical type, either LIST or MAP.
The child type of the LIST or MAP type.
Retrieves the child type of the given ARRAY type.
The result must be freed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_array_type_child_type(
duckdb_logical_type type
);
type
: The logical type. Must be ARRAY.
The child type of the ARRAY type.
Retrieves the array size of the given array type.
idx_t duckdb_array_type_array_size(
duckdb_logical_type type
);
type
: The logical type object
The fixed number of elements the values of this array type can store.
Retrieves the key type of the given map type.
The result must be freed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_map_type_key_type(
duckdb_logical_type type
);
type
: The logical type object
The key type of the map type. Must be destroyed with duckdb_destroy_logical_type
.
Retrieves the value type of the given map type.
The result must be freed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_map_type_value_type(
duckdb_logical_type type
);
type
: The logical type object
The value type of the map type. Must be destroyed with duckdb_destroy_logical_type
.
Returns the number of children of a struct type.
idx_t duckdb_struct_type_child_count(
duckdb_logical_type type
);
type
: The logical type object
The number of children of a struct type.
Retrieves the name of the struct child.
The result must be freed with duckdb_free
.
char *duckdb_struct_type_child_name(
duckdb_logical_type type,
idx_t index
);
type
: The logical type object
index
: The child index
The name of the struct type. Must be freed with duckdb_free
.
Retrieves the child type of the given struct type at the specified index.
The result must be freed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_struct_type_child_type(
duckdb_logical_type type,
idx_t index
);
type
: The logical type object
index
: The child index
The child type of the struct type. Must be destroyed with duckdb_destroy_logical_type
.
Returns the number of members that the union type has.
idx_t duckdb_union_type_member_count(
duckdb_logical_type type
);
type
: The logical type (union) object
The number of members of a union type.
Retrieves the name of the union member.
The result must be freed with duckdb_free
.
char *duckdb_union_type_member_name(
duckdb_logical_type type,
idx_t index
);
type
: The logical type object
index
: The child index
The name of the union member. Must be freed with duckdb_free
.
Retrieves the child type of the given union member at the specified index.
The result must be freed with duckdb_destroy_logical_type
.
duckdb_logical_type duckdb_union_type_member_type(
duckdb_logical_type type,
idx_t index
);
type
: The logical type object
index
: The child index
The child type of the union member. Must be destroyed with duckdb_destroy_logical_type
.
Destroys the logical type and de-allocates all memory allocated for that type.
void duckdb_destroy_logical_type(
duckdb_logical_type *type
);
type
: The logical type to destroy.
Registers a custom type within the given connection. The type must have an alias.
duckdb_state duckdb_register_logical_type(
duckdb_connection con,
duckdb_logical_type type,
duckdb_create_type_info info
);
con
: The connection to use
type
: The custom type to register
Whether or not the registration was successful.
--- layout: docu title: Gitignore for DuckDB ---
If you work in a Git repository, you may want to configure your Gitignore to disable tracking [files created by DuckDB]({% link docs/operations_manual/footprint_of_duckdb/files_created_by_duckdb.md %}). These potentially include the DuckDB database file, the write-ahead log, and temporary files.
In the following, we present sample Gitignore configuration snippets for DuckDB.
This configuration is useful if you would like to keep the database file in the version control system:
*.wal
*.tmp/
If you would like to ignore both the database and the temporary files, extend the Gitignore file to include the database file.
The exact Gitignore configuration to achieve this depends on the extension you use for your DuckDB databases (.duckdb
, .db
, .ddb
, etc.).
For example, if your DuckDB files use the .duckdb
extension, add the following lines to your .gitignore
file:
*.duckdb*
*.wal
*.tmp/
This page contains DuckDB's built-in limit values.
| Limit | Default value | Configuration option | Comment |
|-------|---------------|----------------------|---------|
| Array size | 100000 | - | |
| BLOB size | 4 GB | - | |
| Expression depth | 1000 | [`max_expression_depth`]({% link docs/configuration/overview.md %}) | |
| Memory allocation for a vector | 128 GB | - | |
| Memory use | 80% of RAM | [`memory_limit`]({% link docs/configuration/pragmas.md %}#memory-limit) | Note: This limit only applies to the buffer manager. |
| String size | 4 GB | - | |
| Temporary directory size | unlimited | [`max_temp_directory_size`]({% link docs/configuration/overview.md %}) | |
layout: docu title: Configuration redirect_from:
- /docs/configuration
- /docs/configuration/
- /docs/sql/configuration
- /docs/sql/configuration/
DuckDB has a number of configuration options that can be used to change the behavior of the system.
The configuration options can be set using either the [SET
statement]({% link docs/sql/statements/set.md %}) or the [PRAGMA
statement]({% link docs/configuration/pragmas.md %}).
They can be reset to their original values using the [RESET
statement]({% link docs/sql/statements/set.md %}#reset).
The values of configuration options can be queried via the [current_setting()
scalar function]({% link docs/sql/functions/utility.md %}) or using the [duckdb_settings()
table function]({% link docs/sql/meta/duckdb_table_functions.md %}#duckdb_settings). For example:
SELECT current_setting('memory_limit') AS memlimit;
Or:
SELECT value AS memlimit
FROM duckdb_settings()
WHERE name = 'memory_limit';
Set the memory limit of the system to 10 GB.
SET memory_limit = '10GB';
Configure the system to use 1 thread.
SET threads TO 1;
Enable printing of a progress bar during long-running queries.
SET enable_progress_bar = true;
Set the default null order to NULLS LAST
.
SET default_null_order = 'nulls_last';
Return the current value of a specific setting.
SELECT current_setting('threads') AS threads;
| threads |
|---------|
| 10      |
Query a specific setting.
SELECT *
FROM duckdb_settings()
WHERE name = 'threads';
| name | value | description | input_type | scope |
|------|-------|-------------|------------|-------|
| threads | 1 | The number of total threads used by the system. | BIGINT | GLOBAL |
Show a list of all available settings.
SELECT *
FROM duckdb_settings();
Reset the memory limit of the system back to the default.
RESET memory_limit;
DuckDB has a [Secrets manager]({% link docs/sql/statements/create_secret.md %}), which provides a unified user interface for secrets across all backends (e.g., AWS S3) that use them.
Configuration options come with different default [scopes]({% link docs/sql/statements/set.md %}#scopes): GLOBAL
and LOCAL
. Below is a list of all available configuration options by scope.
| Name | Description | Type | Default value |
|------|-------------|------|---------------|
| `Calendar` | The current calendar | `VARCHAR` | System (locale) calendar |
| `TimeZone` | The current time zone | `VARCHAR` | System (locale) timezone |
| `access_mode` | Access mode of the database (`AUTOMATIC`, `READ_ONLY` or `READ_WRITE`) | `VARCHAR` | `automatic` |
| `allocator_background_threads` | Whether to enable the allocator background thread. | `BOOLEAN` | `false` |
| `allocator_bulk_deallocation_flush_threshold` | If a bulk deallocation larger than this occurs, flush outstanding allocations. | `VARCHAR` | `512.0 MiB` |
| `allocator_flush_threshold` | Peak allocation threshold at which to flush the allocator after completing a task. | `VARCHAR` | `128.0 MiB` |
| `allow_community_extensions` | Allow to load community built extensions | `BOOLEAN` | `true` |
| `allow_extensions_metadata_mismatch` | Allow to load extensions with not compatible metadata | `BOOLEAN` | `false` |
| `allow_persistent_secrets` | Allow the creation of persistent secrets, that are stored and loaded on restarts | `BOOLEAN` | `true` |
| `allow_unredacted_secrets` | Allow printing unredacted secrets | `BOOLEAN` | `false` |
| `allow_unsigned_extensions` | Allow to load extensions with invalid or missing signatures | `BOOLEAN` | `false` |
| `allowed_directories` | List of directories/prefixes that are ALWAYS allowed to be queried - even when enable_external_access is false | `VARCHAR[]` | `[]` |
| `allowed_paths` | List of files that are ALWAYS allowed to be queried - even when enable_external_access is false | `VARCHAR[]` | `[]` |
| `arrow_large_buffer_size` | Whether Arrow buffers for strings, blobs, uuids and bits should be exported using large buffers | `BOOLEAN` | `false` |
| `arrow_lossless_conversion` | Whenever a DuckDB type does not have a clear native or canonical extension match in Arrow, export the types with a duckdb.type_name extension name. | `BOOLEAN` | `false` |
| `arrow_output_list_view` | Whether export to Arrow format should use ListView as the physical layout for LIST columns | `BOOLEAN` | `false` |
| `autoinstall_extension_repository` | Overrides the custom endpoint for extension installation on autoloading | `VARCHAR` | |
| `autoinstall_known_extensions` | Whether known extensions are allowed to be automatically installed when a query depends on them | `BOOLEAN` | `true` |
| `autoload_known_extensions` | Whether known extensions are allowed to be automatically loaded when a query depends on them | `BOOLEAN` | `true` |
| `binary_as_string` | In Parquet files, interpret binary data as a string. | `BOOLEAN` | |
| `ca_cert_file` | Path to a custom certificate file for self-signed certificates. | `VARCHAR` | |
| `catalog_error_max_schemas` | The maximum number of schemas the system will scan for "did you mean..." style errors in the catalog | `UBIGINT` | `100` |
| `checkpoint_threshold`, `wal_autocheckpoint` | The WAL size threshold at which to automatically trigger a checkpoint (e.g., 1GB) | `VARCHAR` | `16.0 MiB` |
| `custom_extension_repository` | Overrides the custom endpoint for remote extension installation | `VARCHAR` | |
| `custom_user_agent` | Metadata from DuckDB callers | `VARCHAR` | |
| `default_block_size` | The default block size for new duckdb database files (new as-in, they do not yet exist). | `UBIGINT` | `262144` |
| `default_collation` | The collation setting used when none is specified | `VARCHAR` | |
| `default_null_order`, `null_order` | NULL ordering used when none is specified (`NULLS_FIRST` or `NULLS_LAST`) | `VARCHAR` | `NULLS_LAST` |
| `default_order` | The order type used when none is specified (`ASC` or `DESC`) | `VARCHAR` | `ASC` |
| `default_secret_storage` | Allows switching the default storage for secrets | `VARCHAR` | `local_file` |
| `disable_parquet_prefetching` | Disable the prefetching mechanism in Parquet | `BOOLEAN` | `false` |
| `disabled_compression_methods` | Disable a specific set of compression methods (comma separated) | `VARCHAR` | |
| `disabled_filesystems` | Disable specific file systems preventing access (e.g., LocalFileSystem) | `VARCHAR` | |
| `disabled_log_types` | Sets the list of disabled loggers | `VARCHAR` | |
| `duckdb_api` | DuckDB API surface | `VARCHAR` | `cli` |
| `enable_external_access` | Allow the database to access external state (through e.g., loading/installing modules, COPY TO/FROM, CSV readers, pandas replacement scans, etc) | `BOOLEAN` | `true` |
| `enable_fsst_vectors` | Allow scans on FSST compressed segments to emit compressed vectors to utilize late decompression | `BOOLEAN` | `false` |
| `enable_geoparquet_conversion` | Attempt to decode/encode geometry data in/as GeoParquet files if the spatial extension is present. | `BOOLEAN` | `true` |
| `enable_http_metadata_cache` | Whether or not the global http metadata is used to cache HTTP metadata | `BOOLEAN` | `false` |
| `enable_logging` | Enables the logger | `BOOLEAN` | `0` |
| `enable_macro_dependencies` | Enable created MACROs to create dependencies on the referenced objects (such as tables) | `BOOLEAN` | |
false |
enable_object_cache |
[PLACEHOLDER] Legacy setting - does nothing | BOOLEAN |
NULL |
enable_server_cert_verification |
Enable server side certificate verification. | BOOLEAN |
false |
enable_view_dependencies |
Enable created VIEWs to create dependencies on the referenced objects (such as tables) | BOOLEAN |
false |
enabled_log_types |
Sets the list of enabled loggers | VARCHAR |
|
extension_directory |
Set the directory to store extensions in | VARCHAR |
|
external_threads |
The number of external threads that work on DuckDB tasks. | UBIGINT |
1 |
force_download |
Forces upfront download of file | BOOLEAN |
false |
http_keep_alive |
Keep alive connections. Setting this to false can help when running into connection failures | BOOLEAN |
true |
http_proxy_password |
Password for HTTP proxy | VARCHAR |
|
http_proxy_username |
Username for HTTP proxy | VARCHAR |
|
http_proxy |
HTTP proxy host | VARCHAR |
|
http_retries |
HTTP retries on I/O error | UBIGINT |
3 |
http_retry_backoff |
Backoff factor for exponentially increasing retry wait time | FLOAT |
4 |
http_retry_wait_ms |
Time between retries | UBIGINT |
100 |
http_timeout |
HTTP timeout read/write/connection/retry (in seconds) | UBIGINT |
30 |
immediate_transaction_mode |
Whether transactions should be started lazily when needed, or immediately when BEGIN TRANSACTION is called | BOOLEAN |
false |
index_scan_max_count |
The maximum index scan count sets a threshold for index scans. If fewer than MAX(index_scan_max_count, index_scan_percentage * total_row_count) rows match, we perform an index scan instead of a table scan. | UBIGINT |
2048 |
index_scan_percentage |
The index scan percentage sets a threshold for index scans. If fewer than MAX(index_scan_max_count, index_scan_percentage * total_row_count) rows match, we perform an index scan instead of a table scan. | DOUBLE |
0.001 |
lock_configuration |
Whether or not the configuration can be altered | BOOLEAN |
false |
logging_level |
The log level which will be recorded in the log | VARCHAR |
INFO |
logging_mode |
Enables the logger | VARCHAR |
LEVEL_ONLY |
logging_storage |
Set the logging storage (memory/stdout/file) | VARCHAR |
memory |
max_memory , memory_limit |
The maximum memory of the system (e.g., 1GB) | VARCHAR |
80% of RAM |
max_temp_directory_size |
The maximum amount of data stored inside the 'temp_directory' (when set) (e.g., 1GB) | VARCHAR |
90% of available disk space |
max_vacuum_tasks |
The maximum vacuum tasks to schedule during a checkpoint. | UBIGINT |
100 |
old_implicit_casting |
Allow implicit casting to/from VARCHAR | BOOLEAN |
false |
parquet_metadata_cache |
Cache Parquet metadata - useful when reading the same files multiple times | BOOLEAN |
false |
password |
The password to use. Ignored for legacy compatibility. | VARCHAR |
NULL |
prefetch_all_parquet_files |
Use the prefetching mechanism for all types of parquet files | BOOLEAN |
false |
preserve_insertion_order |
Whether or not to preserve insertion order. If set to false the system is allowed to re-order any results that do not contain ORDER BY clauses. | BOOLEAN |
true |
produce_arrow_string_view |
Whether strings should be produced by DuckDB in Utf8View format instead of Utf8 | BOOLEAN |
false |
s3_access_key_id |
S3 Access Key ID | VARCHAR |
|
s3_endpoint |
S3 Endpoint | VARCHAR |
|
s3_region |
S3 Region | VARCHAR |
us-east-1 |
s3_secret_access_key |
S3 Access Key | VARCHAR |
|
s3_session_token |
S3 Session Token | VARCHAR |
|
s3_uploader_max_filesize |
S3 Uploader max filesize (between 50GB and 5TB) | VARCHAR |
800GB |
s3_uploader_max_parts_per_file |
S3 Uploader max parts per file (between 1 and 10000) | UBIGINT |
10000 |
s3_uploader_thread_limit |
S3 Uploader global thread limit | UBIGINT |
50 |
s3_url_compatibility_mode |
Disable Globs and Query Parameters on S3 URLs | BOOLEAN |
false |
s3_url_style |
S3 URL style | VARCHAR |
vhost |
s3_use_ssl |
S3 use SSL | BOOLEAN |
true |
secret_directory |
Set the directory to which persistent secrets are stored | VARCHAR |
~/.duckdb/stored_secrets |
storage_compatibility_version |
Serialize on checkpoint with compatibility for a given duckdb version | VARCHAR |
v0.10.2 |
temp_directory |
Set the directory to which to write temp files | VARCHAR |
⟨database_name⟩.tmp or .tmp (in in-memory mode) |
threads , worker_threads |
The number of total threads used by the system. | BIGINT |
# CPU cores |
username , user |
The username to use. Ignored for legacy compatibility. | VARCHAR |
NULL |
zstd_min_string_length |
The (average) length at which to enable ZSTD compression, defaults to 4096 | UBIGINT |
4096 |
Name | Description | Type | Default value |
---|---|---|---|
custom_profiling_settings |
Accepts a JSON enabling custom metrics |
VARCHAR |
{"ROWS_RETURNED": "true", "LATENCY": "true", "RESULT_SET_SIZE": "true", "OPERATOR_TIMING": "true", "OPERATOR_ROWS_SCANNED": "true", "CUMULATIVE_ROWS_SCANNED": "true", "OPERATOR_CARDINALITY": "true", "OPERATOR_TYPE": "true", "OPERATOR_NAME": "true", "CUMULATIVE_CARDINALITY": "true", "EXTRA_INFO": "true", "CPU_TIME": "true", "BLOCKE... |
dynamic_or_filter_threshold |
The maximum amount of OR filters we generate dynamically from a hash join | UBIGINT |
50 |
enable_http_logging |
Enables HTTP logging | BOOLEAN |
false |
enable_profiling |
Enables profiling, and sets the output format (JSON , QUERY_TREE , QUERY_TREE_OPTIMIZER ) |
VARCHAR |
NULL |
enable_progress_bar_print |
Controls the printing of the progress bar, when 'enable_progress_bar' is true | BOOLEAN |
true |
enable_progress_bar |
Enables the progress bar, printing progress to the terminal for long queries | BOOLEAN |
true |
errors_as_json |
Output error messages as structured JSON instead of as a raw string |
BOOLEAN |
false |
explain_output |
Output of EXPLAIN statements (ALL , OPTIMIZED_ONLY , PHYSICAL_ONLY ) |
VARCHAR |
physical_only |
file_search_path |
A comma separated list of directories to search for input files | VARCHAR |
|
home_directory |
Sets the home directory used by the system | VARCHAR |
|
http_logging_output |
The file to which HTTP logging output should be saved, or empty to print to the terminal | VARCHAR |
|
ieee_floating_point_ops |
Use IEEE 754-compliant floating point operations (returning NAN instead of errors/NULL). | BOOLEAN |
true |
integer_division |
Whether or not the / operator defaults to integer division, or to floating point division | BOOLEAN |
false |
late_materialization_max_rows |
The maximum amount of rows in the LIMIT/SAMPLE for which we trigger late materialization | UBIGINT |
50 |
log_query_path |
Specifies the path to which queries should be logged (default: NULL, queries are not logged) | VARCHAR |
NULL |
max_expression_depth |
The maximum expression depth limit in the parser. WARNING: increasing this setting and using very deep expressions might lead to stack overflow errors. | UBIGINT |
1000 |
merge_join_threshold |
The number of rows we need on either table to choose a merge join | UBIGINT |
1000 |
nested_loop_join_threshold |
The number of rows we need on either table to choose a nested loop join | UBIGINT |
5 |
order_by_non_integer_literal |
Allow ordering by non-integer literals - ordering by such literals has no effect. | BOOLEAN |
false |
ordered_aggregate_threshold |
The number of rows to accumulate before sorting, used for tuning | UBIGINT |
262144 |
partitioned_write_flush_threshold |
The threshold in number of rows after which we flush a thread state when writing using PARTITION_BY |
UBIGINT |
524288 |
partitioned_write_max_open_files |
The maximum amount of files the system can keep open before flushing to disk when writing using PARTITION_BY |
UBIGINT |
100 |
perfect_ht_threshold |
Threshold in bytes for when to use a perfect hash table | UBIGINT |
12 |
pivot_filter_threshold |
The threshold to switch from using filtered aggregates to LIST with a dedicated pivot operator | UBIGINT |
20 |
pivot_limit |
The maximum number of pivot columns in a pivot statement | UBIGINT |
100000 |
prefer_range_joins |
Force use of range joins with mixed predicates | BOOLEAN |
false |
preserve_identifier_case |
Whether or not to preserve the identifier case, instead of always lowercasing all non-quoted identifiers | BOOLEAN |
true |
profile_output , profiling_output |
The file to which profile output should be saved, or empty to print to the terminal | VARCHAR |
|
profiling_mode |
The profiling mode (STANDARD or DETAILED ) |
VARCHAR |
NULL |
progress_bar_time |
Sets the time (in milliseconds) how long a query needs to take before we start printing a progress bar | BIGINT |
2000 |
scalar_subquery_error_on_multiple_rows |
When a scalar subquery returns multiple rows - return a random row instead of returning an error. | BOOLEAN |
true |
schema |
Sets the default search schema. Equivalent to setting search_path to a single value. | VARCHAR |
main |
search_path |
Sets the default catalog search path as a comma-separated list of values | VARCHAR |
|
streaming_buffer_size |
The maximum memory to buffer between fetching from a streaming result (e.g., 1GB) | VARCHAR |
976.5 KiB |
DuckDB creates several files and directories on disk. This page lists both the global and the local ones.
DuckDB creates the following global files and directories in the user's home directory (denoted with ~
):
Location | Description | Shared between versions | Shared between clients |
---|---|---|---|
~/.duckdbrc | The content of this file is executed when starting the [DuckDB CLI client]({% link docs/clients/cli/overview.md %}). The commands can be both [dot commands]({% link docs/clients/cli/dot_commands.md %}) and SQL statements. The naming of this file follows the ~/.bashrc and ~/.zshrc “run commands” files. | Yes | Only used by CLI |
~/.duckdb_history | History file, similar to ~/.bash_history and ~/.zsh_history. Used by the [DuckDB CLI client]({% link docs/clients/cli/overview.md %}). | Yes | Only used by CLI |
~/.duckdb/extensions | Binaries of installed [extensions]({% link docs/extensions/overview.md %}). | No | Yes |
~/.duckdb/stored_secrets | [Persistent secrets]({% link docs/configuration/secrets_manager.md %}#persistent-secrets) created by the [Secrets manager]({% link docs/configuration/secrets_manager.md %}). | Yes | Yes |
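For illustration, a ~/.duckdbrc file could enable the CLI timer and apply a setting on every start (a minimal example; pick the commands that suit your setup):
.timer on
SET default_null_order = 'nulls_last';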
DuckDB creates the following files and directories in the working directory (for in-memory connections) or relative to the database file (for persistent connections):
Name | Description | Example |
---|---|---|
⟨database_filename⟩ | Database file. Only created in on-disk mode. The file can have any extension, with typical extensions being .duckdb, .db, and .ddb. | weather.duckdb |
.tmp/ | Temporary directory. Only created in in-memory mode. | .tmp/ |
⟨database_filename⟩.tmp/ | Temporary directory. Only created in on-disk mode. | weather.tmp/ |
⟨database_filename⟩.wal | Write-ahead log file. If DuckDB exits normally, the WAL file is deleted upon exit. If DuckDB crashes, the WAL file is required to recover data. | weather.wal |
If you are working in a Git repository and would like to disable tracking these files by Git,
see the instructions on using [.gitignore
for DuckDB]({% link docs/operations_manual/footprint_of_duckdb/gitignore_for_duckdb.md %}).
DuckDB uses a single-file format, which has some inherent limitations w.r.t. reclaiming disk space.
To reclaim space after deleting rows, use the [CHECKPOINT
statement]({% link docs/sql/statements/checkpoint.md %}).
The [VACUUM
statement]({% link docs/sql/statements/vacuum.md %}) does not trigger vacuuming deletes and hence does not reclaim space.
To compact the database, you can create a fresh copy of the database using the [COPY FROM DATABASE
statement]({% link docs/sql/statements/copy.md %}#copy-from-database--to). In the following example, we first connect to the original database db1
, then the new (empty) database db2
. Then, we copy the content of db1
to db2
.
ATTACH 'db1.db' AS db1;
ATTACH 'db2.db' AS db2;
COPY FROM DATABASE db1 TO db2;
DuckDB has a powerful extension mechanism. Extensions have the same privileges as the user running DuckDB's (parent) process. This introduces security considerations, so we recommend reviewing the configuration options listed on this page and setting them according to your attack model.
DuckDB extensions are checked on every load using the signature of the binaries. There are currently three categories of extensions:
- Signed with a
core
key. Only extensions vetted by the core DuckDB team are signed with these keys. - Signed with a
community
key. These are open-source extensions distributed via the [DuckDB Community Extensions repository]({% link community_extensions/index.md %}). - Unsigned.
DuckDB offers the following security levels for extensions.
Usable extensions | Description | Configuration |
---|---|---|
core | Extensions can only be loaded if signed with a core key. | SET allow_community_extensions = false |
core and community | Extensions can only be loaded if signed with a core or community key. | This is the default security level. |
Any extension including unsigned | Any extensions can be loaded. | SET allow_unsigned_extensions = true |
Security-related configuration settings [lock themselves]({% link docs/operations_manual/securing_duckdb/overview.md %}#locking-configurations), i.e., it is only possible to restrict capabilities in the current process.
For example, attempting the following configuration changes will result in an error:
SET allow_community_extensions = false;
SET allow_community_extensions = true;
Invalid Input Error: Cannot upgrade allow_community_extensions setting while database is running
DuckDB has a [Community Extensions repository]({% link community_extensions/index.md %}), which allows convenient installation of third-party extensions. Community extension repositories like pip or npm are essentially enabling remote code execution by design. This is less dramatic than it sounds. For better or worse, we are quite used to piping random scripts from the web into our shells, and routinely install a staggering amount of transitive dependencies without thinking twice. Some repositories like CRAN enforce a human inspection at some point, but that’s no guarantee for anything either.
We’ve studied several different approaches to community extension repositories and have picked what we think is a sensible approach: we do not attempt to review the submissions, but require that the source code of extensions is available. We do take over the complete build, sign and distribution process. Note that this is a step up from pip and npm that allow uploading arbitrary binaries but a step down from reviewing everything manually. We allow users to report malicious extensions and show adoption statistics like GitHub stars and download count. Because we manage the repository, we can remove problematic extensions from distribution quickly.
Despite this, installing and loading DuckDB extensions from the community extension repository will execute code written by third party developers, and therefore can be dangerous. A malicious developer could create and register a harmless-looking DuckDB extension that steals your crypto coins. If you’re running a web service that executes untrusted SQL from users with DuckDB, it is probably a good idea to disable community extension installation and loading entirely. This can be done like so:
SET allow_community_extensions = false;
By default, DuckDB automatically installs and loads known extensions.
To disable autoinstalling known extensions, run:
SET autoinstall_known_extensions = false;
To disable autoloading known extensions, run:
SET autoload_known_extensions = false;
To lock this configuration, use the [lock_configuration
option]({% link docs/operations_manual/securing_duckdb/overview.md %}#locking-configurations):
SET lock_configuration = true;
By default, DuckDB requires extensions to be either signed as core extensions (created by the DuckDB developers) or community extensions (created by third-party developers but distributed by the DuckDB developers).
The [allow_unsigned_extensions
setting]({% link docs/extensions/overview.md %}#unsigned-extensions) can be enabled on start-up to allow loading unsigned extensions.
While this setting is useful for extension development, enabling it will allow DuckDB to load any extensions, which means more care must be taken to ensure malicious extensions are not loaded.
DuckDB is quite powerful, which can be problematic, especially if untrusted SQL queries are run, e.g., from public-facing user inputs. This page lists some options to restrict the potential fallout from malicious SQL queries.
The approach to securing DuckDB varies depending on your use case, environment, and potential attack models. Therefore, consider the security-related configuration options carefully, especially when working with confidential data sets.
If you plan to embed DuckDB in your application, please consult the [“Embedding DuckDB”]({% link docs/operations_manual/embedding_duckdb.md %}) page.
If you discover a potential vulnerability, please report it confidentially via GitHub.
DuckDB's CLI client supports [“safe mode”]({% link docs/clients/cli/safe_mode.md %}), which prevents DuckDB from accessing external files other than the database file. This can be activated via a command line argument or a [dot command]({% link docs/clients/cli/dot_commands.md %}):
duckdb -safe ...
.safe_mode
DuckDB can list directories and read arbitrary files via its CSV parser’s [read_csv
function]({% link docs/data/csv/overview.md %}) or read text via the [read_text
function]({% link docs/sql/functions/char.md %}#read_textsource). For example:
SELECT *
FROM read_csv('/etc/passwd', sep = ':');
This can be disabled either by disabling external access altogether (enable_external_access
) or disabling individual file systems. For example:
SET disabled_filesystems = 'LocalFileSystem';
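Alternatively, to disable external access altogether, run:
SET enable_external_access = false;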
[Secrets]({% link docs/configuration/secrets_manager.md %}) are used to manage credentials to log into third party services like AWS or Azure. DuckDB can show a list of secrets using the duckdb_secrets()
table function. This will redact any sensitive information such as security keys by default. The allow_unredacted_secrets
option can be set to show all information contained within a security key. It is recommended not to turn on this option if you are running untrusted SQL input.
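For example, to list the currently defined secrets with sensitive fields redacted (the default behavior), run:
SELECT * FROM duckdb_secrets();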
Queries can access the secrets defined in the Secrets Manager. For example, if there is a secret defined to authenticate with a user, who has write privileges to a given AWS S3 bucket, queries may write to that bucket. This is applicable for both persistent and temporary secrets.
[Persistent secrets]({% link docs/configuration/secrets_manager.md %}#persistent-secrets) are stored in unencrypted binary format on the disk. These have the same permissions as SSH keys, 600
, i.e., only the user who is running the DuckDB (parent) process can read and write them.
Security-related configuration settings generally lock themselves for safety reasons. For example, while we can disable [Community Extensions]({% link community_extensions/index.md %}) using the SET allow_community_extensions = false
, we cannot re-enable them again after the fact without restarting the database. Trying to do so will result in an error:
Invalid Input Error: Cannot upgrade allow_community_extensions setting while database is running
This prevents untrusted SQL input from re-enabling settings that were explicitly disabled for security reasons.
Nevertheless, many configuration settings do not disable themselves, such as the resource constraints. If you allow users to run SQL statements unrestricted on your own hardware, it is recommended that you lock the configuration after your own configuration has finished using the following command:
SET lock_configuration = true;
This prevents any configuration settings from being modified from that point onwards.
DuckDB can use quite a lot of CPU, RAM, and disk space. To avoid denial of service attacks, these resources can be limited.
The number of CPU threads that DuckDB can use can be set using, for example:
SET threads = 4;
Where 4 is the number of allowed threads.
The maximum amount of memory (RAM) can also be limited, for example:
SET memory_limit = '4GB';
The size of the temporary file directory can be limited with:
SET max_temp_directory_size = '4GB';
DuckDB has a powerful extension mechanism. Extensions have the same privileges as the user running DuckDB's (parent) process. This introduces security considerations, so we recommend reviewing the configuration options for [securing extensions]({% link docs/operations_manual/securing_duckdb/securing_extensions.md %}).
Avoid running DuckDB as a root user (e.g., using sudo
).
There is no good reason to run DuckDB as root.
Securing DuckDB can also be supported via proven means, for example:
- Scoping user privileges via
chroot
, relying on the operating system - Containerization, e.g., Docker and Podman
- Running DuckDB in WebAssembly
---
layout: docu
title: TPC-DS Extension
github_directory: https://github.com/duckdb/duckdb/tree/main/extension/tpcds
---
The tpcds
extension implements the data generator and queries for the TPC-DS benchmark.
The tpcds
extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
INSTALL tpcds;
LOAD tpcds;
To generate data for scale factor 1, use:
CALL dsdgen(sf = 1);
To run a query, e.g., query 8, use:
PRAGMA tpcds(8);
s_store_name | sum(ss_net_profit) |
---|---|
able | -10354620.18 |
ation | -10576395.52 |
bar | -10625236.01 |
ese | -10076698.16 |
ought | -10994052.78 |
It's possible to generate the schema of TPC-DS without any data by setting the scale factor to 0:
CALL dsdgen(sf = 0);
The tpcds(⟨query_id⟩)
function runs a fixed TPC-DS query with pre-defined bind parameters (a.k.a. substitution parameters).
It is not possible to change the query parameters using the tpcds
extension.
---
layout: docu
title: Delta Extension
github_repository: https://github.com/duckdb/duckdb-delta
---
The delta
extension adds support for the Delta Lake open-source storage format. It is built using the Delta Kernel. The extension offers read support for Delta tables, both local and remote.
For implementation details, see the [announcement blog post]({% post_url 2024-06-10-delta %}).
Warning The
delta
extension is currently experimental and is only supported on the platforms listed below.
The delta
extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
INSTALL delta;
LOAD delta;
To scan a local Delta table, run:
SELECT *
FROM delta_scan('file:///some/path/on/local/machine');
To scan a Delta table in an [S3 bucket]({% link docs/extensions/httpfs/s3api.md %}), run:
SELECT *
FROM delta_scan('s3://some/delta/table');
For authenticating to S3 buckets, DuckDB [Secrets]({% link docs/configuration/secrets_manager.md %}) are supported:
CREATE SECRET (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN
);
SELECT *
FROM delta_scan('s3://some/delta/table/with/auth');
To scan public buckets on S3, you may need to pass the correct region by creating a secret containing the region of your public S3 bucket:
CREATE SECRET (
TYPE S3,
REGION 'my-region'
);
SELECT *
FROM delta_scan('s3://some/public/table/in/my-region');
To scan a Delta table in an [Azure Blob Storage bucket]({% link docs/extensions/azure.md %}#azure-blob-storage), run:
SELECT *
FROM delta_scan('az://my-container/my-table');
For authenticating to Azure Blob Storage, DuckDB [Secrets]({% link docs/configuration/secrets_manager.md %}) are supported:
CREATE SECRET (
TYPE AZURE,
PROVIDER CREDENTIAL_CHAIN
);
SELECT *
FROM delta_scan('az://my-container/my-table-with-auth');
While the delta
extension is still experimental, many (scanning) features and optimizations are already supported:
- multithreaded scans and Parquet metadata reading
- data skipping/filter pushdown
- skipping row groups in file (based on Parquet metadata)
- skipping complete files (based on Delta partition information)
- projection pushdown
- scanning tables with deletion vectors
- all primitive types
- structs
- S3 support with secrets
More optimizations are going to be released in the future.
The delta
extension requires DuckDB version 0.10.3 or newer.
The delta
extension currently only supports the following platforms:
- Linux (x86_64 and ARM64):
linux_amd64
,linux_amd64_gcc4
, andlinux_arm64
- macOS Intel and Apple Silicon:
osx_amd64
andosx_arm64
- Windows AMD64:
windows_amd64
Support for the [other DuckDB platforms]({% link docs/extensions/working_with_extensions.md %}#platforms) is work-in-progress.
---
layout: docu
title: Iceberg Extension
github_repository: https://github.com/duckdb/duckdb-iceberg
---
The iceberg
extension is a loadable extension that implements support for the Apache Iceberg format.
To install and load the iceberg
extension, run:
INSTALL iceberg;
LOAD iceberg;
The iceberg
extension often receives updates between DuckDB releases. To make sure that you have the latest version, run:
UPDATE EXTENSIONS (iceberg);
To test the examples, download the iceberg_data.zip
file and unzip it.
Parameter | Type | Default | Description |
---|---|---|---|
allow_moved_paths | BOOLEAN | false | Allows scanning Iceberg tables that are moved |
metadata_compression_codec | VARCHAR | '' | Treats metadata files as gzip-compressed when set to 'gzip' |
version | VARCHAR | '?' | Provides an explicit version string, hint file, or enables guessing |
version_name_format | VARCHAR | 'v%s%s.metadata.json,%s%s.metadata.json' | Controls how versions are converted to metadata file names |
SELECT count(*)
FROM iceberg_scan('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
count_star() |
---|
51793 |
The
allow_moved_paths
option ensures that some path resolution is performed, which allows scanning Iceberg tables that are moved.
You can also specify the current manifest file directly in the query; this may be resolved from the catalog prior to the query. In this example, the manifest version is a UUID.
To do so, navigate to the data/iceberg
directory and run:
SELECT count(*)
FROM iceberg_scan('lineitem_iceberg/metadata/v1.metadata.json');
count_star() |
---|
60175 |
The iceberg
extension can be paired with the [httpfs
extension]({% link docs/extensions/httpfs/overview.md %}) to access Iceberg tables in object stores such as S3.
SELECT count(*)
FROM iceberg_scan(
's3://bucketname/lineitem_iceberg/metadata/v1.metadata.json',
allow_moved_paths = true
);
SELECT *
FROM iceberg_metadata('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
manifest_path | manifest_sequence_number | manifest_content | status | content | file_path | file_format | record_count |
---|---|---|---|---|---|---|---|
lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b232e5ee23d3-m1.avro | 2 | DATA | ADDED | EXISTING | lineitem_iceberg/data/00041-414-f3c73457-bbd6-4b92-9c15-17b241171b16-00001.parquet | PARQUET | 51793 |
lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b232e5ee23d3-m0.avro | 2 | DATA | DELETED | EXISTING | lineitem_iceberg/data/00000-411-0792dcfe-4e25-4ca3-8ada-175286069a47-00001.parquet | PARQUET | 60175 |
SELECT *
FROM iceberg_snapshots('data/iceberg/lineitem_iceberg');
sequence_number | snapshot_id | timestamp_ms | manifest_list |
---|---|---|---|
1 | 3776207205136740581 | 2023-02-15 15:07:54.504 | lineitem_iceberg/metadata/snap-3776207205136740581-1-cf3d0be5-cf70-453d-ad8f-48fdc412e608.avro |
2 | 7635660646343998149 | 2023-02-15 15:08:14.73 | lineitem_iceberg/metadata/snap-7635660646343998149-1-10eaca8a-1e1c-421e-ad6d-b232e5ee23d3.avro |
By default, the iceberg
extension will look for a version-hint.text
file to identify the proper metadata version to use. This can be overridden by explicitly supplying a version number via the version
parameter to iceberg table functions. By default, this will look for both v{version}.metadata.json
and {version}.metadata.json
files, or v{version}.gz.metadata.json
and {version}.gz.metadata.json
when metadata_compression_codec = 'gzip'
is specified. Other compression codecs are not supported.
Additionally, if any .text
or .txt
file is provided as a version, it is opened and treated as a version-hint file. The iceberg
extension will open this file and use the entire contents of the file as a provided version number.
The entire contents of the
version-hint.txt
file will be treated as a literal version name, with no encoding, escaping, or trimming. This includes any whitespace or unsafe characters, which will be passed verbatim into the file names constructed by the logic described below.
SELECT *
FROM iceberg_snapshots(
'data/iceberg/lineitem_iceberg',
version = '1',
allow_moved_paths = true
);
count_star() |
---|
60175 |
The iceberg
extension can handle different metadata naming conventions by specifying them as a comma-delimited list of format strings via the version_name_format
parameter. Each format string must take two %s
parameters. The first is the location of the version number in the metadata filename and the second is the location of the metadata_compression_codec
extension. The behavior described above is provided by the default value of "v%s%s.metadata.gz,%s%smetadata.gz
. In the event you had an alternatively named metadata file with such as rev-2.metadata.json.gz
, the table could be read via the follow statement.
SELECT *
FROM iceberg_snapshots(
'data/iceberg/alternative_metadata_gz_naming',
version = '2',
version_name_format = 'rev-%s.metadata.json%s',
metadata_compression_codec = 'gzip',
allow_moved_paths = true
);
count_star() |
---|
60175 |
By default, either a table version number or a version-hint.text
must be provided for the iceberg
extension to read a table. This is typically provided by an external data catalog. In the event neither is present, the iceberg
extension can attempt to guess the latest version by passing ?
as the table version. The “latest” version is assumed to be the filename that is lexicographically largest when sorting the filenames. Collations are not considered. This behavior is not enabled by default as it may potentially violate ACID constraints. It can be enabled by setting unsafe_enable_version_guessing
to true
. When this is set, iceberg
functions will attempt to guess the latest version by default before failing.
SET unsafe_enable_version_guessing=true;
SELECT count(*)
FROM iceberg_scan('data/iceberg/lineitem_iceberg_no_hint', allow_moved_paths = true);
-- Or explicitly as:
-- FROM iceberg_scan(
-- 'data/iceberg/lineitem_iceberg_no_hint',
-- version = '?',
-- allow_moved_paths = true
-- );
count_star() |
---|
51793 |
---
layout: docu
title: PostgreSQL Extension
github_repository: https://github.com/duckdb/duckdb-postgres
redirect_from:
  - /docs/extensions/postgres_scanner
  - /docs/extensions/postgres_scanner/
  - /docs/extensions/postgresql
  - /docs/extensions/postgresql/
---
The postgres
extension allows DuckDB to directly read and write data from a running PostgreSQL database instance. The data can be queried directly from the underlying PostgreSQL database. Data can be loaded from PostgreSQL tables into DuckDB tables, or vice versa. See the [official announcement]({% post_url 2022-09-30-postgres-scanner %}) for implementation details and background.
The postgres
extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
INSTALL postgres;
LOAD postgres;
To make a PostgreSQL database accessible to DuckDB, use the ATTACH
command with the POSTGRES
or POSTGRES_SCANNER
type.
To connect to the public
schema of the PostgreSQL instance running on localhost in read-write mode, run:
ATTACH '' AS postgres_db (TYPE POSTGRES);
To connect to the PostgreSQL instance with the given parameters in read-only mode, run:
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS db (TYPE POSTGRES, READ_ONLY);
By default, all schemas are attached. When working with large instances, it can be useful to only attach a specific schema. This can be accomplished using the SCHEMA
option.
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS db (TYPE POSTGRES, SCHEMA 'public');
The ATTACH
command takes as input either a libpq
connection string
or a PostgreSQL URI.
Below are some example connection strings and commonly used parameters. A full list of available parameters can be found in the PostgreSQL documentation.
dbname=postgresscanner
host=localhost port=5432 dbname=mydb connect_timeout=10
Name | Description | Default |
---|---|---|
dbname | Database name | [user] |
host | Name of host to connect to | localhost |
hostaddr | Host IP address | localhost |
passfile | Name of file passwords are stored in | ~/.pgpass |
password | PostgreSQL password | (empty) |
port | Port number | 5432 |
user | PostgreSQL user name | current user |
An example URI is postgresql://username@hostname/dbname
.
PostgreSQL connection information can also be specified with secrets. The following syntax can be used to create a secret.
CREATE SECRET (
TYPE POSTGRES,
HOST '127.0.0.1',
PORT 5432,
DATABASE postgres,
USER 'postgres',
PASSWORD ''
);
The information from the secret will be used when ATTACH
is called. We can leave the PostgreSQL connection string empty to use all of the information stored in the secret.
ATTACH '' AS postgres_db (TYPE POSTGRES);
We can use the PostgreSQL connection string to override individual options. For example, to connect to a different database while still using the same credentials, we can override only the database name in the following manner.
ATTACH 'dbname=my_other_db' AS postgres_db (TYPE POSTGRES);
By default, created secrets are temporary. Secrets can be persisted using the [CREATE PERSISTENT SECRET
command]({% link docs/configuration/secrets_manager.md %}#persistent-secrets). Persistent secrets can be used across sessions.
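For illustration, a persistent PostgreSQL secret could be created as follows (the connection values are placeholders):
CREATE PERSISTENT SECRET postgres_secret (
    TYPE POSTGRES,
    HOST '127.0.0.1',
    PORT 5432,
    DATABASE postgres,
    USER 'postgres',
    PASSWORD ''
);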
Named secrets can be used to manage connections to multiple PostgreSQL database instances. Secrets can be given a name upon creation.
CREATE SECRET postgres_secret_one (
TYPE POSTGRES,
HOST '127.0.0.1',
PORT 5432,
DATABASE postgres,
USER 'postgres',
PASSWORD ''
);
The secret can then be explicitly referenced using the SECRET
parameter in the ATTACH
.
ATTACH '' AS postgres_db_one (TYPE POSTGRES, SECRET postgres_secret_one);
PostgreSQL connection information can also be specified with environment variables. This can be useful in a production environment where the connection information is managed externally and passed in to the environment.
export PGPASSWORD="secret"
export PGHOST=localhost
export PGUSER=owner
export PGDATABASE=mydatabase
Then, to connect, start the duckdb
process and run:
ATTACH '' AS p (TYPE POSTGRES);
The tables in the PostgreSQL database can be read as if they were normal DuckDB tables, but the underlying data is read directly from PostgreSQL at query time.
SHOW ALL TABLES;
name |
---|
uuids |
SELECT * FROM uuids;
u |
---|
6d3d2541-710b-4bde-b3af-4711738636bf |
NULL |
00000000-0000-0000-0000-000000000001 |
ffffffff-ffff-ffff-ffff-ffffffffffff |
It might be desirable to create a copy of the PostgreSQL databases in DuckDB to prevent the system from re-reading the tables from PostgreSQL continuously, particularly for large tables.
Data can be copied over from PostgreSQL to DuckDB using standard SQL, for example:
CREATE TABLE duckdb_table AS FROM postgres_db.postgres_tbl;
In addition to reading data from PostgreSQL, the extension allows you to create tables, ingest data into PostgreSQL and make other modifications to a PostgreSQL database using standard SQL queries.
This allows you to use DuckDB to, for example, export data that is stored in a PostgreSQL database to Parquet, or read data from a Parquet file into PostgreSQL.
Below is a brief example of how to create a new table in PostgreSQL and load data into it.
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE POSTGRES);
CREATE TABLE postgres_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO postgres_db.tbl VALUES (42, 'DuckDB');
Many operations on PostgreSQL tables are supported. All these operations directly modify the PostgreSQL database, and the result of subsequent operations can then be read using PostgreSQL.
Note that if modifications are not desired, ATTACH
can be run with the READ_ONLY
property which prevents making modifications to the underlying database. For example:
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE POSTGRES, READ_ONLY);
Below is a list of supported operations.
CREATE TABLE postgres_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO postgres_db.tbl VALUES (42, 'DuckDB');
SELECT * FROM postgres_db.tbl;
id | name |
---|---|
42 | DuckDB |
You can copy tables back and forth between PostgreSQL and DuckDB:
COPY postgres_db.tbl TO 'data.parquet';
COPY postgres_db.tbl FROM 'data.parquet';
These copies use PostgreSQL binary wire encoding. DuckDB can also write data using this encoding to a file which you can then load into PostgreSQL using a client of your choosing if you would like to do your own connection management:
COPY 'data.parquet' TO 'pg.bin' WITH (FORMAT POSTGRES_BINARY);
The file produced will be the equivalent of copying the file to PostgreSQL using DuckDB and then dumping it from PostgreSQL using psql
or another client:
DuckDB:
COPY postgres_db.tbl FROM 'data.parquet';
PostgreSQL:
\copy tbl TO 'data.bin' WITH (FORMAT BINARY);
You may also create a full copy of the database using the [COPY FROM DATABASE
statement]({% link docs/sql/statements/copy.md %}#copy-from-database--to):
COPY FROM DATABASE postgres_db TO my_duckdb_db;
UPDATE postgres_db.tbl
SET name = 'Woohoo'
WHERE id = 42;
DELETE FROM postgres_db.tbl
WHERE id = 42;
ALTER TABLE postgres_db.tbl
ADD COLUMN k INTEGER;
DROP TABLE postgres_db.tbl;
CREATE VIEW postgres_db.v1 AS SELECT 42;
CREATE SCHEMA postgres_db.s1;
CREATE TABLE postgres_db.s1.integers (i INTEGER);
INSERT INTO postgres_db.s1.integers VALUES (42);
SELECT * FROM postgres_db.s1.integers;
i |
---|
42 |
DROP SCHEMA postgres_db.s1;
DETACH postgres_db;
CREATE TABLE postgres_db.tmp (i INTEGER);
BEGIN;
INSERT INTO postgres_db.tmp VALUES (42);
SELECT * FROM postgres_db.tmp;
This returns:
i |
---|
42 |
ROLLBACK;
SELECT * FROM postgres_db.tmp;
This returns an empty table.
The postgres_query
table function allows you to run arbitrary read queries within an attached database. postgres_query
takes the name of the attached PostgreSQL database to execute the query in, as well as the SQL query to execute. The result of the query is returned. Single-quote strings are escaped by repeating the single quote twice.
postgres_query(attached_database::VARCHAR, query::VARCHAR)
For example:
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE POSTGRES);
SELECT * FROM postgres_query('postgres_db', 'SELECT * FROM cars LIMIT 3');
brand | model | color |
---|---|---|
Ferrari | Testarossa | red |
Aston Martin | DB2 | blue |
Bentley | Mulsanne | gray |
The postgres_execute
function allows running arbitrary queries within PostgreSQL, including statements that update the schema and content of the database.
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE POSTGRES);
CALL postgres_execute('postgres_db', 'CREATE TABLE my_table (i INTEGER)');
The extension exposes the following configuration parameters.
Name | Description | Default |
---|---|---|
pg_array_as_varchar | Read PostgreSQL arrays as varchar - enables reading mixed dimensional arrays | false |
pg_connection_cache | Whether or not to use the connection cache | true |
pg_connection_limit | The maximum amount of concurrent PostgreSQL connections | 64 |
pg_debug_show_queries | DEBUG SETTING: print all queries sent to PostgreSQL to stdout | false |
pg_experimental_filter_pushdown | Whether or not to use filter pushdown (currently experimental) | false |
pg_pages_per_task | The amount of pages per task | 1000 |
pg_use_binary_copy | Whether or not to use BINARY copy to read data | true |
pg_null_byte_replacement | When writing NULL bytes to Postgres, replace them with the given character | NULL |
pg_use_ctid_scan | Whether or not to parallelize scanning using table ctids | true |
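These parameters can be changed with SET; for example, to try out the (currently experimental) filter pushdown:
SET pg_experimental_filter_pushdown = true;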
To avoid having to continuously fetch schema data from PostgreSQL, DuckDB keeps schema information – such as the names of tables, their columns, etc. – cached. If changes are made to the schema through a different connection to the PostgreSQL instance, such as new columns being added to a table, the cached schema information might be outdated. In this case, the function pg_clear_cache
can be executed to clear the internal caches.
CALL pg_clear_cache();
Deprecated The old
postgres_attach
function is deprecated. It is recommended to switch over to the newATTACH
syntax.
---
layout: docu
title: Extensions
redirect_from:
  - /docs/extensions
  - /docs/extensions/
---
DuckDB has a flexible extension mechanism that allows for dynamically loading extensions. These may extend DuckDB's functionality by providing support for additional file formats, introducing new types, and domain-specific functionality.
Extensions are loadable on all clients (e.g., Python and R). Extensions distributed via the Core and Community repositories are built and tested on macOS, Windows and Linux. All operating systems are supported for both the AMD64 and the ARM64 architectures.
To get a list of extensions, use duckdb_extensions
:
SELECT extension_name, installed, description
FROM duckdb_extensions();
extension_name | installed | description |
---|---|---|
arrow | false | A zero-copy data integration between Apache Arrow and DuckDB |
autocomplete | false | Adds support for autocomplete in the shell |
... | ... | ... |
This list shows which extensions are available, which are installed, at which version, where they are installed, and more. The list includes most, but not all, available core extensions. For the full list, we maintain a [list of core extensions]({% link docs/extensions/core_extensions.md %}).
DuckDB's binary distribution comes standard with a few built-in extensions. They are statically linked into the binary and can be used as is.
For example, to use the built-in [json
extension]({% link docs/data/json/overview.md %}) to read a JSON file:
SELECT *
FROM 'test.json';
To make the DuckDB distribution lightweight, only a few essential extensions are built-in, varying slightly per distribution. Which extension is built-in on which platform is documented in the [list of core extensions]({% link docs/extensions/core_extensions.md %}#default-extensions).
To make an extension that is not built-in available in DuckDB, two steps need to happen:
- Extension installation is the process of downloading the extension binary and verifying its metadata. During installation, DuckDB stores the downloaded extension and some metadata in a local directory. From this directory, DuckDB can then load the extension whenever it needs to. This means that installation needs to happen only once.
- Extension loading is the process of dynamically loading the binary into a DuckDB instance. DuckDB will search the local extension directory for the installed extension, then load it to make its features available. This means that every time DuckDB is restarted, all extensions that are used need to be (re)loaded.
Extension installation and loading are subject to a few [limitations]({% link docs/extensions/working_with_extensions.md %}#limitations).
There are two main methods of making DuckDB perform the installation and loading steps for an installable extension: explicitly and through autoloading.
In DuckDB extensions can also be explicitly installed and loaded. Both non-autoloadable and autoloadable extensions can be installed this way.
To explicitly install and load an extension, DuckDB has the dedicated SQL statements LOAD
and INSTALL
. For example,
to install and load the [spatial
extension]({% link docs/extensions/spatial/overview.md %}), run:
INSTALL spatial;
LOAD spatial;
With these statements, DuckDB will ensure the spatial extension is installed (ignoring the INSTALL
statement if it is already installed), then proceed
to LOAD
the spatial extension (again ignoring the statement if it is already loaded).
Optionally a repository can be provided where the extension should be installed from, by appending FROM ⟨repository⟩
to the INSTALL
/ FORCE INSTALL
command.
This repository can either be an alias, such as [community
]({% link community_extensions/index.md %}), or it can be a direct URL, provided as a single-quoted string.
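For example, to install an extension from the community repository (the h3 extension is used here purely as an illustration):
INSTALL h3 FROM community;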
After installing/loading an extension, the duckdb_extensions
function can be used to get more information.
For many of DuckDB's core extensions, explicitly loading and installing extensions is not necessary. DuckDB contains an autoloading mechanism which can install and load the core extensions as soon as they are used in a query. For example, when running:
SELECT *
FROM 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv';
DuckDB will automatically install and load the [httpfs
]({% link docs/extensions/httpfs/overview.md %}) extension. No explicit INSTALL
or LOAD
statements are required.
Not all extensions can be autoloaded. This can have various reasons: some extensions make several changes to the running DuckDB instance, making autoloading technically not (yet) possible. For others, it is preferred to have users opt-in to the extension explicitly before use due to the way they modify behavior in DuckDB.
To see which extensions can be autoloaded, check the [core extensions list]({% link docs/extensions/core_extensions.md %}).
DuckDB supports installing third-party [Community Extensions]({% link community_extensions/index.md %}). These are contributed by community members but they are built, signed, and distributed in a centralized repository.
For many clients, using SQL to load and install extensions is the preferred method. However, some clients have a dedicated
API to install and load extensions. For example the [Python API client]({% link docs/clients/python/overview.md %}#loading-and-installing-extensions),
which has dedicated install_extension(name: str)
and load_extension(name: str)
methods. For more details on a specific Client API, refer
to the [Client API docs]({% link docs/clients/overview.md %}).
While built-in extensions are tied to a DuckDB release due to their nature of being built into the DuckDB binary, installable extensions can and do receive updates. To ensure all currently installed extensions are on the most recent version, call:
UPDATE EXTENSIONS;
For more details on extension versions, refer to [Extension Versioning]({% link docs/extensions/versioning_of_extensions.md %}).
By default, extensions are installed under the user's home directory:
~/.duckdb/extensions/⟨duckdb_version⟩/⟨platform_name⟩/
For stable DuckDB releases, the ⟨duckdb_version⟩
will be equal to the version tag of that release. For nightly DuckDB builds, it will be equal
to the short git hash of the build. So for example, the extensions for DuckDB version v0.10.3 on macOS ARM64 (Apple Silicon) are installed to ~/.duckdb/extensions/v0.10.3/osx_arm64/
.
An example installation path for a nightly DuckDB build could be ~/.duckdb/extensions/fc2e4b26a6/linux_amd64_gcc4
.
To change the default location where DuckDB stores its extensions, use the extension_directory
configuration option:
SET extension_directory = '/path/to/your/extension/directory';
Note that setting the value of the home_directory
configuration option has no effect on the location of the extensions.
To avoid binary compatibility issues, the binary extensions distributed by DuckDB are tied both to a specific DuckDB version and a platform. This means that DuckDB can automatically detect binary compatibility between it and a loadable extension. When trying to load an extension that was compiled for a different version or platform, DuckDB will throw an error and refuse to load the extension.
See the [Working with Extensions page]({% link docs/extensions/working_with_extensions.md %}#platforms) for details on available platforms.
The same API that the core extensions use is available for developing extensions. This allows users to extend the functionality of DuckDB such that it suits their domain the best.
A template for creating extensions is available in the extension-template
repository. This template also holds some documentation on how to get started building your own extension.
Extensions are signed with a cryptographic key, which also simplifies distribution (this is why they are served over HTTP and not HTTPS). By default, DuckDB uses its built-in public keys to verify the integrity of extension before loading them. All extensions provided by the DuckDB core team are signed.
Warning Only load unsigned extensions from sources you trust. Avoid loading unsigned extensions over HTTP. Consult the [Securing DuckDB page]({% link docs/operations_manual/securing_duckdb/securing_extensions.md %}) for guidelines on how to set up DuckDB in a secure manner.
If you wish to load your own extensions or extensions from third-parties you will need to enable the allow_unsigned_extensions
flag.
To load unsigned extensions using the [CLI client]({% link docs/clients/cli/overview.md %}), pass the -unsigned
flag to it on startup:
duckdb -unsigned
Now any extension can be loaded, signed or not:
LOAD './some/local/ext.duckdb_extension';
For client APIs, the allow_unsigned_extensions
database configuration option needs to be set; see the respective [Client API docs]({% link docs/clients/overview.md %}).
For example, for the Python client, see the [Loading and Installing Extensions section in the Python API documentation]({% link docs/clients/python/overview.md %}#loading-and-installing-extensions).
For advanced installation instructions and more details on extensions, see the [Working with Extensions page]({% link docs/extensions/working_with_extensions.md %}).
---
layout: docu
title: MySQL Extension
github_repository: https://github.com/duckdb/duckdb-mysql
---
The mysql
extension allows DuckDB to directly read and write data from/to a running MySQL instance. The data can be queried directly from the underlying MySQL database. Data can be loaded from MySQL tables into DuckDB tables, or vice versa.
To install the mysql
extension, run:
INSTALL mysql;
The extension is loaded automatically upon first use. If you prefer to load it manually, run:
LOAD mysql;
To make a MySQL database accessible to DuckDB use the ATTACH
command with the MYSQL
or the MYSQL_SCANNER
type:
ATTACH 'host=localhost user=root port=0 database=mysql' AS mysqldb (TYPE MYSQL);
USE mysqldb;
The connection string determines the parameters for how to connect to MySQL as a set of key=value
pairs. Any options not provided are replaced by their default values, as per the table below. Connection information can also be specified with environment variables. If no option is provided explicitly, the MySQL extension tries to read it from an environment variable.
Setting | Default | Environment variable |
---|---|---|
database | NULL | MYSQL_DATABASE |
host | localhost | MYSQL_HOST |
password | | MYSQL_PWD |
port | 0 | MYSQL_TCP_PORT |
socket | NULL | MYSQL_UNIX_PORT |
user | ⟨current user⟩ | MYSQL_USER |
ssl_mode | preferred | |
ssl_ca | | |
ssl_capath | | |
ssl_cert | | |
ssl_cipher | | |
ssl_crl | | |
ssl_crlpath | | |
ssl_key | | |
MySQL connection information can also be specified with secrets. The following syntax can be used to create a secret.
CREATE SECRET (
TYPE MYSQL,
HOST '127.0.0.1',
PORT 0,
DATABASE mysql,
USER 'mysql',
PASSWORD ''
);
The information from the secret will be used when ATTACH
is called. We can leave the connection string empty to use all of the information stored in the secret.
ATTACH '' AS mysql_db (TYPE MYSQL);
We can use the connection string to override individual options. For example, to connect to a different database while still using the same credentials, we can override only the database name in the following manner.
ATTACH 'database=my_other_db' AS mysql_db (TYPE MYSQL);
By default, created secrets are temporary. Secrets can be persisted using the [CREATE PERSISTENT SECRET
command]({% link docs/configuration/secrets_manager.md %}#persistent-secrets). Persistent secrets can be used across sessions.
Named secrets can be used to manage connections to multiple MySQL database instances. Secrets can be given a name upon creation.
CREATE SECRET mysql_secret_one (
TYPE MYSQL,
HOST '127.0.0.1',
PORT 0,
DATABASE mysql,
USER 'mysql',
PASSWORD ''
);
The secret can then be explicitly referenced using the SECRET
parameter in the ATTACH
.
ATTACH '' AS mysql_db_one (TYPE MYSQL, SECRET mysql_secret_one);
The ssl
connection parameters can be used to make SSL connections. Below is a description of the supported parameters.
Setting | Description |
---|---|
ssl_mode | The security state to use for the connection to the server: disabled, required, verify_ca, verify_identity or preferred (default: preferred ) |
ssl_ca | The path name of the Certificate Authority (CA) certificate file |
ssl_capath | The path name of the directory that contains trusted SSL CA certificate files |
ssl_cert | The path name of the client public key certificate file |
ssl_cipher | The list of permissible ciphers for SSL encryption |
ssl_crl | The path name of the file containing certificate revocation lists |
ssl_crlpath | The path name of the directory that contains files containing certificate revocation lists |
ssl_key | The path name of the client private key file |
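For example, the following sketch requires an SSL connection and points to a CA certificate via the connection string (the certificate path is hypothetical):
-- Require SSL and verify the server against a CA certificate (path is illustrative)
ATTACH 'host=localhost user=root port=0 database=mysql ssl_mode=required ssl_ca=/path/to/ca.pem'
    AS mysqldb_ssl (TYPE MYSQL);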
The tables in the MySQL database can be read as if they were normal DuckDB tables, but the underlying data is read directly from MySQL at query time.
SHOW ALL TABLES;
name |
---|
signed_integers |
SELECT * FROM signed_integers;
t | s | m | i | b |
---|---|---|---|---|
-128 | -32768 | -8388608 | -2147483648 | -9223372036854775808 |
127 | 32767 | 8388607 | 2147483647 | 9223372036854775807 |
NULL | NULL | NULL | NULL | NULL |
It might be desirable to create a copy of the MySQL databases in DuckDB to prevent the system from re-reading the tables from MySQL continuously, particularly for large tables.
Data can be copied over from MySQL to DuckDB using standard SQL, for example:
CREATE TABLE duckdb_table AS FROM mysqlscanner.mysql_table;
In addition to reading data from MySQL, the extension allows you to create tables, ingest data into MySQL, and make other modifications to a MySQL database using standard SQL queries.
This allows you to use DuckDB to, for example, export data that is stored in a MySQL database to Parquet, or read data from a Parquet file into MySQL.
Below is a brief example of how to create a new table in MySQL and load data into it.
ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE MYSQL);
CREATE TABLE mysql_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO mysql_db.tbl VALUES (42, 'DuckDB');
Many operations on MySQL tables are supported. All these operations directly modify the MySQL database, and the result of subsequent operations can then be read using MySQL.
Note that if modifications are not desired, ATTACH
can be run with the READ_ONLY
property which prevents making modifications to the underlying database. For example:
ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE MYSQL, READ_ONLY);
Below is a list of supported operations.
CREATE TABLE mysql_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO mysql_db.tbl VALUES (42, 'DuckDB');
SELECT * FROM mysql_db.tbl;
id | name |
---|---|
42 | DuckDB |
COPY mysql_db.tbl TO 'data.parquet';
COPY mysql_db.tbl FROM 'data.parquet';
You may also create a full copy of the database using the [COPY FROM DATABASE
statement]({% link docs/sql/statements/copy.md %}#copy-from-database--to):
COPY FROM DATABASE mysql_db TO my_duckdb_db;
UPDATE mysql_db.tbl
SET name = 'Woohoo'
WHERE id = 42;
DELETE FROM mysql_db.tbl
WHERE id = 42;
ALTER TABLE mysql_db.tbl
ADD COLUMN k INTEGER;
DROP TABLE mysql_db.tbl;
CREATE VIEW mysql_db.v1 AS SELECT 42;
CREATE SCHEMA mysql_db.s1;
CREATE TABLE mysql_db.s1.integers (i INTEGER);
INSERT INTO mysql_db.s1.integers VALUES (42);
SELECT * FROM mysql_db.s1.integers;
i |
---|
42 |
DROP SCHEMA mysql_db.s1;
CREATE TABLE mysql_db.tmp (i INTEGER);
BEGIN;
INSERT INTO mysql_db.tmp VALUES (42);
SELECT * FROM mysql_db.tmp;
This returns:
i |
---|
42 |
ROLLBACK;
SELECT * FROM mysql_db.tmp;
This returns an empty table.
The DDL statements are not transactional in MySQL.
The mysql_query
table function allows you to run arbitrary read queries within an attached database. mysql_query
takes the name of the attached MySQL database to execute the query in, as well as the SQL query to execute. The result of the query is returned. Single-quote strings are escaped by repeating the single quote twice.
mysql_query(attached_database::VARCHAR, query::VARCHAR)
For example:
ATTACH 'host=localhost database=mysql' AS mysqldb (TYPE MYSQL);
SELECT * FROM mysql_query('mysqldb', 'SELECT * FROM cars LIMIT 3');
The mysql_execute
function allows running arbitrary queries within MySQL, including statements that update the schema and content of the database.
ATTACH 'host=localhost database=mysql' AS mysqldb (TYPE MYSQL);
CALL mysql_execute('mysqldb', 'CREATE TABLE my_table (i INTEGER)');
Name | Description | Default |
---|---|---|
mysql_bit1_as_boolean |
Whether or not to convert BIT(1) columns to BOOLEAN |
true |
mysql_debug_show_queries |
DEBUG SETTING: print all queries sent to MySQL to stdout | false |
mysql_experimental_filter_pushdown |
Whether or not to use filter pushdown (currently experimental) | false |
mysql_tinyint1_as_boolean |
Whether or not to convert TINYINT(1) columns to BOOLEAN |
true |
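These settings can be changed with a regular SET statement. For example, to enable the experimental filter pushdown for the current session:
-- Enable experimental filter pushdown for queries against MySQL tables
SET mysql_experimental_filter_pushdown = true;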
To avoid having to continuously fetch schema data from MySQL, DuckDB keeps schema information – such as the names of tables, their columns, etc. – cached. If changes are made to the schema through a different connection to the MySQL instance, such as new columns being added to a table, the cached schema information might be outdated. In this case, the function mysql_clear_cache
can be executed to clear the internal caches.
CALL mysql_clear_cache();
layout: docu title: Vector Similarity Search Extension github_repository: https://github.com/duckdb/duckdb-vss
The vss
extension is an experimental extension for DuckDB that adds indexing support to accelerate vector similarity search queries using DuckDB's new fixed-size ARRAY
type.
See the [announcement blog post]({% post_url 2024-05-03-vector-similarity-search-vss %}) and the [“What's New in the Vector Similarity Search Extension?” post]({% post_url 2024-10-23-whats-new-in-the-vss-extension %}).
To create a new HNSW (Hierarchical Navigable Small Worlds) index on a table with an ARRAY
column, use the CREATE INDEX
statement with the USING HNSW
clause. For example:
INSTALL vss;
LOAD vss;
CREATE TABLE my_vector_table (vec FLOAT[3]);
INSERT INTO my_vector_table
SELECT array_value(a, b, c)
FROM range(1, 10) ra(a), range(1, 10) rb(b), range(1, 10) rc(c);
CREATE INDEX my_hnsw_index ON my_vector_table USING HNSW (vec);
The index will then be used to accelerate queries that use an ORDER BY clause evaluating one of the supported distance metric functions against the indexed columns and a constant vector, followed by a LIMIT clause. For example:
SELECT *
FROM my_vector_table
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
Additionally, the overloaded min_by(col, arg, n)
can also be accelerated with the HNSW
index if the arg
argument is a matching distance metric function. This can be used to do quick one-shot nearest neighbor searches. For example, to get the top 3 rows with the closest vectors to [1, 2, 3]
:
SELECT min_by(my_vector_table, array_distance(vec, [1, 2, 3]::FLOAT[3]), 3) AS result
FROM my_vector_table;
---- [{'vec': [1.0, 2.0, 3.0]}, {'vec': [1.0, 2.0, 4.0]}, {'vec': [2.0, 2.0, 3.0]}]
Note how we pass the table name as the first argument to min_by
to return a struct containing the entire matched row.
We can verify that the index is being used by checking the EXPLAIN
output and looking for the HNSW_INDEX_SCAN
node in the plan:
EXPLAIN
SELECT *
FROM my_vector_table
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
┌───────────────────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ #0 │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ vec │
│array_distance(vec, [1.0, 2│
│ .0, 3.0]) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HNSW_INDEX_SCAN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ t1 (HNSW INDEX SCAN : │
│ my_idx) │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ vec │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 3 │
└───────────────────────────┘
By default, the HNSW index will be created using the Euclidean distance l2sq (L2-norm squared) metric, matching DuckDB's array_distance function, but other distance metrics can be used by specifying the metric option during index creation. For example:
CREATE INDEX my_hnsw_cosine_index
ON my_vector_table
USING HNSW (vec)
WITH (metric = 'cosine');
The following table shows the supported distance metrics and their corresponding DuckDB functions:
Metric | Function | Description |
---|---|---|
l2sq |
array_distance |
Euclidean distance |
cosine |
array_cosine_distance |
Cosine similarity distance |
ip |
array_negative_inner_product |
Negative inner product |
Note that while each HNSW index only applies to a single column, you can create multiple HNSW indexes on the same table, each individually indexing a different column. Additionally, you can also create multiple HNSW indexes on the same column, each supporting a different distance metric.
Besides the metric
option, the HNSW
index creation statement also supports the following options to control the hyperparameters of the index construction and search process:
Option | Default | Description |
---|---|---|
ef_construction |
128 | The number of candidate vertices to consider during the construction of the index. A higher value will result in a more accurate index, but will also increase the time it takes to build the index. |
ef_search |
64 | The number of candidate vertices to consider during the search phase of the index. A higher value will result in a more accurate index, but will also increase the time it takes to perform a search. |
M |
16 | The maximum number of neighbors to keep for each vertex in the graph. A higher value will result in a more accurate index, but will also increase the time it takes to build the index. |
M0 |
2 * M |
The base connectivity, or the number of neighbors to keep for each vertex in the zero-th level of the graph. A higher value will result in a more accurate index, but will also increase the time it takes to build the index. |
Additionally, you can also override the ef_search parameter set at index construction time by setting the SET hnsw_ef_search = ⟨int⟩ configuration option at runtime. This can be useful if you want to trade search performance for accuracy or vice versa on a per-connection basis. You can also unset the override by calling RESET hnsw_ef_search.
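For example, to temporarily trade some performance for higher accuracy in the current session and then revert to the value set at index construction time (the value 256 is purely illustrative):
-- Search a larger candidate set for better recall
SET hnsw_ef_search = 256;
-- ... run queries ...
-- Revert to the ef_search value the index was created with
RESET hnsw_ef_search;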
Due to some known issues related to persistence of custom extension indexes, the HNSW index can only be created on tables in in-memory databases by default, unless the SET hnsw_enable_experimental_persistence = ⟨bool⟩ configuration option is set to true.
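For example, to allow an HNSW index to be created on a table in a disk-backed database (keeping in mind the caveats described below):
-- Opt in to experimental persistence of HNSW indexes
SET hnsw_enable_experimental_persistence = true;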
The reasoning for locking this feature behind an experimental flag is that “WAL” recovery is not yet properly implemented for custom indexes, meaning that if a crash occurs or the database is shut down unexpectedly while there are uncommitted changes to an HNSW-indexed table, you can end up with data loss or corruption of the index.
If you enable this option and experience an unexpected shutdown, you can try to recover the index by first starting DuckDB separately, loading the vss
extension and then ATTACH
ing the database file, which ensures that the HNSW
index functionality is available during WAL-playback, allowing DuckDB's recovery process to proceed without issues. But we still recommend that you do not use this feature in production environments.
With the hnsw_enable_experimental_persistence
option enabled, the index will be persisted into the DuckDB database file (if you run DuckDB with a disk-backed database file), which means that after a database restart, the index can be loaded back into memory from disk instead of having to be re-created. With that in mind, there are no incremental updates to persistent index storage, so every time DuckDB performs a checkpoint the entire index will be serialized to disk and overwrite itself. Similarly, after a restart of the database, the index will be deserialized back into main memory in its entirety. Although this will be deferred until you first access the table associated with the index. Depending on how large the index is, the deserialization process may take some time, but it should still be faster than simply dropping and re-creating the index.
The HNSW index does support inserting, updating and deleting rows from the table after index creation. However, there are two things to keep in mind:
- It's faster to create the index after the table has been populated with data as the initial bulk load can make better use of parallelism on large tables.
- Deletes are not immediately reflected in the index, but are instead “marked” as deleted, which can cause the index to grow stale over time and negatively impact query quality and performance.
To remedy the last point, you can call the PRAGMA hnsw_compact_index('⟨index name⟩')
pragma function to trigger a re-compaction of the index pruning deleted items, or re-create the index after a significant number of updates.
The vss extension also provides a couple of table macros to simplify matching multiple vectors against each other, so-called “fuzzy joins”. These are:
vss_join(left_table, right_table, left_col, right_col, k, metric := 'l2sq')
vss_match(right_table, left_col, right_col, k, metric := 'l2sq')
These do not currently make use of the HNSW index but are provided as convenience utility functions for users who are comfortable performing brute-force vector similarity searches without having to write out the join logic themselves. In the future, these might become targets for index-based optimizations as well.
These functions can be used as follows:
CREATE TABLE haystack (id int, vec FLOAT[3]);
CREATE TABLE needle (search_vec FLOAT[3]);
INSERT INTO haystack
SELECT row_number() OVER (), array_value(a,b,c)
FROM range(1, 10) ra(a), range(1, 10) rb(b), range(1, 10) rc(c);
INSERT INTO needle
VALUES ([5, 5, 5]), ([1, 1, 1]);
SELECT *
FROM vss_join(needle, haystack, search_vec, vec, 3) res;
┌───────┬─────────────────────────────────┬─────────────────────────────────────┐
│ score │ left_tbl │ right_tbl │
│ float │ struct(search_vec float[3]) │ struct(id integer, vec float[3]) │
├───────┼─────────────────────────────────┼─────────────────────────────────────┤
│ 0.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 365, 'vec': [5.0, 5.0, 5.0]} │
│ 1.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 364, 'vec': [5.0, 4.0, 5.0]} │
│ 1.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 356, 'vec': [4.0, 5.0, 5.0]} │
│ 0.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 1, 'vec': [1.0, 1.0, 1.0]} │
│ 1.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 10, 'vec': [2.0, 1.0, 1.0]} │
│ 1.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 2, 'vec': [1.0, 2.0, 1.0]} │
└───────┴─────────────────────────────────┴─────────────────────────────────────┘
-- Alternatively, we can use the vss_match macro as a "lateral join"
-- to get the matches already grouped by the left table.
-- Note that this requires us to specify the left table first, and then
-- the vss_match macro which references the search column from the left
-- table (in this case, `search_vec`).
SELECT *
FROM needle, vss_match(haystack, search_vec, vec, 3) res;
┌─────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ search_vec │ matches │
│ float[3] │ struct(score float, "row" struct(id integer, vec float[3]))[] │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ [5.0, 5.0, 5.0] │ [{'score': 0.0, 'row': {'id': 365, 'vec': [5.0, 5.0, 5.0]}}, {'score': 1.0, 'row': {'id': 364, 'vec': [5.0, 4.0, 5.0]}}, {'score': 1.0, 'row': {'id': 356, 'vec': [4.0, 5.0, 5.0]}}] │
│ [1.0, 1.0, 1.0] │ [{'score': 0.0, 'row': {'id': 1, 'vec': [1.0, 1.0, 1.0]}}, {'score': 1.0, 'row': {'id': 10, 'vec': [2.0, 1.0, 1.0]}}, {'score': 1.0, 'row': {'id': 2, 'vec': [1.0, 2.0, 1.0]}}] │
└─────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
- Only vectors consisting of FLOATs (32-bit, single precision) are supported at the moment.
- The index itself is not buffer managed and must be able to fit into RAM memory.
- The size of the index in memory does not count towards DuckDB's memory_limit configuration parameter.
- HNSW indexes can only be created on tables in in-memory databases, unless the SET hnsw_enable_experimental_persistence = ⟨bool⟩ configuration option is set to true, see Persistence for more information.
- The vector join table macros (vss_join and vss_match) do not require or make use of the HNSW index.
layout: docu title: AutoComplete Extension github_directory: https://github.com/duckdb/duckdb/tree/main/extension/autocomplete
The autocomplete extension adds support for autocomplete in the [CLI client]({% link docs/clients/cli/overview.md %}).
The extension is shipped by default with the CLI client.
For the behavior of the autocomplete
extension, see the [documentation of the CLI client]({% link docs/clients/cli/autocomplete.md %}).
Function | Description |
---|---|
sql_auto_complete(query_string) |
Attempts autocompletion on the given query_string . |
SELECT *
FROM sql_auto_complete('SEL');
Returns:
suggestion | suggestion_start |
---|---|
SELECT | 0 |
DELETE | 0 |
INSERT | 0 |
CALL | 0 |
LOAD | 0 |
CALL | 0 |
ALTER | 0 |
BEGIN | 0 |
EXPORT | 0 |
CREATE | 0 |
PREPARE | 0 |
EXECUTE | 0 |
EXPLAIN | 0 |
ROLLBACK | 0 |
DESCRIBE | 0 |
SUMMARIZE | 0 |
CHECKPOINT | 0 |
DEALLOCATE | 0 |
UPDATE | 0 |
DROP | 0 |
Most software has some sort of version number. Version numbers serve a few important goals:
- Tie a binary to a specific state of the source code
- Allow determining the expected feature set
- Allow determining the state of the APIs
- Allow efficient processing of bug reports (e.g., bug #1337 was introduced in version v3.4.5)
- Allow determining chronological order of releases (e.g., version v1.2.3 is older than v1.2.4)
- Give an indication of expected stability (e.g., v0.0.1 is likely not very stable, whereas v13.11.0 probably is stable)
Just like [DuckDB itself]({% link docs/dev/release_calendar.md %}), DuckDB extensions have their own version number. To ensure consistent semantics of these version numbers across the various extensions, DuckDB's [Core Extensions]({% link docs/extensions/core_extensions.md %}) use a versioning scheme that prescribes how extensions should be versioned. The versioning scheme for Core Extensions is made up of 3 different stability levels: unstable, pre-release, and stable. Let's go over each of the 3 levels and describe their format:
Unstable extensions are extensions that can't (or don't want to) give any guarantees regarding their current stability, or their goals of becoming stable. Unstable extensions are tagged with the short git hash of the extension.
For example, at the time of writing, the vss extension is an unstable extension at version 690bfc5.
What to expect from an extension that has a version number in the unstable format?
- The state of the source code of the extension can be found by looking up the hash in the extension repository
- Functionality may change or be removed completely with every release
- This extension's API could change with every release
- This extension may not follow a structured release cycle, new (breaking) versions can be pushed at any time
Pre-release extensions are the next step up from unstable extensions. They are tagged with a version in the SemVer format, more specifically, those in the v0.y.z format.
In semantic versioning, versions starting with v0 have a special meaning: they indicate that the stricter semantics of regular (>v1.0.0) versions do not yet apply. It basically means that an extension is working towards becoming a stable extension, but is not quite there yet.
For example, at the time of writing, the delta extension is a pre-release extension at version v0.1.0.
What to expect from an extension that has a version number in the pre-release format?
- The extension is compiled from the source code corresponding to the tag.
- Semantic Versioning semantics apply. See the Semantic Versioning specification for details.
- The extension follows a release cycle where new features are tested in nightly builds before being grouped into a release and pushed to the
core
repository. - Release notes describing what has been added each release should be available to make it easy to understand the difference between versions.
Stable extensions are the final step of extension stability. This is denoted by using a stable SemVer of format vx.y.z
where x>0
.
For example, at the time of writing, the parquet extension is a stable extension at version v1.0.0.
What to expect from an extension that has a version number in the stable format? Essentially the same as pre-release extensions, but now the more strict SemVer semantics apply: the API of the extension should now be stable and will only change in backwards incompatible ways when the major version is bumped. See the SemVer specification for details.
In general, the release cycle of an extension depends on its stability level. Unstable extensions are often in sync with DuckDB's release cycle, but may also be quietly updated between DuckDB releases. Pre-release and stable extensions follow their own release cycle, which may or may not coincide with DuckDB releases. To find out more about the release cycle of a specific extension, refer to the documentation or GitHub page of the respective extension. Generally, pre-release and stable extensions document their releases as GitHub releases, an example of which you can see in the delta extension.
Finally, there is a small exception: All [in-tree]({% link docs/extensions/working_with_extensions.md %}#in-tree-vs-out-of-tree) extensions simply follow DuckDB's release cycle.
Just like DuckDB itself, DuckDB's core extensions have nightly or dev builds that can be used to try out features before they are officially released. This can be useful when your workflow depends on a new feature, or when you need to confirm that your stack is compatible with the upcoming version.
Nightly builds for extensions are slightly complicated due to the fact that currently DuckDB extensions binaries are tightly bound to a single DuckDB version. Because of this tight connection, there is a potential risk for a combinatorial explosion. Therefore, not all combinations of nightly extension build and nightly DuckDB build are available.
In general, there are 2 ways of using nightly builds: using a nightly DuckDB build and using a stable DuckDB build. Let's go over the differences between the two:
In most cases, users will be interested in a nightly build of a specific extension, but don't necessarily want to switch to using the nightly build of DuckDB itself. This allows using a specific bleeding-edge feature while limiting the exposure to unstable code.
To achieve this, Core Extensions tend to regularly push builds to the [core_nightly
repository]({% link docs/extensions/working_with_extensions.md %}#extension-repositories). Let's look at an example:
First we install a [stable DuckDB build]({% link docs/installation/index.html %}).
Then we can install and load a nightly extension like this:
INSTALL aws FROM core_nightly;
LOAD aws;
In this example we are using the latest nightly build of the aws extension with the latest stable version of DuckDB.
When DuckDB CI produces a nightly binary of DuckDB itself, the binaries are distributed with a set of extensions that are pinned at a specific version. This extension version will be tested for that specific build of DuckDB, but might not be the latest dev build. Let's look at an example:
First, we install a [nightly DuckDB build]({% link docs/installation/index.html %}). Then, we can install and load the aws
extension as expected:
INSTALL aws;
LOAD aws;
DuckDB has a dedicated statement that will automatically update all extensions to their latest version. The output will give the user information on which extensions were updated to/from which version. For example:
UPDATE EXTENSIONS;
extension_name | repository | update_result | previous_version | current_version |
---|---|---|---|---|
httpfs | core | NO_UPDATE_AVAILABLE | 70fd6a8a24 | 70fd6a8a24 |
delta | core | UPDATED | d9e5cc1 | 04c61e4 |
azure | core | NO_UPDATE_AVAILABLE | 49b63dc | 49b63dc |
aws | core_nightly | NO_UPDATE_AVAILABLE | 42c78d3 | 42c78d3 |
Note that DuckDB will look for updates in the source repository for each extension. So if an extension was installed from
core_nightly
, it will be updated with the latest nightly build.
The update statement can also be provided with a list of specific extensions to update:
UPDATE EXTENSIONS (httpfs, azure);
extension_name | repository | update_result | previous_version | current_version |
---|---|---|---|---|
httpfs | core | NO_UPDATE_AVAILABLE | 70fd6a8a24 | 70fd6a8a24 |
azure | core | NO_UPDATE_AVAILABLE | 49b63dc | 49b63dc |
Currently, when extensions are compiled, they are tied to a specific version of DuckDB. What this means is that, for example, an extension binary compiled for v0.10.3 does not work for v1.0.0. In most cases, this will not cause any issues and is fully transparent; DuckDB will automatically ensure it installs the correct binary for its version. For extension developers, this means that they must ensure that new binaries are created whenever a new version of DuckDB is released. However, note that DuckDB provides an extension template that makes this fairly simple.
layout: docu title: ICU Extension github_directory: https://github.com/duckdb/duckdb/tree/main/extension/icu
The icu
extension contains an easy-to-use version of the collation/timezone part of the ICU library.
The icu
extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
INSTALL icu;
LOAD icu;
The icu
extension introduces the following features:
- [Region-dependent collations]({% link docs/sql/expressions/collations.md %})
- [Time zones]({% link docs/sql/data_types/timezones.md %}), used for [timestamp data types]({% link docs/sql/data_types/timestamp.md %}) and [timestamp functions]({% link docs/sql/functions/timestamptz.md %})
Community-contributed extensions can be installed from the Community Extensions repository since [summer 2024]({% post_url 2024-07-05-community-extensions %}). Please visit the [Community Extensions section]({% link community_extensions/index.md %}) of the documentation for more details.
layout: docu title: Full-Text Search Extension github_repository: https://github.com/duckdb/duckdb-fts
Full-Text Search is an extension to DuckDB that allows for search through strings, similar to SQLite's FTS5 extension.
The fts
extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
INSTALL fts;
LOAD fts;
The extension adds two PRAGMA
statements to DuckDB: one to create, and one to drop an index. Additionally, a scalar macro stem
is added, which is used internally by the extension.
create_fts_index(input_table, input_id, *input_values, stemmer = 'porter',
stopwords = 'english', ignore = '(\\.|[^a-z])+',
strip_accents = 1, lower = 1, overwrite = 0)
PRAGMA
that creates a FTS index for the specified table.
Name | Type | Description |
---|---|---|
input_table |
VARCHAR |
Qualified name of specified table, e.g., 'table_name' or 'main.table_name' |
input_id |
VARCHAR |
Column name of document identifier, e.g., 'document_identifier' |
input_values… |
VARCHAR |
Column names of the text fields to be indexed (vararg), e.g., 'text_field_1' , 'text_field_2' , ..., 'text_field_N' , or '\*' for all columns in input_table of type VARCHAR |
stemmer |
VARCHAR |
The type of stemmer to be used. One of 'arabic' , 'basque' , 'catalan' , 'danish' , 'dutch' , 'english' , 'finnish' , 'french' , 'german' , 'greek' , 'hindi' , 'hungarian' , 'indonesian' , 'irish' , 'italian' , 'lithuanian' , 'nepali' , 'norwegian' , 'porter' , 'portuguese' , 'romanian' , 'russian' , 'serbian' , 'spanish' , 'swedish' , 'tamil' , 'turkish' , or 'none' if no stemming is to be used. Defaults to 'porter' |
stopwords |
VARCHAR |
Qualified name of table containing a single VARCHAR column containing the desired stopwords, or 'none' if no stopwords are to be used. Defaults to 'english' for a pre-defined list of 571 English stopwords |
ignore |
VARCHAR |
Regular expression of patterns to be ignored. Defaults to '(\\.|[^a-z])+' |
strip_accents |
BOOLEAN |
Whether to remove accents (e.g., convert á to a ). Defaults to 1 |
lower |
BOOLEAN |
Whether to convert all text to lowercase. Defaults to 1 |
overwrite |
BOOLEAN |
Whether to overwrite an existing index on a table. Defaults to 0 |
This PRAGMA
builds the index under a newly created schema. The schema will be named after the input table: if an index is created on table 'main.table_name'
, then the schema will be named 'fts_main_table_name'
.
drop_fts_index(input_table)
Drops a FTS index for the specified table.
Name | Type | Description |
---|---|---|
input_table |
VARCHAR |
Qualified name of input table, e.g., 'table_name' or 'main.table_name' |
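For example, to drop the index built on the documents table used in the example further below:
-- Remove the FTS index (and its fts_main_documents schema) for the table
PRAGMA drop_fts_index('documents');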
match_bm25(input_id, query_string, fields := NULL, k := 1.2, b := 0.75, conjunctive := 0)
When an index is built, this retrieval macro is created that can be used to search the index.
Name | Type | Description |
---|---|---|
input_id |
VARCHAR |
Column name of document identifier, e.g., 'document_identifier' |
query_string |
VARCHAR |
The string to search the index for |
fields |
VARCHAR |
Comma-separated list of fields to search in, e.g., 'text_field_2, text_field_N'. Defaults to NULL to search all indexed fields |
k |
DOUBLE |
Parameter k1 in the Okapi BM25 retrieval model. Defaults to 1.2 |
b |
DOUBLE |
Parameter b in the Okapi BM25 retrieval model. Defaults to 0.75 |
conjunctive |
BOOLEAN |
Whether to make the query conjunctive, i.e., all terms in the query string must be present in order for a document to be retrieved |
stem(input_string, stemmer)
Reduces words to their base. Used internally by the extension.
Name | Type | Description |
---|---|---|
input_string |
VARCHAR |
The column or constant to be stemmed. |
stemmer |
VARCHAR |
The type of stemmer to be used. One of 'arabic' , 'basque' , 'catalan' , 'danish' , 'dutch' , 'english' , 'finnish' , 'french' , 'german' , 'greek' , 'hindi' , 'hungarian' , 'indonesian' , 'irish' , 'italian' , 'lithuanian' , 'nepali' , 'norwegian' , 'porter' , 'portuguese' , 'romanian' , 'russian' , 'serbian' , 'spanish' , 'swedish' , 'tamil' , 'turkish' , or 'none' if no stemming is to be used. |
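For example, applying the Porter stemmer to a single word; the Porter algorithm reduces “searching” to “search”:
-- Stem a single word using the Porter stemmer
SELECT stem('searching', 'porter') AS stemmed;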
Create a table and fill it with text data:
CREATE TABLE documents (
document_identifier VARCHAR,
text_content VARCHAR,
author VARCHAR,
doc_version INTEGER
);
INSERT INTO documents
VALUES ('doc1',
'The mallard is a dabbling duck that breeds throughout the temperate.',
'Hannes Mühleisen',
3),
('doc2',
'The cat is a domestic species of small carnivorous mammal.',
'Laurens Kuiper',
2
);
Build the index, and make both the text_content
and author
columns searchable.
PRAGMA create_fts_index(
'documents', 'document_identifier', 'text_content', 'author'
);
Search the author
field index for documents that are authored by Muhleisen
. This retrieves doc1
:
SELECT document_identifier, text_content, score
FROM (
SELECT *, fts_main_documents.match_bm25(
document_identifier,
'Muhleisen',
fields := 'author'
) AS score
FROM documents
) sq
WHERE score IS NOT NULL
AND doc_version > 2
ORDER BY score DESC;
document_identifier | text_content | score |
---|---|---|
doc1 | The mallard is a dabbling duck that breeds throughout the temperate. | 0.0 |
Search for documents about small cats
. This retrieves doc2
:
SELECT document_identifier, text_content, score
FROM (
SELECT *, fts_main_documents.match_bm25(
document_identifier,
'small cats'
) AS score
FROM documents
) sq
WHERE score IS NOT NULL
ORDER BY score DESC;
document_identifier | text_content | score |
---|---|---|
doc2 | The cat is a domestic species of small carnivorous mammal. | 0.0 |
Warning The FTS index will not update automatically when the input table changes. A workaround for this limitation is to recreate the index.
layout: docu title: inet Extension github_repository: https://github.com/duckdb/duckdb-inet
The inet
extension defines the INET
data type for storing IPv4 and IPv6 Internet addresses. It supports the CIDR notation for subnet masks (e.g., 198.51.100.0/22
, 2001:db8:3c4d::/48
).
The inet
extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
INSTALL inet;
LOAD inet;
SELECT '127.0.0.1'::INET AS ipv4, '2001:db8:3c4d::/48'::INET AS ipv6;
ipv4 | ipv6 |
---|---|
127.0.0.1 | 2001:db8:3c4d::/48 |
CREATE TABLE tbl (id INTEGER, ip INET);
INSERT INTO tbl VALUES
(1, '192.168.0.0/16'),
(2, '127.0.0.1'),
(3, '8.8.8.8'),
(4, 'fe80::/10'),
(5, '2001:db8:3c4d:15::1a2f:1a2b');
SELECT * FROM tbl;
id | ip |
---|---|
1 | 192.168.0.0/16 |
2 | 127.0.0.1 |
3 | 8.8.8.8 |
4 | fe80::/10 |
5 | 2001:db8:3c4d:15::1a2f:1a2b |
INET
values can be compared naturally, and IPv4 will sort before IPv6. Additionally, IP addresses can be modified by adding or subtracting integers.
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('127.0.0.1'::INET + 10),
('fe80::10'::INET - 9),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b');
SELECT cidr FROM tbl ORDER BY cidr ASC;
cidr |
---|
127.0.0.1 |
127.0.0.11 |
2001:db8:3c4d:15::1a2f:1a2b |
fe80::7 |
The host component of an INET
value can be extracted using the HOST()
function.
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.0.0/16'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, host(cidr) FROM tbl;
cidr | host(cidr) |
---|---|
192.168.0.0/16 | 192.168.0.0 |
127.0.0.1 | 127.0.0.1 |
2001:db8:3c4d:15::1a2f:1a2b/96 | 2001:db8:3c4d:15::1a2f:1a2b |
Computes the network mask for the address's network.
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.1.5/24'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, netmask(cidr) FROM tbl;
cidr | netmask(cidr) |
---|---|
192.168.1.5/24 | 255.255.255.0/24 |
127.0.0.1 | 255.255.255.255 |
2001:db8:3c4d:15::1a2f:1a2b/96 | ffff:ffff:ffff:ffff:ffff:ffff::/96 |
Returns the network part of the address, zeroing out whatever is to the right of the netmask.
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.1.5/24'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, network(cidr) FROM tbl;
cidr | network(cidr) |
---|---|
192.168.1.5/24 | 192.168.1.0/24 |
127.0.0.1 | 255.255.255.255 |
2001:db8:3c4d:15::1a2f:1a2b/96 | ffff:ffff:ffff:ffff:ffff:ffff::/96 |
Computes the broadcast address for the address's network.
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.1.5/24'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, broadcast(cidr) FROM tbl;
cidr | broadcast(cidr) |
---|---|
192.168.1.5/24 | 192.168.1.0/24 |
127.0.0.1 | 127.0.0.1 |
2001:db8:3c4d:15::1a2f:1a2b/96 | 2001:db8:3c4d:15::/96 |
Is subnet contained by or equal to subnet?
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.1.0/24'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, INET '192.168.1.5/32' <<= cidr FROM tbl;
cidr | (CAST('192.168.1.5/32' AS INET) <<= cidr) |
---|---|
192.168.1.0/24 | true |
127.0.0.1 | false |
2001:db8:3c4d:15::1a2f:1a2b/96 | false |
Does subnet contain or equal subnet?
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.1.0/24'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, INET '192.168.0.0/16' >>= cidr FROM tbl;
cidr | (CAST('192.168.0.0/16' AS INET) >>= cidr) |
---|---|
192.168.1.0/24 | true |
127.0.0.1 | false |
2001:db8:3c4d:15::1a2f:1a2b/96 | false |
The extension also provides functions to escape and unescape HTML strings:
SELECT html_escape('&');
┌──────────────────┐
│ html_escape('&') │
│ varchar │
├──────────────────┤
│ & │
└──────────────────┘
SELECT html_unescape('&');
┌────────────────────────┐
│ html_unescape('&') │
│ varchar │
├────────────────────────┤
│ & │
└────────────────────────┘
layout: docu title: SQLSmith Extension github_repository: https://github.com/duckdb/duckdb-sqlsmith
The sqlsmith
extension is used for testing.
INSTALL sqlsmith;
LOAD sqlsmith;
The sqlsmith
extension registers the following functions:
- sqlsmith
- fuzzyduck
- reduce_sql_statement
- fuzz_all_functions
layout: docu title: jemalloc Extension github_directory: https://github.com/duckdb/duckdb/tree/main/extension/jemalloc
The jemalloc
extension replaces the system's memory allocator with jemalloc.
Unlike other DuckDB extensions, the jemalloc
extension is statically linked and cannot be installed or loaded during runtime.
The availability of the jemalloc
extension depends on the operating system.
Linux distributions of DuckDB ship with the jemalloc extension.
To disable the jemalloc
extension, [build DuckDB from source]({% link docs/dev/building/overview.md %}) and set the SKIP_EXTENSIONS
flag as follows:
GEN=ninja SKIP_EXTENSIONS="jemalloc" make
The macOS version of DuckDB does not ship with the jemalloc
extension but can be [built from source]({% link docs/dev/building/macos.md %}) to include it:
GEN=ninja BUILD_JEMALLOC=1 make
On Windows, this extension is not available.
The jemalloc allocator in DuckDB can be configured via the MALLOC_CONF
environment variable.
By default, jemalloc's background threads are disabled. To enable them, use the following configuration option:
SET allocator_background_threads = true;
Background threads asynchronously purge outstanding allocations so that this doesn't have to be done synchronously by the foreground threads. This improves allocation performance, and should be noticeable in allocation-heavy workloads, especially on many-core CPUs.
layout: docu title: httpfs Extension for HTTP and S3 Support github_repository: https://github.com/duckdb/duckdb-httpfs redirect_from:
- /docs/extensions/httpfs
- /docs/extensions/httpfs/
The httpfs extension is an autoloadable extension implementing a file system that allows reading and writing remote files.
For plain HTTP(S), only file reading is supported. For object storage using the S3 API, the httpfs
extension supports reading/writing/[globbing]({% link docs/sql/functions/pattern_matching.md %}#globbing) files.
The httpfs
extension will be, by default, autoloaded on first use of any functionality exposed by this extension.
To manually install and load the httpfs
extension, run:
INSTALL httpfs;
LOAD httpfs;
The httpfs
extension supports connecting to [HTTP(S) endpoints]({% link docs/extensions/httpfs/https.md %}).
The httpfs
extension supports connecting to [S3 API endpoints]({% link docs/extensions/httpfs/s3api.md %}).
Prior to version 0.10.0, DuckDB did not have a [Secrets manager]({% link docs/sql/statements/create_secret.md %}). Hence, the configuration of and authentication to S3 endpoints was handled via variables. This page documents the legacy authentication scheme for the S3 API.
The recommended way to configure and authenticate to S3 endpoints is to use [secrets]({% link docs/extensions/httpfs/s3api.md %}#configuration-and-authentication).
To be able to read or write from S3, the correct region should be set:
SET s3_region = 'us-east-1';
Optionally, the endpoint can be configured in case a non-AWS object storage server is used:
SET s3_endpoint = '⟨domain⟩.⟨tld⟩:⟨port⟩';
If the endpoint is not SSL-enabled then run:
SET s3_use_ssl = false;
Switching between path-style and vhost-style URLs is possible using:
SET s3_url_style = 'path';
However, note that this may also require updating the endpoint. For example, for AWS S3 it is required to change the endpoint to s3.⟨region⟩.amazonaws.com.
After configuring the correct endpoint and region, public files can be read. To also read private files, authentication credentials can be added:
SET s3_access_key_id = '⟨AWS access key id⟩';
SET s3_secret_access_key = '⟨AWS secret access key⟩';
Alternatively, temporary S3 credentials are also supported. They require setting an additional session token:
SET s3_session_token = '⟨AWS session token⟩';
The [aws
extension]({% link docs/extensions/aws.md %}) allows for loading AWS credentials.
Aside from the global S3 configuration described above, specific configuration values can be used on a per-request basis. This allows for use of multiple sets of credentials, regions, etc. These are used by including them on the S3 URI as query parameters. All the individual configuration values listed above can be set as query parameters. For instance:
SELECT *
FROM 's3://bucket/file.parquet?s3_access_key_id=accessKey&s3_secret_access_key=secretKey';
Multiple configurations per query are also allowed:
SELECT *
FROM 's3://bucket/file.parquet?s3_access_key_id=accessKey1&s3_secret_access_key=secretKey1' t1
INNER JOIN 's3://bucket/file.csv?s3_access_key_id=accessKey2&s3_secret_access_key=secretKey2' t2;
Some additional configuration options exist for the S3 upload, though the default values should suffice for most use cases.
Additionally, most of the configuration options can be set via environment variables:
DuckDB setting | Environment variable | Note |
---|---|---|
s3_region |
AWS_REGION |
Takes priority over AWS_DEFAULT_REGION |
s3_region |
AWS_DEFAULT_REGION |
|
s3_access_key_id |
AWS_ACCESS_KEY_ID |
|
s3_secret_access_key |
AWS_SECRET_ACCESS_KEY |
|
s3_session_token |
AWS_SESSION_TOKEN |
|
s3_endpoint |
DUCKDB_S3_ENDPOINT |
|
s3_use_ssl |
DUCKDB_S3_USE_SSL |
The httpfs
extension introduces support for the hf://
protocol to access data sets hosted in Hugging Face repositories.
See the [announcement blog post]({% post_url 2024-05-29-access-150k-plus-datasets-from-hugging-face-with-duckdb %}) for details.
Hugging Face repositories can be queried using the following URL pattern:
hf://datasets/⟨my_username⟩/⟨my_dataset⟩/⟨path_to_file⟩
For example, to read a CSV file, you can use the following query:
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';
Where:
datasets-examples
is the name of the user/organizationdoc-formats-csv-1
is the name of the dataset repositorydata.csv
is the file path in the repository
The result of the query is:
kind | sound |
---|---|
dog | woof |
cat | meow |
pokemon | pika |
human | hello |
To read a JSONL file, you can run:
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-jsonl-1/data.jsonl';
Finally, for reading a Parquet file, use the following query:
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-parquet-1/data/train-00000-of-00001.parquet';
Each of these commands reads the data from the specified file format and displays it in a structured tabular format. Choose the appropriate command based on the file format you are working with.
To avoid accessing the remote endpoint for every query, you can save the data in a DuckDB table by running a [CREATE TABLE ... AS
command]({% link docs/sql/statements/create_table.md %}#create-table--as-select-ctas). For example:
CREATE TABLE data AS
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';
Then, simply query the data
table as follows:
SELECT *
FROM data;
To query all files under a specific directory, you can use a [glob pattern]({% link docs/data/multiple_files/overview.md %}#multi-file-reads-and-globs). For example:
SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/*.parquet';
count |
---|
173 |
By using glob patterns, you can efficiently handle large datasets and perform comprehensive queries across multiple files, simplifying your data inspections and processing tasks. Here, you can see how you can look for questions that contain the word “planet” in astronomy:
SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/*.parquet'
WHERE question LIKE '%planet%';
count |
---|
21 |
In Hugging Face repositories, dataset versions or revisions are different dataset updates. Each version is a snapshot at a specific time, allowing you to track changes and improvements. In git terms, it can be understood as a branch or specific commit.
You can query different dataset versions/revisions by using the following URL:
hf://datasets/⟨my-username⟩/⟨my-dataset⟩@⟨my_branch⟩/⟨path_to_file⟩
For example:
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1@~parquet/**/*.parquet';
kind | sound |
---|---|
dog | woof |
cat | meow |
pokemon | pika |
human | hello |
The previous query will read all parquet files under the ~parquet
revision. This is a special branch where Hugging Face automatically generates the Parquet files of every dataset to enable efficient scanning.
Configure your Hugging Face Token in the DuckDB Secrets Manager to access private or gated datasets. First, visit Hugging Face Settings – Tokens to obtain your access token. Second, set it in your DuckDB session using DuckDB’s [Secrets Manager]({% link docs/configuration/secrets_manager.md %}). DuckDB supports two providers for managing secrets:
The user must pass all configuration information into the CREATE SECRET
statement. To create a secret using the CONFIG
provider, use the following command:
CREATE SECRET hf_token (
TYPE HUGGINGFACE,
TOKEN 'your_hf_token'
);
Automatically tries to fetch credentials. For the Hugging Face token, it will try to get it from ~/.cache/huggingface/token
. To create a secret using the CREDENTIAL_CHAIN
provider, use the following command:
CREATE SECRET hf_token (
TYPE HUGGINGFACE,
PROVIDER CREDENTIAL_CHAIN
);
With the httpfs
extension, it is possible to directly query files over the HTTP(S) protocol. This works for all files supported by DuckDB or its various extensions, and provides read-only access.
SELECT *
FROM 'https://domain.tld/file.extension';
For CSV files, files will be downloaded entirely in most cases, due to the row-based nature of the format.
For Parquet files, DuckDB supports [partial reading]({% link docs/data/parquet/overview.md %}#partial-reading), i.e., it can use a combination of the Parquet metadata and HTTP range requests to only download the parts of the file that are actually required by the query. For example, the following query will only read the Parquet metadata and the data for the column_a
column:
SELECT column_a
FROM 'https://domain.tld/file.parquet';
In some cases, no actual data needs to be read at all, as some queries only require reading the metadata:
SELECT count(*)
FROM 'https://domain.tld/file.parquet';
Scanning multiple files over HTTP(S) is also supported:
SELECT *
FROM read_parquet([
'https://domain.tld/file1.parquet',
'https://domain.tld/file2.parquet'
]);
To authenticate for an HTTP(S) endpoint, create an HTTP
secret using the [Secrets Manager]({% link docs/configuration/secrets_manager.md %}):
CREATE SECRET http_auth (
TYPE HTTP,
BEARER_TOKEN '⟨token⟩'
);
Or:
CREATE SECRET http_auth (
TYPE HTTP,
EXTRA_HTTP_HEADERS MAP {
'Authorization': 'Bearer ⟨token⟩'
}
);
DuckDB supports HTTP proxies.
You can add an HTTP proxy using the [Secrets Manager]({% link docs/configuration/secrets_manager.md %}):
CREATE SECRET http_proxy (
TYPE HTTP,
HTTP_PROXY '⟨http_proxy_url⟩',
HTTP_PROXY_USERNAME '⟨username⟩',
HTTP_PROXY_PASSWORD '⟨password⟩'
);
Alternatively, you can add it via [configuration options]({% link docs/configuration/pragmas.md %}):
SET http_proxy = '⟨http_proxy_url⟩';
SET http_proxy_username = '⟨username⟩';
SET http_proxy_password = '⟨password⟩';
To use the httpfs
extension with a custom certificate file, set the following [configuration options]({% link docs/configuration/pragmas.md %}) prior to loading the extension:
LOAD httpfs;
SET ca_cert_file = '⟨certificate_file⟩';
SET enable_server_cert_verification = true;
layout: docu title: Arrow Extension github_repository: https://github.com/duckdb/arrow
The arrow
extension implements features for using Apache Arrow, a cross-language development platform for in-memory analytics.
See the [announcement blog post]({% post_url 2021-12-03-duck-arrow %}) for more details.
The arrow
extension will be transparently autoloaded on first use from the official extension repository.
If you would like to install and load it manually, run:
INSTALL arrow;
LOAD arrow;
Function | Type | Description |
---|---|---|
to_arrow_ipc |
Table in-out function | Serializes a table into a stream of blobs containing Arrow IPC buffers |
scan_arrow_ipc |
Table function | Scan a list of pointers pointing to Arrow IPC buffers |
layout: docu title: AWS Extension github_repository: https://github.com/duckdb/duckdb_aws
The aws
extension adds functionality (e.g., authentication) on top of the httpfs
extension's [S3 capabilities]({% link docs/extensions/httpfs/overview.md %}#s3-api), using the AWS SDK.
Warning In most cases, you will not need to explicitly interact with the
aws
extension. It will automatically be invoked whenever you use DuckDB's [S3 Secret functionality]({% link docs/sql/statements/create_secret.md %}). See the [httpfs
extension's S3 capabilities]({% link docs/extensions/httpfs/overview.md %}#s3) for instructions.
The aws
extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
INSTALL aws;
LOAD aws;
aws
depends on httpfs
extension capabilities, and both will be autoloaded on the first call to load_aws_credentials
.
If autoinstall or autoload are disabled, you can always explicitly install and load httpfs
as follows:
INSTALL httpfs;
LOAD httpfs;
Deprecated The
load_aws_credentials
function is deprecated.
Prior to version 0.10.0, DuckDB did not have a [Secrets manager]({% link docs/sql/statements/create_secret.md %}). To load the credentials automatically, the AWS extension provided a special function to load the AWS credentials in the [legacy authentication method]({% link docs/extensions/httpfs/s3api_legacy_authentication.md %}).
Function | Type | Description |
---|---|---|
load_aws_credentials |
PRAGMA function |
Loads the AWS credentials through the AWS Default Credentials Provider Chain |
To load the AWS credentials, run:
CALL load_aws_credentials();
loaded_access_key_id | loaded_secret_access_key | loaded_session_token | loaded_region |
---|---|---|---|
AKIAIOSFODNN7EXAMPLE | <redacted> |
NULL | us-east-2 |
The function takes a string parameter to specify a specific profile:
CALL load_aws_credentials('minio-testing-2');
loaded_access_key_id | loaded_secret_access_key | loaded_session_token | loaded_region |
---|---|---|---|
minio_duckdb_user_2 | <redacted> |
NULL | NULL |
There are several parameters to tweak the behavior of the call:
CALL load_aws_credentials('minio-testing-2', set_region = false, redact_secret = false);
loaded_access_key_id | loaded_secret_access_key | loaded_session_token | loaded_region |
---|---|---|---|
minio_duckdb_user_2 | minio_duckdb_user_password_2 | NULL | NULL |
The httpfs extension supports reading/writing/globbing files on object storage servers using the S3 API. S3 offers a standard API to read and write to remote files (while regular HTTP servers, predating S3, do not offer a common write API). DuckDB conforms to the S3 API, which is now common among industry storage providers.
The httpfs
filesystem is tested with AWS S3, Minio, Google Cloud, and lakeFS. Other services that implement the S3 API (such as Cloudflare R2) should also work, but not all features may be supported.
The following table shows which parts of the S3 API are required for each httpfs
feature.
Feature | Required S3 API features |
---|---|
Public file reads | HTTP Range requests |
Private file reads | Secret key or session token authentication |
File glob | ListObjectsV2 |
File writes | Multipart upload |
The preferred way to configure and authenticate to S3 endpoints is to use [secrets]({% link docs/sql/statements/create_secret.md %}). Multiple secret providers are available.
Deprecated Prior to version 0.10.0, DuckDB did not have a [Secrets manager]({% link docs/sql/statements/create_secret.md %}). Hence, the configuration of and authentication to S3 endpoints was handled via variables. See the [legacy authentication scheme for the S3 API]({% link docs/extensions/httpfs/s3api_legacy_authentication.md %}).
The default provider, CONFIG
(i.e., user-configured), allows access to the S3 bucket by manually providing a key. For example:
CREATE SECRET secret1 (
TYPE S3,
KEY_ID 'AKIAIOSFODNN7EXAMPLE',
SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
REGION 'us-east-1'
);
Tip If you get an IO Error (
Connection error for HTTP HEAD
), configure the endpoint explicitly viaENDPOINT 's3.⟨your-region⟩.amazonaws.com'
.
Now, to query using the above secret, simply query any s3://
prefixed file:
SELECT *
FROM 's3://my-bucket/file.parquet';
The CREDENTIAL_CHAIN
provider allows automatically fetching credentials using mechanisms provided by the AWS SDK. For example, to use the AWS SDK default provider:
CREATE SECRET secret2 (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN
);
Again, to query a file using the above secret, simply query any s3://
prefixed file.
DuckDB also allows specifying a specific chain using the CHAIN
keyword. This takes a semicolon-separated list (a;b;c
) of providers that will be tried in order. For example:
CREATE SECRET secret3 (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN,
CHAIN 'env;config'
);
The possible values for CHAIN correspond to the credential providers offered by the AWS SDK, including env and config as used in the example above.
The CREDENTIAL_CHAIN
provider also allows overriding the automatically fetched config. For example, to automatically load credentials, and then override the region, run:
CREATE SECRET secret4 (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN,
CHAIN 'config',
REGION 'eu-west-1'
);
Below is a complete list of the supported parameters that can be used for both the CONFIG
and CREDENTIAL_CHAIN
providers:
Name | Description | Secret | Type | Default |
---|---|---|---|---|
KEY_ID |
The ID of the key to use | S3 , GCS , R2 |
STRING |
- |
SECRET |
The secret of the key to use | S3 , GCS , R2 |
STRING |
- |
REGION |
The region for which to authenticate (should match the region of the bucket to query) | S3 , GCS , R2 |
STRING |
us-east-1 |
SESSION_TOKEN |
Optionally, a session token can be passed to use temporary credentials | S3 , GCS , R2 |
STRING |
- |
ENDPOINT |
Specify a custom S3 endpoint | S3 , GCS , R2 |
STRING |
s3.amazonaws.com for S3 , |
URL_STYLE |
Either vhost or path |
S3 , GCS , R2 |
STRING |
vhost for S3 , path for R2 and GCS |
USE_SSL |
Whether to use HTTPS or HTTP | S3 , GCS , R2 |
BOOLEAN |
true |
URL_COMPATIBILITY_MODE |
Can help when URLs contain problematic characters | S3 , GCS , R2 |
BOOLEAN |
true |
ACCOUNT_ID |
The R2 account ID to use for generating the endpoint URL | R2 |
STRING |
- |
While Cloudflare R2 uses the regular S3 API, DuckDB has a special Secret type, R2
, to make configuring it a bit simpler:
CREATE SECRET secret5 (
TYPE R2,
KEY_ID 'AKIAIOSFODNN7EXAMPLE',
SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
ACCOUNT_ID 'my_account_id'
);
Note the addition of the ACCOUNT_ID, which is used to generate the correct endpoint URL for you. Also note that R2 secrets can use both the CONFIG and CREDENTIAL_CHAIN providers. Finally, R2 secrets are only available when using URLs starting with r2://, for example:
SELECT *
FROM read_parquet('r2://some/file/that/uses/r2/secret/file.parquet');
While Google Cloud Storage is accessed by DuckDB using the S3 API, DuckDB has a special Secret type, GCS
, to make configuring it a bit simpler:
CREATE SECRET secret6 (
TYPE GCS,
KEY_ID 'my_key',
SECRET 'my_secret'
);
Note that the above secret will automatically have the correct Google Cloud Storage endpoint configured. Also note that GCS secrets can use both the CONFIG and CREDENTIAL_CHAIN providers. Finally, GCS secrets are only available when using URLs starting with gcs:// or gs://, for example:
SELECT *
FROM read_parquet('gcs://some/file/that/uses/gcs/secret/file.parquet');
Reading files from S3 is now as simple as:
SELECT *
FROM 's3://bucket/file.extension';
The httpfs
extension supports [partial reading]({% link docs/extensions/httpfs/https.md %}#partial-reading) from S3 buckets.
Multiple files are also possible, for example:
SELECT *
FROM read_parquet([
's3://bucket/file1.parquet',
's3://bucket/file2.parquet'
]);
File [globbing]({% link docs/sql/functions/pattern_matching.md %}#globbing) is implemented using the ListObjectsV2 API call and allows the use of filesystem-like glob patterns to match multiple files, for example:
SELECT *
FROM read_parquet('s3://bucket/*.parquet');
This query matches all files in the root of the bucket with the [Parquet extension]({% link docs/data/parquet/overview.md %}).
Several features for matching are supported, such as *
to match any number of any character, ?
for any single character or [0-9]
for a single character in a range of characters:
SELECT count(*) FROM read_parquet('s3://bucket/folder*/100?/t[0-9].parquet');
A useful feature when using globs is the `filename` option, which adds a column named `filename` that encodes the file that a particular row originated from:
SELECT *
FROM read_parquet('s3://bucket/*.parquet', filename = true);
could for example result in:
| column_a | column_b | filename |
|---|---|---|
| 1 | examplevalue1 | s3://bucket/file1.parquet |
| 2 | examplevalue1 | s3://bucket/file2.parquet |
DuckDB also offers support for the [Hive partitioning scheme]({% link docs/data/partitioning/hive_partitioning.md %}), which is available when using HTTP(S) and S3 endpoints.
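For instance, a minimal sketch of reading a Hive-partitioned dataset from S3, assuming a hypothetical bucket layout partitioned by `year` and `month`:

```sql
-- Partition directories (e.g., year=2023/month=11/) become regular columns
SELECT *
FROM read_parquet('s3://bucket/dataset/*/*/*.parquet', hive_partitioning = true)
WHERE year = 2023
  AND month = 11;
```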
Writing to S3 uses the multipart upload API. This allows DuckDB to robustly upload files at high speed. Writing to S3 works for both CSV and Parquet:
COPY table_name TO 's3://bucket/file.extension';
Partitioned copy to S3 also works:
COPY table TO 's3://my-bucket/partitioned' (
FORMAT PARQUET,
PARTITION_BY (part_col_a, part_col_b)
);
An automatic check is performed for existing files/directories, which is currently quite conservative (and on S3 will add a bit of latency). To disable this check and force writing, set the `OVERWRITE_OR_IGNORE` flag:
COPY table TO 's3://my-bucket/partitioned' (
FORMAT PARQUET,
PARTITION_BY (part_col_a, part_col_b),
OVERWRITE_OR_IGNORE true
);
The naming scheme of the written files looks like this:
s3://my-bucket/partitioned/part_col_a=⟨val⟩/part_col_b=⟨val⟩/data_⟨thread_number⟩.parquet
Some additional configuration options exist for the S3 upload, though the default values should suffice for most use cases.
| Name | Description |
|---|---|
| `s3_uploader_max_parts_per_file` | Used for part size calculation, see AWS docs |
| `s3_uploader_max_filesize` | Used for part size calculation, see AWS docs |
| `s3_uploader_thread_limit` | Maximum number of uploader threads |
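These are regular configuration options, so they can be adjusted with `SET` if needed; a sketch, where the values below are arbitrary examples rather than recommendations:

```sql
-- Limit the number of concurrent uploader threads
SET s3_uploader_thread_limit = 10;
-- Cap the maximum file size used for part size calculation
SET s3_uploader_max_filesize = '100GB';
```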
---
layout: docu
title: Spatial Extension
github_repository: https://github.com/duckdb/duckdb_spatial
redirect_from:
  - /docs/extensions/spatial
  - /docs/extensions/spatial/
---
The spatial
extension provides support for geospatial data processing in DuckDB.
For an overview of the extension, see our [blog post]({% post_url 2023-04-28-spatial %}).
To install and load the spatial
extension, run:
INSTALL spatial;
LOAD spatial;
The core of the spatial extension is the `GEOMETRY` type. If you're unfamiliar with geospatial data and GIS tooling, this type probably works very differently from what you'd expect.

On the surface, the `GEOMETRY` type is a binary representation of “geometry” data made up of sets of vertices (pairs of X and Y `double` precision floats). But what makes it somewhat special is that it's actually used to store one of several different geometry subtypes. These are `POINT`, `LINESTRING`, `POLYGON`, as well as their “collection” equivalents, `MULTIPOINT`, `MULTILINESTRING` and `MULTIPOLYGON`. Lastly, there is `GEOMETRYCOLLECTION`, which can contain any of the other subtypes, as well as other `GEOMETRYCOLLECTION`s recursively.
This may seem strange at first, since DuckDB already has types like `LIST`, `STRUCT` and `UNION` which could be used in a similar way, but the design and behavior of the `GEOMETRY` type is actually based on the Simple Features geometry model, which is a standard used by many other databases and GIS software.
The spatial extension also includes a couple of experimental, non-standard explicit geometry types, such as `POINT_2D`, `LINESTRING_2D`, `POLYGON_2D` and `BOX_2D`, that are based on DuckDB's native nested types, such as `STRUCT` and `LIST`. Since these have a fixed and predictable internal memory layout, it is theoretically possible to optimize many geospatial algorithms to be much faster when operating on these types than on the `GEOMETRY` type. However, only a couple of functions in the spatial extension have been explicitly specialized for these types so far. All of these new types are implicitly castable to `GEOMETRY`, but with a small conversion cost, so the `GEOMETRY` type is still the recommended type to use for now if you are planning to work with a lot of different spatial functions.
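As a quick illustration, here is a small sketch (assuming the `spatial` extension is installed and loaded) that constructs a few `GEOMETRY` values and inspects the subtype of one of them:

```sql
LOAD spatial;

-- Construct GEOMETRY values of different subtypes
SELECT
    ST_Point(1.0, 2.0) AS pt,
    ST_GeomFromText('LINESTRING (0 0, 1 1, 2 2)') AS line,
    ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS poly;

-- Inspect the subtype stored inside a GEOMETRY value
SELECT ST_GeometryType(ST_Point(1.0, 2.0)) AS subtype;
```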
GEOMETRY
is not currently capable of storing additional geometry types such as curved geometries or triangle networks. Additionally, the GEOMETRY
type does not store SRID information on a per value basis. These limitations may be addressed in the future.
---
layout: docu
title: SQLite Extension
github_repository: https://github.com/duckdb/duckdb-sqlite
redirect_from:
  - /docs/extensions/sqlite_scanner
  - /docs/extensions/sqlite_scanner/
---
The SQLite extension allows DuckDB to directly read and write data from a SQLite database file. The data can be queried directly from the underlying SQLite tables. Data can be loaded from SQLite tables into DuckDB tables, or vice versa.
The sqlite
extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
INSTALL sqlite;
LOAD sqlite;
To make a SQLite file accessible to DuckDB, use the ATTACH
statement with the SQLITE
or SQLITE_SCANNER
type. Attached SQLite databases support both read and write operations.
For example, to attach to the sakila.db
file, run:
ATTACH 'sakila.db' (TYPE SQLITE);
USE sakila;
The tables in the file can be read as if they were normal DuckDB tables, but the underlying data is read directly from the SQLite tables in the file at query time.
SHOW TABLES;
| name |
|---|
| actor |
| address |
| category |
| city |
| country |
| customer |
| customer_list |
| film |
| film_actor |
| film_category |
| film_list |
| film_text |
| inventory |
| language |
| payment |
| rental |
| sales_by_film_category |
| sales_by_store |
| staff |
| staff_list |
| store |
You can query the tables using SQL, e.g., using the example queries from sakila-examples.sql
:
SELECT
cat.name AS category_name,
sum(ifnull(pay.amount, 0)) AS revenue
FROM category cat
LEFT JOIN film_category flm_cat
ON cat.category_id = flm_cat.category_id
LEFT JOIN film fil
ON flm_cat.film_id = fil.film_id
LEFT JOIN inventory inv
ON fil.film_id = inv.film_id
LEFT JOIN rental ren
ON inv.inventory_id = ren.inventory_id
LEFT JOIN payment pay
ON ren.rental_id = pay.rental_id
GROUP BY cat.name
ORDER BY revenue DESC
LIMIT 5;
SQLite is a weakly typed database system. As such, when storing data in a SQLite table, types are not enforced. The following is valid SQL in SQLite:
CREATE TABLE numbers (i INTEGER);
INSERT INTO numbers VALUES ('hello');
DuckDB is a strongly typed database system; as such, it requires all columns to have defined types, and the system rigorously checks data for correctness.
When querying SQLite, DuckDB must deduce a specific column type mapping. DuckDB follows SQLite's type affinity rules with a few extensions.
- If the declared type contains the string `INT`, then it is translated into the type `BIGINT`.
- If the declared type of the column contains any of the strings `CHAR`, `CLOB`, or `TEXT`, then it is translated into `VARCHAR`.
- If the declared type for a column contains the string `BLOB` or if no type is specified, then it is translated into `BLOB`.
- If the declared type for a column contains any of the strings `REAL`, `FLOA`, `DOUB`, `DEC` or `NUM`, then it is translated into `DOUBLE`.
- If the declared type is `DATE`, then it is translated into `DATE`.
- If the declared type contains the string `TIME`, then it is translated into `TIMESTAMP`.
- If none of the above apply, then it is translated into `VARCHAR`.
As DuckDB enforces the corresponding columns to contain only correctly typed values, we cannot load the string “hello” into a column of type BIGINT
. As such, an error is thrown when reading from the “numbers” table above:
Mismatch Type Error: Invalid type in column "i": column was declared as integer, found "hello" of type "text" instead.
This error can be avoided by setting the sqlite_all_varchar
option:
SET GLOBAL sqlite_all_varchar = true;
When set, this option overrides the type conversion rules described above and instead always converts the SQLite columns into `VARCHAR` columns. Note that this setting must be set before `sqlite_attach` is called.
SQLite databases can also be opened directly and can be used transparently instead of a DuckDB database file. In any client, when connecting, a path to a SQLite database file can be provided and the SQLite database will be opened instead.
For example, with the shell, a SQLite database can be opened as follows:
duckdb sakila.db
SELECT first_name
FROM actor
LIMIT 3;
| first_name |
|---|
| PENELOPE |
| NICK |
| ED |
In addition to reading data from SQLite, the extension also allows you to create new SQLite database files, create tables, ingest data into SQLite and make other modifications to SQLite database files using standard SQL queries.
This allows you to use DuckDB to, for example, export data that is stored in a SQLite database to Parquet, or read data from a Parquet file into SQLite.
Below is a brief example of how to create a new SQLite database and load data into it.
ATTACH 'new_sqlite_database.db' AS sqlite_db (TYPE SQLITE);
CREATE TABLE sqlite_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO sqlite_db.tbl VALUES (42, 'DuckDB');
The resulting SQLite database can then be read using SQLite.
sqlite3 new_sqlite_database.db
SQLite version 3.39.5 2022-10-14 20:58:05
sqlite> SELECT * FROM tbl;
id name
-- ------
42 DuckDB
Many operations on SQLite tables are supported. All these operations directly modify the SQLite database, and the result of subsequent operations can then be read using SQLite.
DuckDB can read or modify a SQLite database while DuckDB or SQLite reads or modifies the same database from a different thread or a separate process. More than one thread or process can read the SQLite database at the same time, but only a single thread or process can write to the database at one time. Database locking is handled by the SQLite library, not DuckDB. Within the same process, SQLite uses mutexes. When accessed from different processes, SQLite uses file system locks. The locking mechanisms also depend on SQLite configuration, like WAL mode. Refer to the SQLite documentation on locking for more information.
Warning Linking multiple copies of the SQLite library into the same application can lead to application errors. See sqlite_scanner Issue #82 for more information.
The extension exposes the following configuration parameters.
| Name | Description | Default |
|---|---|---|
| `sqlite_debug_show_queries` | DEBUG SETTING: print all queries sent to SQLite to stdout | `false` |
Below is a list of supported operations.
CREATE TABLE sqlite_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO sqlite_db.tbl VALUES (42, 'DuckDB');
SELECT * FROM sqlite_db.tbl;
| id | name |
|---|---|
| 42 | DuckDB |
COPY sqlite_db.tbl TO 'data.parquet';
COPY sqlite_db.tbl FROM 'data.parquet';
UPDATE sqlite_db.tbl SET name = 'Woohoo' WHERE id = 42;
DELETE FROM sqlite_db.tbl WHERE id = 42;
ALTER TABLE sqlite_db.tbl ADD COLUMN k INTEGER;
DROP TABLE sqlite_db.tbl;
CREATE VIEW sqlite_db.v1 AS SELECT 42;
CREATE TABLE sqlite_db.tmp (i INTEGER);
BEGIN;
INSERT INTO sqlite_db.tmp VALUES (42);
SELECT * FROM sqlite_db.tmp;
| i |
|---|
| 42 |
ROLLBACK;
SELECT * FROM sqlite_db.tmp;
| i |
|---|
Deprecated The old `sqlite_attach` function is deprecated. It is recommended to switch over to the new [`ATTACH` syntax]({% link docs/sql/statements/attach.md %}).
The spatial extension integrates the GDAL translator library to read and write spatial data from a variety of geospatial vector file formats. See the documentation for the [st_read
table function]({% link docs/extensions/spatial/functions.md %}#st_read) for how to make use of this in practice.
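For example, a minimal sketch that reads a (hypothetical) GeoJSON file into a DuckDB table via `ST_Read`:

```sql
-- Import the vector features of a GeoJSON file as a table
CREATE TABLE boundaries AS
SELECT *
FROM ST_Read('some/file/path/boundaries.geojson');
```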
In order to spare users from having to set up and install additional dependencies on their system, the spatial extension bundles its own copy of the GDAL library. This also means that spatial's version of GDAL may not be the latest version available or provide support for all of the file formats that a system-wide GDAL installation otherwise would. Refer to the section on the [`st_drivers` table function]({% link docs/extensions/spatial/functions.md %}#st_drivers) to inspect which GDAL drivers are currently available.
The spatial extension not only enables importing geospatial file formats (through the `ST_Read` function), it also enables exporting DuckDB tables to different geospatial vector formats through a GDAL-based `COPY` function.
For example, to export a table to a GeoJSON file, with generated bounding boxes, you can use the following query:
COPY ⟨table⟩ TO 'some/file/path/filename.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');
Available options:

- `FORMAT`: the only required option and must be set to `GDAL` to use the GDAL based copy function.
- `DRIVER`: the GDAL driver to use for the export. Use `ST_Drivers()` to list the names of all available drivers.
- `LAYER_CREATION_OPTIONS`: list of options to pass to the GDAL driver. See the GDAL docs for the driver you are using for a list of available options.
- `SRS`: set a spatial reference system as metadata to use for the export. This can be a WKT string, an EPSG code or a proj-string, basically anything you would normally be able to pass to GDAL. Note that this will not perform any reprojection of the input geometry, it just sets the metadata if the target driver supports it.
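As another example, a sketch exporting to a GeoPackage file with an explicit SRS (the driver name can be confirmed via `ST_Drivers()`):

```sql
COPY ⟨table⟩ TO 'some/file/path/filename.gpkg'
WITH (FORMAT GDAL, DRIVER 'GPKG', SRS 'EPSG:4326');
```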
Note that only vector based drivers are supported by the GDAL integration. Reading and writing raster formats are not supported.
---
layout: docu
title: Excel Extension
github_repository: https://github.com/duckdb/duckdb-excel
---
The excel
extension provides functions to format numbers per Excel's formatting rules by wrapping the i18npool library, but as of DuckDB 1.2 also provides functionality to read and write Excel (.xlsx
) files. However, .xls
files are not supported.
Previously, reading and writing Excel files was handled through the [spatial
extension]({% link docs/extensions/spatial/overview.md %}), which coincidentally included support for XLSX files through one of its dependencies, but this capability may be removed from the spatial extension in the future. Additionally, the excel
extension is more efficient and provides more control over the import/export process. See the [Excel Import]({% link docs/guides/file_formats/excel_import.md %}) and [Excel Export]({% link docs/guides/file_formats/excel_export.md %}) pages for instructions.
The excel
extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
INSTALL excel;
LOAD excel;
| Function | Description |
|---|---|
| `excel_text(number, format_string)` | Format the given `number` per the rules given in the `format_string` |
| `text(number, format_string)` | Alias for `excel_text` |
SELECT excel_text(1_234_567.897, 'h:mm AM/PM') AS timestamp;
| timestamp |
|---|
| 9:31 PM |
SELECT excel_text(1_234_567.897, 'h AM/PM') AS timestamp;
| timestamp |
|---|
| 9 PM |
Reading a `.xlsx` file is as simple as just `SELECT`ing from it immediately, e.g.:
SELECT *
FROM 'test.xlsx';
| a | b |
|---|---|
| 1.0 | 2.0 |
| 3.0 | 4.0 |
However, if you want to set additional options to control the import process, you can use the read_xlsx
function instead. The following named parameters are supported.
| Option | Type | Default | Description |
|---|---|---|---|
| `header` | `BOOLEAN` | automatically inferred | Whether to treat the first row as containing the names of the resulting columns. |
| `sheet` | `VARCHAR` | automatically inferred | The name of the sheet in the xlsx file to read. Default is the first sheet. |
| `all_varchar` | `BOOLEAN` | `false` | Whether to read all cells as containing `VARCHAR`s. |
| `ignore_errors` | `BOOLEAN` | `false` | Whether to ignore errors and silently replace cells that can't be cast to the corresponding inferred column type with `NULL`s. |
| `range` | `VARCHAR` | automatically inferred | The range of cells to read, in spreadsheet notation. For example, `A1:B2` reads the cells from A1 to B2. If not specified, the resulting range will be inferred as the rectangular region of cells between the first row of consecutive non-empty cells and the first empty row spanning the same columns. |
| `stop_at_empty` | `BOOLEAN` | automatically inferred | Whether to stop reading the file when an empty row is encountered. If an explicit `range` option is provided, this is `false` by default, otherwise `true`. |
| `empty_as_varchar` | `BOOLEAN` | `false` | Whether to treat empty cells as `VARCHAR` instead of `DOUBLE` when trying to automatically infer column types. |
SELECT *
FROM read_xlsx('test.xlsx', header = true);
| a | b |
|---|---|
| 1.0 | 2.0 |
| 3.0 | 4.0 |
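Similarly, a sketch restricting the import to a specific sheet and cell range (the sheet name and range here are hypothetical):

```sql
SELECT *
FROM read_xlsx('test.xlsx', sheet = 'Sheet1', range = 'A1:B3');
```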
Alternatively, the COPY
statement with the XLSX
format option can be used to import an Excel file into an existing table, in which case the types of the columns in the target table will be used to coerce the types of the cells in the Excel file.
CREATE TABLE test (a DOUBLE, b DOUBLE);
COPY test FROM 'test.xlsx' WITH (FORMAT 'xlsx', HEADER);
SELECT * FROM test;
Because Excel itself only really stores numbers or strings in cells, and doesn't enforce that all cells in a column are of the same type, the `excel` extension has to do some guesswork to "infer" and decide the types of the columns when importing an Excel sheet. While almost all columns are inferred as either `DOUBLE` or `VARCHAR`, there are some caveats:

- `TIMESTAMP`, `TIME`, `DATE` and `BOOLEAN` types are inferred when possible based on the format applied to the cell.
- Text cells containing `TRUE` and `FALSE` are inferred as `BOOLEAN`.
- Empty cells are considered to be `DOUBLE` by default, unless the `empty_as_varchar` option is set to `true`, in which case they are typed as `VARCHAR`.
If the `all_varchar` option is set to `true`, none of the above applies and all cells are read as `VARCHAR`.
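For example, a sketch that reads every cell as text, sidestepping type inference entirely:

```sql
SELECT *
FROM read_xlsx('test.xlsx', all_varchar = true);
```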
When no types are specified explicitly (e.g., when using the `read_xlsx` function instead of `COPY ... FROM '⟨file⟩.xlsx'`), the types of the resulting columns are inferred based on the first "data" row in the sheet, that is:
- If no explicit range is given:
  - the first row after the header, if a header is found or forced by the `header` option,
  - otherwise, the first non-empty row in the sheet.
- If an explicit range is given:
  - the second row of the range, if a header is found in the first row or forced by the `header` option,
  - otherwise, the first row of the range.
This can sometimes lead to issues if the first "data row" is not representative of the rest of the sheet (e.g., it contains empty cells) in which case the ignore_errors
or empty_as_varchar
options can be used to work around this.
However, when the `COPY ... FROM '⟨file⟩.xlsx'` syntax is used, no type inference is done and the types of the resulting columns are determined by the types of the columns in the table being copied to. All cells will simply be converted by casting from `DOUBLE` or `VARCHAR` to the target column type.
Writing .xlsx
files is supported using the COPY
statement with XLSX
given as the format. The following additional parameters are supported.
| Option | Type | Default | Description |
|---|---|---|---|
| `header` | `BOOLEAN` | `false` | Whether to write the column names as the first row in the sheet |
| `sheet` | `VARCHAR` | `Sheet1` | The name of the sheet in the xlsx file to write. |
| `sheet_row_limit` | `INTEGER` | `1048576` | The maximum number of rows in a sheet. An error is thrown if this limit is exceeded. |
Warning Many tools only support a maximum of 1,048,576 rows in a sheet, so increasing the `sheet_row_limit` may render the resulting file unreadable by other software.
These are passed as options to the COPY
statement after the FORMAT
, e.g.:
CREATE TABLE test AS
SELECT *
FROM (VALUES (1, 2), (3, 4)) AS t(a, b);
COPY test TO 'test.xlsx' WITH (FORMAT 'xlsx', HEADER true);
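The other write options are passed the same way; for instance, a sketch writing to a custom sheet name (the name used here is hypothetical):

```sql
COPY test TO 'test.xlsx'
WITH (FORMAT 'xlsx', HEADER true, SHEET 'Exported');
```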
Because XLSX files only really support storing numbers or strings (the equivalent of `VARCHAR` and `DOUBLE`), the following type conversions are applied when writing XLSX files.
- Numeric types are cast to `DOUBLE` when writing to an XLSX file.
- Temporal types (`TIMESTAMP`, `DATE`, `TIME`, etc.) are converted to Excel "serial" numbers, that is, the number of days since 1900-01-01 for dates and the fraction of a day for times. These are then styled with a "number format" so that they appear as dates or times when opened in Excel.
- `TIMESTAMP_TZ` and `TIME_TZ` are cast to UTC `TIMESTAMP` and `TIME` respectively, with the timezone information being lost.
- `BOOLEAN`s are converted to `1` and `0`, with a "number format" applied to make them appear as `TRUE` and `FALSE` in Excel.
- All other types are cast to `VARCHAR` and then written as text cells.
As of DuckDB v1.1.0 the [spatial
extension]({% link docs/extensions/spatial/overview.md %}) provides basic support for spatial indexing through the R-tree extension index type.
When working with geospatial datasets, it is very common that you want to filter rows based on their spatial relationship with a specific region of interest. Unfortunately, even though DuckDB's vectorized execution engine is pretty fast, this sort of operation does not scale very well to large datasets as it always requires a full table scan to check every row in the table. However, by indexing a table with an R-tree, it is possible to accelerate these types of queries significantly.
An R-tree is a balanced tree data structure that stores the approximate minimum bounding rectangle of each geometry (and the internal ID of the corresponding row) in the leaf nodes, and the bounding rectangle enclosing all of the child nodes in each internal node.
The minimum bounding rectangle (MBR) of a geometry is the smallest rectangle that completely encloses the geometry. Usually when we talk about the bounding rectangle of a geometry (or the bounding "box" in the context of 2D geometry), we mean the minimum bounding rectangle. Additionally, we tend to assume that bounding boxes/rectangles are axis-aligned, i.e., the rectangle is not rotated – the sides are always parallel to the coordinate axes. The MBR of a point is the point itself.
By traversing the R-tree from top to bottom, it is possible to very quickly search a R-tree-indexed table for only those rows where the indexed geometry column intersect a specific region of interest, as you can skip searching entire sub-trees if the bounding rectangles of their parent nodes don't intersect the query region at all. Once the leaf nodes are reached, only the specific rows whose geometries intersect the query region have to be fetched from disk, and the often much more expensive exact spatial predicate check (and any other filters) only have to be executed for these rows.
Before you get started using the R-tree index, there are some limitations to be aware of:
- The R-tree index is only supported for the `GEOMETRY` data type.
- The R-tree index will only be used to perform "index scans" when the table is filtered (using a `WHERE` clause) with one of the following spatial predicate functions (as they all imply intersection): `ST_Equals`, `ST_Intersects`, `ST_Touches`, `ST_Crosses`, `ST_Within`, `ST_Contains`, `ST_Overlaps`, `ST_Covers`, `ST_CoveredBy`, `ST_ContainsProperly`.
- One of the arguments to the spatial predicate function must be a “constant” (i.e., an expression whose result is known at query planning time). This is because the query planner needs to know the bounding box of the query region before the query itself is executed in order to use the R-tree index scan.
In the future, we want to enable R-tree indexes to be used to accelerate additional predicate functions and more complex queries such as spatial joins.
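To illustrate the “constant” requirement, consider the following sketch (table names are hypothetical): the first query is eligible for an R-tree index scan because the query region is known at planning time, while the second is not, since neither argument is constant:

```sql
-- Eligible for an R-tree index scan: the second argument is a constant expression
SELECT count(*)
FROM buildings
WHERE ST_Intersects(geom, ST_MakeEnvelope(0, 0, 10, 10));

-- Not eligible for an R-tree index scan: neither argument is constant at planning time
SELECT count(*)
FROM buildings, parcels
WHERE ST_Intersects(buildings.geom, parcels.geom);
```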
To create an R-tree index, simply use the CREATE INDEX
statement with the USING RTREE
clause, passing the geometry column to index within the parentheses. For example:
-- Create a table with a geometry column
CREATE TABLE my_table (geom GEOMETRY);
-- Create an R-tree index on the geometry column
CREATE INDEX my_idx ON my_table USING RTREE (geom);
You can also pass in additional options when creating an R-tree index using the WITH
clause to control the behavior of the R-tree index. For example, to specify the maximum number of entries per node in the R-tree, you can use the max_node_capacity
option:
CREATE INDEX my_idx ON my_table USING RTREE (geom) WITH (max_node_capacity = 16);
The impact tweaking these options will have on performance is highly dependent on the system setup DuckDB is running on, the spatial distribution of the dataset, and the query patterns of your specific workload. The defaults should be good enough, but if you want to experiment with different parameters, see the full list of options here.
Here is an example that shows how to create an R-tree index on a geometry column and where we can see that the RTREE_INDEX_SCAN
operator is used when the table is filtered with a spatial predicate:
INSTALL spatial;
LOAD spatial;
-- Create a table with 10_000 random points
CREATE TABLE t1 AS SELECT point::GEOMETRY AS geom
FROM st_generatepoints({min_x: 0, min_y: 0, max_x: 100, max_y: 100}::BOX_2D, 10_000, 1337);
-- Create an index on the table.
CREATE INDEX my_idx ON t1 USING RTREE (geom);
-- Perform a query with a "spatial predicate" on the indexed geometry column
-- Note how the second argument in this case, the ST_MakeEnvelope call is a "constant"
SELECT count(*) FROM t1 WHERE ST_Within(geom, ST_MakeEnvelope(45, 45, 65, 65));
390
We can check for ourselves that an R-tree index scan is used by using the EXPLAIN
statement:
EXPLAIN SELECT count(*) FROM t1 WHERE ST_Within(geom, ST_MakeEnvelope(45, 45, 65, 65));
┌───────────────────────────┐
│ UNGROUPED_AGGREGATE │
│ ──────────────────── │
│ Aggregates: │
│ count_star() │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ FILTER │
│ ──────────────────── │
│ ST_Within(geom, '...') │
│ │
│ ~2000 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ RTREE_INDEX_SCAN │
│ ──────────────────── │
│ t1 (RTREE INDEX SCAN : │
│ my_idx) │
│ │
│ Projections: geom │
│ │
│ ~10000 Rows │
└───────────────────────────┘
Creating R-trees on top of an already populated table is much faster than first creating the index and then inserting the data. This is because the R-tree will have to periodically rebalance itself and perform a somewhat costly splitting operation when a node reaches max capacity after an insert, potentially causing additional splits to cascade up the tree. However, when the R-tree index is created on an already populated table, a special bottom up "bulk loading algorithm" (Sort-Tile-Recursive) is used, which divides all entries into an already balanced tree as the total number of required nodes can be computed from the beginning.
Additionally, using the bulk loading algorithm tends to create an R-tree with a better structure (less overlap between bounding boxes), which usually leads to better query performance. If you find that the performance of querying the R-tree starts to deteriorate after a large number of updates or deletions, dropping and re-creating the index might produce a higher quality R-tree.
Like DuckDB's built-in ART index, all the associated buffers containing the R-tree will be lazily loaded from disk (when running DuckDB in disk-backed mode), but they are currently never unloaded unless the index is dropped. This means that if you end up scanning the entire index, the entire index will be loaded into memory and stay there for the duration of the database connection. However, all memory used by the R-tree index (even during bulk-loading) is tracked by DuckDB and will count towards the memory limit set by the `memory_limit` configuration parameter.
Depending on your specific workload, you might want to experiment with the `max_node_capacity` and `min_node_capacity` options to change the structure of the R-tree and how it responds to insertions and deletions; see the full list of options here. In general, a tree with a higher total number of nodes (i.e., a lower `max_node_capacity`) may result in a more granular structure that enables more aggressive pruning of sub-trees during query execution, but it will also require more memory to store the tree itself and be more punishing when querying larger regions, as more internal nodes will have to be traversed.
The following options can be passed to the `WITH` clause when creating an R-tree index (e.g., `CREATE INDEX my_idx ON my_table USING RTREE (geom) WITH (⟨option⟩ = ⟨value⟩);`):
| Option | Description | Default |
|---|---|---|
| `max_node_capacity` | The maximum number of entries per node in the R-tree | `128` |
| `min_node_capacity` | The minimum number of entries per node in the R-tree* | `0.4 * max_node_capacity` |
*Should a node fall under the minimum number of entries after a deletion, the node will be dissolved and all the entries reinserted from the top of the tree. This is a common operation in R-tree implementations to prevent the tree from becoming too unbalanced.
The `rtree_index_dump(VARCHAR)` table function can be used to return all the nodes within an R-tree index, which might come in handy when debugging, profiling, or otherwise just inspecting the structure of the index. The function takes the name of the R-tree index as an argument and returns a table with the following columns:
| Column name | Type | Description |
|---|---|---|
| `level` | `INTEGER` | The level of the node in the R-tree. The root node has level 0 |
| `bounds` | `BOX_2DF` | The bounding box of the node |
| `row_id` | `ROW_TYPE` | If this is a leaf node, the rowid of the row in the table, otherwise `NULL` |
Example:
-- Create a table with 64 random points
CREATE TABLE t1 AS SELECT point::GEOMETRY AS geom
FROM st_generatepoints({min_x: 0, min_y: 0, max_x: 100, max_y: 100}::BOX_2D, 64, 1337);
-- Create an R-tree index on the geometry column (with a low max_node_capacity for demonstration purposes)
CREATE INDEX my_idx ON t1 USING RTREE (geom) WITH (max_node_capacity = 4);
-- Inspect the R-tree index. Notice how the area of the bounding boxes of the branch nodes
-- decreases as we go deeper into the tree.
SELECT
level,
bounds::GEOMETRY AS geom,
CASE WHEN row_id IS NULL THEN st_area(geom) ELSE NULL END AS area,
row_id,
CASE WHEN row_id IS NULL THEN 'branch' ELSE 'leaf' END AS kind
FROM rtree_index_dump('my_idx')
ORDER BY area DESC;
┌───────┬──────────────────────────────┬────────────────────┬────────┬─────────┐
│ level │ geom │ area │ row_id │ kind │
│ int32 │ geometry │ double │ int64 │ varchar │
├───────┼──────────────────────────────┼────────────────────┼────────┼─────────┤
│ 0 │ POLYGON ((2.17285037040710… │ 3286.396482226409 │ │ branch │
│ 0 │ POLYGON ((6.00962591171264… │ 3193.725100864862 │ │ branch │
│ 0 │ POLYGON ((0.74995160102844… │ 3099.921458393704 │ │ branch │
│ 0 │ POLYGON ((14.6168870925903… │ 2322.2760491675654 │ │ branch │
│ 1 │ POLYGON ((2.17285037040710… │ 604.1520104388514 │ │ branch │
│ 1 │ POLYGON ((26.6022186279296… │ 569.1665467030252 │ │ branch │
│ 1 │ POLYGON ((35.7942314147949… │ 435.24662436250037 │ │ branch │
│ 1 │ POLYGON ((62.2643051147460… │ 396.39027683023596 │ │ branch │
│ 1 │ POLYGON ((59.5225715637207… │ 386.09153403820187 │ │ branch │
│ 1 │ POLYGON ((82.3060836791992… │ 369.15115640929434 │ │ branch │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ 2 │ POLYGON ((20.5411434173584… │ │ 35 │ leaf │
│ 2 │ POLYGON ((14.6168870925903… │ │ 36 │ leaf │
│ 2 │ POLYGON ((43.7271652221679… │ │ 39 │ leaf │
│ 2 │ POLYGON ((53.4629211425781… │ │ 44 │ leaf │
│ 2 │ POLYGON ((26.6022186279296… │ │ 62 │ leaf │
│ 2 │ POLYGON ((53.1732063293457… │ │ 63 │ leaf │
│ 2 │ POLYGON ((78.1427154541015… │ │ 10 │ leaf │
│ 2 │ POLYGON ((75.1728591918945… │ │ 15 │ leaf │
│ 2 │ POLYGON ((62.2643051147460… │ │ 42 │ leaf │
│ 2 │ POLYGON ((80.5032577514648… │ │ 49 │ leaf │
├───────┴──────────────────────────────┴────────────────────┴────────┴─────────┤
│ 84 rows (20 shown) 5 columns │
└──────────────────────────────────────────────────────────────────────────────┘
---
layout: docu
title: TPC-H Extension
github_directory: https://github.com/duckdb/duckdb/tree/main/extension/tpch
---
The tpch
extension implements the data generator and queries for the TPC-H benchmark.
The tpch
extension is shipped by default in some DuckDB builds, otherwise it will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use.
If you would like to install and load it manually, run:
INSTALL tpch;
LOAD tpch;
To generate data for scale factor 1, use:
CALL dbgen(sf = 1);
Calling dbgen
does not clean up existing TPC-H tables.
To clean up existing tables, use DROP TABLE
before running dbgen
:
DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS lineitem;
DROP TABLE IF EXISTS nation;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS part;
DROP TABLE IF EXISTS partsupp;
DROP TABLE IF EXISTS region;
DROP TABLE IF EXISTS supplier;
To run a query, e.g., query 4, use:
PRAGMA tpch(4);
| o_orderpriority | order_count |
|---|---|
| 1-URGENT | 10594 |
| 2-HIGH | 10476 |
| 3-MEDIUM | 10410 |
| 4-NOT SPECIFIED | 10556 |
| 5-LOW | 10487 |
To list all 22 queries, run:
FROM tpch_queries();
This function returns a table with columns query_nr
and query
.
To produce the expected results for all queries on scale factors 0.01, 0.1, and 1, run:
FROM tpch_answers();
This function returns a table with columns query_nr
, scale_factor
, and answer
.
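For example, a sketch retrieving the reference answer for query 1 at scale factor 1:

```sql
SELECT answer
FROM tpch_answers()
WHERE query_nr = 1
  AND scale_factor = 1;
```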
It's possible to generate the schema of TPC-H without any data by setting the scale factor to 0:
CALL dbgen(sf = 0);
The data generator function dbgen
has the following parameters:
| Name | Type | Description |
|---|---|---|
| `catalog` | `VARCHAR` | Target catalog |
| `children` | `UINTEGER` | Number of partitions |
| `overwrite` | `BOOLEAN` | (Not used) |
| `sf` | `DOUBLE` | Scale factor |
| `step` | `UINTEGER` | Defines the partition to be generated, indexed from 0 to `children` - 1. Must be defined when the `children` argument is defined |
| `suffix` | `VARCHAR` | Append the `suffix` to table names |
Pre-generated DuckDB databases for TPC-H are available for download:

- `tpch-sf1.db` (250 MB)
- `tpch-sf3.db` (754 MB)
- `tpch-sf10.db` (2.5 GB)
- `tpch-sf30.db` (7.6 GB)
- `tpch-sf100.db` (26 GB)
- `tpch-sf300.db` (78 GB)
- `tpch-sf1000.db` (265 GB)
- `tpch-sf3000.db` (796 GB)
Generating TPC-H data sets for large scale factors takes a significant amount of time. Additionally, when the generation is done in a single step, it requires a large amount of memory. The following table gives an estimate on the resources required to produce DuckDB database files containing the generated TPC-H data set using 128 threads.
| Scale factor | Database size | Data generation time | Generator's memory usage |
|---|---|---|---|
| 100 | 26 GB | 17 minutes | 71 GB |
| 300 | 78 GB | 51 minutes | 211 GB |
| 1000 | 265 GB | 2h 53 minutes | 647 GB |
| 3000 | 796 GB | 8h 30 minutes | 1799 GB |
The numbers shown above were achieved by running the dbgen
function in a single step, for example:
CALL dbgen(sf = 300);
If you have a limited amount of memory available, you can run the dbgen
function in steps.
For example, you may generate SF300 in 10 steps:
CALL dbgen(sf = 300, children = 10, step = 0);
CALL dbgen(sf = 300, children = 10, step = 1);
...
CALL dbgen(sf = 300, children = 10, step = 9);
The tpch(⟨query_id⟩)
function runs a fixed TPC-H query with pre-defined bind parameters (a.k.a. substitution parameters). It is not possible to change the query parameters using the tpch
extension. To run the queries with the parameters prescribed by the TPC-H benchmark, use a TPC-H framework implementation.
---
layout: docu
title: Python Client API
redirect_from:
  - /docs/api/python/reference/index
  - /docs/api/python/reference/index/
---
- `duckdb.threadsafety` (`bool`): Indicates that this package is threadsafe
- `duckdb.apilevel` (`int`): Indicates which Python DBAPI version this package implements
- `duckdb.paramstyle` (`str`): Indicates which parameter style duckdb supports
- exception duckdb.BinderException¶
-
Bases:
ProgrammingError
- duckdb.CaseExpression(condition: duckdb.duckdb.Expression, value: duckdb.duckdb.Expression) → duckdb.duckdb.Expression¶
- exception duckdb.CatalogException¶
-
Bases:
ProgrammingError
- duckdb.CoalesceOperator(*args) → duckdb.duckdb.Expression¶
- duckdb.ColumnExpression(*args) → duckdb.duckdb.Expression¶
-
Create a column reference from the provided column name
- exception duckdb.ConnectionException¶
-
Bases:
OperationalError
- duckdb.ConstantExpression(value: object) → duckdb.duckdb.Expression¶
-
Create a constant expression from the provided value
- exception duckdb.ConstraintException¶
-
Bases:
IntegrityError
- exception duckdb.DataError¶
-
Bases:
DatabaseError
- duckdb.DefaultExpression() → duckdb.duckdb.Expression¶
- class duckdb.DuckDBPyConnection¶
-
Bases:
pybind11_object
- append(self: duckdb.duckdb.DuckDBPyConnection, table_name: str, df: pandas.DataFrame, *, by_name: bool = False) → duckdb.duckdb.DuckDBPyConnection¶
-
Append the passed DataFrame to the named table
- array_type(self: duckdb.duckdb.DuckDBPyConnection, type: duckdb.duckdb.typing.DuckDBPyType, size: int) → duckdb.duckdb.typing.DuckDBPyType¶
-
Create an array type object of ‘type’
- arrow(self: duckdb.duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) → pyarrow.lib.Table¶
-
Fetch a result as Arrow table following execute()
- begin(self: duckdb.duckdb.DuckDBPyConnection) → duckdb.duckdb.DuckDBPyConnection¶
-
Start a new transaction
- checkpoint(self: duckdb.duckdb.DuckDBPyConnection) → duckdb.duckdb.DuckDBPyConnection¶
-
Synchronizes data in the write-ahead log (WAL) to the database data file (no-op for in-memory connections)
- close(self: duckdb.duckdb.DuckDBPyConnection) → None¶
-
Close the connection
- commit(self: duckdb.duckdb.DuckDBPyConnection) → duckdb.duckdb.DuckDBPyConnection¶
-
Commit changes performed within a transaction
- create_function(self: duckdb.duckdb.DuckDBPyConnection, name: str, function: Callable, parameters: object = None, return_type: duckdb.duckdb.typing.DuckDBPyType = None, *, type: duckdb.duckdb.functional.PythonUDFType = <PythonUDFType.NATIVE: 0>, null_handling: duckdb.duckdb.functional.FunctionNullHandling = <FunctionNullHandling.DEFAULT: 0>, exception_handling: duckdb.duckdb.PythonExceptionHandling = <PythonExceptionHandling.DEFAULT: 0>, side_effects: bool = False) → duckdb.duckdb.DuckDBPyConnection¶
-
Create a DuckDB function out of the passing in Python function so it can be used in queries
- cursor(self: duckdb.duckdb.DuckDBPyConnection) → duckdb.duckdb.DuckDBPyConnection¶
-
Create a duplicate of the current connection
- decimal_type(self: duckdb.duckdb.DuckDBPyConnection, width: int, scale: int) → duckdb.duckdb.typing.DuckDBPyType¶
-
Create a decimal type with ‘width’ and ‘scale’
- property description¶
-
Get result set attributes, mainly column names
- df(self: duckdb.duckdb.DuckDBPyConnection, *, date_as_object: bool = False) → pandas.DataFrame¶
-
Fetch a result as DataFrame following execute()
- dtype(self: duckdb.duckdb.DuckDBPyConnection, type_str: str) → duckdb.duckdb.typing.DuckDBPyType¶
-
Create a type object by parsing the ‘type_str’ string
- duplicate(self: duckdb.duckdb.DuckDBPyConnection) → duckdb.duckdb.DuckDBPyConnection¶
-
Create a duplicate of the current connection
- enum_type(self: duckdb.duckdb.DuckDBPyConnection, name: str, type: duckdb.duckdb.typing.DuckDBPyType, values: list) → duckdb.duckdb.typing.DuckDBPyType¶
-
Create an enum type of underlying ‘type’, consisting of the list of ‘values’
- execute(self: duckdb.duckdb.DuckDBPyConnection, query: object, parameters: object = None) → duckdb.duckdb.DuckDBPyConnection¶
-
Execute the given SQL query, optionally using prepared statements with parameters set
- executemany(self: duckdb.duckdb.DuckDBPyConnection, query: object, parameters: object = None) → duckdb.duckdb.DuckDBPyConnection¶
-
Execute the given prepared statement multiple times using the list of parameter sets in parameters
- extract_statements(self: duckdb.duckdb.DuckDBPyConnection, query: str) → list¶
-
Parse the query string and extract the Statement object(s) produced
- fetch_arrow_table(self: duckdb.duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) → pyarrow.lib.Table¶
-
Fetch a result as Arrow table following execute()
- fetch_df(self: duckdb.duckdb.DuckDBPyConnection, *, date_as_object: bool = False) → pandas.DataFrame¶
-
Fetch a result as DataFrame following execute()
- fetch_df_chunk(self: duckdb.duckdb.DuckDBPyConnection, vectors_per_chunk: int = 1, *, date_as_object: bool = False) → pandas.DataFrame¶
-
Fetch a chunk of the result as DataFrame following execute()
- fetch_record_batch(self: duckdb.duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) → pyarrow.lib.RecordBatchReader¶
-
Fetch an Arrow RecordBatchReader following execute()
- fetchall(self: duckdb.duckdb.DuckDBPyConnection) → list¶
-
Fetch all rows from a result following execute
- fetchdf(self: duckdb.duckdb.DuckDBPyConnection, *, date_as_object: bool = False) → pandas.DataFrame¶
-
Fetch a result as DataFrame following execute()
- fetchmany(self: duckdb.duckdb.DuckDBPyConnection, size: int = 1) → list¶
-
Fetch the next set of rows from a result following execute
- fetchnumpy(self: duckdb.duckdb.DuckDBPyConnection) → dict¶
-
Fetch a result as list of NumPy arrays following execute
- fetchone(self: duckdb.duckdb.DuckDBPyConnection) → Optional[tuple]¶
-
Fetch a single row from a result following execute
- filesystem_is_registered(self: duckdb.duckdb.DuckDBPyConnection, name: str) → bool¶
-
Check if a filesystem with the provided name is currently registered
- from_arrow(self: duckdb.duckdb.DuckDBPyConnection, arrow_object: object) → duckdb.duckdb.DuckDBPyRelation¶
-
Create a relation object from an Arrow object
- from_csv_auto(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) → duckdb.duckdb.DuckDBPyRelation¶
-
Create a relation object from the CSV file in ‘name’
- from_df(self: duckdb.duckdb.DuckDBPyConnection, df: pandas.DataFrame) → duckdb.duckdb.DuckDBPyRelation¶
-
Create a relation object from the DataFrame in df
- from_parquet(*args, **kwargs)¶
-
Overloaded function.
from_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
from_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_globs: list[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
- from_query(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) → duckdb.duckdb.DuckDBPyRelation¶
-
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
- get_table_names(self: duckdb.duckdb.DuckDBPyConnection, query: str) → set[str]¶
-
Extract the required table names from a query
- install_extension(self: duckdb.duckdb.DuckDBPyConnection, extension: str, *, force_install: bool = False, repository: object = None, repository_url: object = None, version: object = None) → None¶
-
Install an extension by name, with an optional version and/or repository to get the extension from
- interrupt(self: duckdb.duckdb.DuckDBPyConnection) → None¶
-
Interrupt pending operations
- list_filesystems(self: duckdb.duckdb.DuckDBPyConnection) → list¶
-
List registered filesystems, including builtin ones
- list_type(self: duckdb.duckdb.DuckDBPyConnection, type: duckdb.duckdb.typing.DuckDBPyType) → duckdb.duckdb.typing.DuckDBPyType¶
-
Create a list type object of ‘type’
- load_extension(self: duckdb.duckdb.DuckDBPyConnection, extension: str) → None¶
-
Load an installed extension
- map_type(self: duckdb.duckdb.DuckDBPyConnection, key: duckdb.duckdb.typing.DuckDBPyType, value: duckdb.duckdb.typing.DuckDBPyType) → duckdb.duckdb.typing.DuckDBPyType¶
-
Create a map type object from ‘key_type’ and ‘value_type’
- pl(self: duckdb.duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) → duckdb::PolarsDataFrame¶
-
Fetch a result as Polars DataFrame following execute()
- query(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) → duckdb.duckdb.DuckDBPyRelation¶
-
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
- read_csv(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) → duckdb.duckdb.DuckDBPyRelation¶
-
Create a relation object from the CSV file in ‘name’
- read_json(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, *, columns: Optional[object] = None, sample_size: Optional[object] = None, maximum_depth: Optional[object] = None, records: Optional[str] = None, format: Optional[str] = None, date_format: Optional[object] = None, timestamp_format: Optional[object] = None, compression: Optional[object] = None, maximum_object_size: Optional[object] = None, ignore_errors: Optional[object] = None, convert_strings_to_integers: Optional[object] = None, field_appearance_threshold: Optional[object] = None, map_inference_threshold: Optional[object] = None, maximum_sample_files: Optional[object] = None, filename: Optional[object] = None, hive_partitioning: Optional[object] = None, union_by_name: Optional[object] = None, hive_types: Optional[object] = None, hive_types_autocast: Optional[object] = None) → duckdb.duckdb.DuckDBPyRelation¶
-
Create a relation object from the JSON file in ‘name’
- read_parquet(*args, **kwargs)¶
-
Overloaded function.
read_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
read_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_globs: list[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
- register(self: duckdb.duckdb.DuckDBPyConnection, view_name: str, python_object: object) → duckdb.duckdb.DuckDBPyConnection¶
-
Register the passed Python Object value for querying with a view
- register_filesystem(self: duckdb.duckdb.DuckDBPyConnection, filesystem: fsspec.AbstractFileSystem) → None¶
-
Register a fsspec compliant filesystem
- remove_function(self: duckdb.duckdb.DuckDBPyConnection, name: str) → duckdb.duckdb.DuckDBPyConnection¶
-
Remove a previously created function
- rollback(self: duckdb.duckdb.DuckDBPyConnection) → duckdb.duckdb.DuckDBPyConnection¶
-
Roll back changes performed within a transaction
- row_type(self: duckdb.duckdb.DuckDBPyConnection, fields: object) → duckdb.duckdb.typing.DuckDBPyType¶
-
Create a struct type object from ‘fields’
- property rowcount¶
-
Get result set row count
- sql(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) → duckdb.duckdb.DuckDBPyRelation¶
-
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
- sqltype(self: duckdb.duckdb.DuckDBPyConnection, type_str: str) → duckdb.duckdb.typing.DuckDBPyType¶
-
Create a type object by parsing the ‘type_str’ string
- string_type(self: duckdb.duckdb.DuckDBPyConnection, collation: str = '') → duckdb.duckdb.typing.DuckDBPyType¶
-
Create a string type with an optional collation
- struct_type(self: duckdb.duckdb.DuckDBPyConnection, fields: object) → duckdb.duckdb.typing.DuckDBPyType¶
-
Create a struct type object from ‘fields’
- table(self: duckdb.duckdb.DuckDBPyConnection, table_name: str) → duckdb.duckdb.DuckDBPyRelation¶
-
Create a relation object for the named table
- table_function(self: duckdb.duckdb.DuckDBPyConnection, name: str, parameters: object = None) → duckdb.duckdb.DuckDBPyRelation¶
-
Create a relation object from the named table function with given parameters
- tf(self: duckdb.duckdb.DuckDBPyConnection) → dict¶
-
Fetch a result as dict of TensorFlow Tensors following execute()
- torch(self: duckdb.duckdb.DuckDBPyConnection) → dict¶
-
Fetch a result as dict of PyTorch Tensors following execute()
- type(self: duckdb.duckdb.DuckDBPyConnection, type_str: str) → duckdb.duckdb.typing.DuckDBPyType¶
-
Create a type object by parsing the ‘type_str’ string
- union_type(self: duckdb.duckdb.DuckDBPyConnection, members: object) → duckdb.duckdb.typing.DuckDBPyType¶
-
Create a union type object from ‘members’
- unregister(self: duckdb.duckdb.DuckDBPyConnection, view_name: str) → duckdb.duckdb.DuckDBPyConnection¶
-
Unregister the view name
- unregister_filesystem(self: duckdb.duckdb.DuckDBPyConnection, name: str) → None¶
-
Unregister a filesystem
- values(self: duckdb.duckdb.DuckDBPyConnection, *args) → duckdb.duckdb.DuckDBPyRelation¶
-
Create a relation object from the passed values
- view(self: duckdb.duckdb.DuckDBPyConnection, view_name: str) → duckdb.duckdb.DuckDBPyRelation¶
-
Create a relation object for the named view
- class duckdb.DuckDBPyRelation¶
-
Bases:
pybind11_object
- aggregate(self: duckdb.duckdb.DuckDBPyRelation, aggr_expr: object, group_expr: str = '') → duckdb.duckdb.DuckDBPyRelation¶
-
Compute the aggregate aggr_expr by the optional groups group_expr on the relation
- property alias¶
-
Get the name of the current alias
- any_value(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') → duckdb.duckdb.DuckDBPyRelation¶
-
Returns the first non-null value from a given column
- apply(self: duckdb.duckdb.DuckDBPyRelation, function_name: str, function_aggr: str, group_expr: str = '', function_parameter: str = '', projected_columns: str = '') → duckdb.duckdb.DuckDBPyRelation¶
-
Compute the function of a single column or a list of columns by the optional groups on the relation
- arg_max(self: duckdb.duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') → duckdb.duckdb.DuckDBPyRelation¶
-
Finds the row with the maximum value for a value column and returns the value of that row for an argument column
- arg_min(self: duckdb.duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') → duckdb.duckdb.DuckDBPyRelation¶
-
Finds the row with the minimum value for a value column and returns the value of that row for an argument column
- arrow(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) → pyarrow.lib.Table¶
-
Execute and fetch all rows as an Arrow Table
- avg(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') → duckdb.duckdb.DuckDBPyRelation¶
-
Computes the average on a given column
- bit_and(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') → duckdb.duckdb.DuckDBPyRelation¶
-
Computes the bitwise AND of all bits present in a given column
- bit_or(self:<