duckdb docs 2025-02-23
layout: docu title: Swift Client redirect_from:

  • /docs/api/swift
  • /docs/api/swift/

DuckDB has a Swift client. See the [announcement post]({% post_url 2023-04-21-swift %}) for details.

Instantiating DuckDB

DuckDB supports both in-memory and persistent databases. To work with an in-memory database, run:

let database = try Database(store: .inMemory)

To work with a persistent database, run:

let database = try Database(store: .file(at: "test.db"))

Queries can be issued through a database connection.

let connection = try database.connect()

DuckDB supports multiple connections per database.

Application Example

The rest of this page is based on the example from our [announcement post]({% post_url 2023-04-21-swift %}), which uses raw data from NASA's Exoplanet Archive loaded directly into DuckDB.

Creating an Application-Specific Type

We first create an application-specific type that we'll use to house our database and connection and through which we'll eventually define our app-specific queries.

import DuckDB

final class ExoplanetStore {

  let database: Database
  let connection: Connection

  init(database: Database, connection: Connection) {
    self.database = database
    self.connection = connection
  }
}

Loading a CSV File

We load the data from NASA's Exoplanet Archive:

wget "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv" -O downloaded_exoplanets.csv

Once we have the CSV downloaded locally, we can use the following SQL command to load it as a new table into DuckDB:

CREATE TABLE exoplanets AS
    SELECT * FROM read_csv('downloaded_exoplanets.csv');

Let's package this up as a new asynchronous factory method on our ExoplanetStore type:

import DuckDB
import Foundation

final class ExoplanetStore {

  // Factory method to create and prepare a new ExoplanetStore
  static func create() async throws -> ExoplanetStore {

    // Create our database and connection as described above
    let database = try Database(store: .inMemory)
    let connection = try database.connect()

    // Download the CSV from the exoplanet archive
    let (csvFileURL, _) = try await URLSession.shared.download(
      from: URL(string: "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv")!)

    // Issue our first query to DuckDB
    try connection.execute("""
      CREATE TABLE exoplanets AS
          SELECT * FROM read_csv('\(csvFileURL.path)');
    """)

    // Create our pre-populated ExoplanetStore instance
    return ExoplanetStore(
      database: database,
      connection: connection
    )
  }

  // Let's make the initializer we defined previously
  // private. This prevents anyone accidentally instantiating
  // the store without having pre-loaded our Exoplanet CSV
  // into the database
  private init(database: Database, connection: Connection) {
  ...
  }
}

Querying the Database

The following example queries DuckDB from within Swift via an async function. This means the caller won't be blocked while the query executes. We'll then cast the result columns to Swift native types using DuckDB's ResultSet cast(to:) family of methods, before finally wrapping them up in a DataFrame from the TabularData framework.

...

import TabularData

extension ExoplanetStore {

  // Retrieves the number of exoplanets discovered by year
  func groupedByDiscoveryYear() async throws -> DataFrame {

  // Issue the query we described above
    let result = try connection.query("""
      SELECT disc_year, count(disc_year) AS Count
        FROM exoplanets
        GROUP BY disc_year
        ORDER BY disc_year
      """)

    // Cast our DuckDB columns to their native Swift
    // equivalent types
    let discoveryYearColumn = result[0].cast(to: Int.self)
    let countColumn = result[1].cast(to: Int.self)

    // Use our DuckDB columns to instantiate TabularData
    // columns and populate a TabularData DataFrame
    return DataFrame(columns: [
      TabularData.Column(discoveryYearColumn).eraseToAnyColumn(),
      TabularData.Column(countColumn).eraseToAnyColumn(),
    ])
  }
}

Complete Project

For the complete example project, clone the DuckDB Swift repository and open up the runnable app project located in Examples/SwiftUI/ExoplanetExplorer.xcodeproj.

layout: docu title: Node.js API redirect_from:

  • /docs/api/nodejs
  • /docs/api/nodejs/
  • /docs/api/nodejs/overview
  • /docs/api/nodejs/overview/

Deprecated: The old DuckDB Node.js package is deprecated. Please use the [DuckDB Node Neo package]({% link docs/clients/node_neo/overview.md %}) instead.

This package provides a Node.js API for DuckDB. The API for this client is somewhat compliant with the SQLite Node.js client to make the transition easier.

Initializing

Load the package and create a database object:

const duckdb = require('duckdb');
const db = new duckdb.Database(':memory:'); // or a file name for a persistent DB

All options as described on [Database configuration]({% link docs/configuration/overview.md %}#configuration-reference) can be (optionally) supplied to the Database constructor as second argument. The third argument can be optionally supplied to get feedback on the given options.

const db = new duckdb.Database(':memory:', {
    "access_mode": "READ_WRITE",
    "max_memory": "512MB",
    "threads": "4"
}, (err) => {
  if (err) {
    console.error(err);
  }
});

Running a Query

The following code snippet runs a simple query using the Database.all() method.

db.all('SELECT 42 AS fortytwo', function(err, res) {
  if (err) {
    console.warn(err);
    return;
  }
  console.log(res[0].fortytwo)
});

Other available methods are each, where the callback is invoked for each row, run, which executes a single statement without returning results, and exec, which can execute several SQL commands at once but also does not return results. All of these methods can work with prepared statements, taking the values for the parameters as additional arguments. For example:

db.all('SELECT ?::INTEGER AS fortytwo, ?::VARCHAR AS hello', 42, 'Hello, World', function(err, res) {
  if (err) {
    console.warn(err);
    return;
  }
  console.log(res[0].fortytwo)
  console.log(res[0].hello)
});

Connections

A database can have multiple connections, which are created using db.connect().

const con = db.connect();

You can create multiple connections, each with their own transaction context.

Connection objects also contain shorthands to directly call run(), all() and each() with parameters and callbacks. For example:

con.all('SELECT 42 AS fortytwo', function(err, res) {
  if (err) {
    console.warn(err);
    return;
  }
  console.log(res[0].fortytwo)
});

Prepared Statements

From connections, you can create prepared statements (and only that) using con.prepare():

const stmt = con.prepare('SELECT ?::INTEGER AS fortytwo');

To execute this statement, you can call for example all() on the stmt object:

stmt.all(42, function(err, res) {
  if (err) {
    console.warn(err);
  } else {
    console.log(res[0].fortytwo)
  }
});

You can also execute the prepared statement multiple times. This is useful, for example, to fill a table with data:

con.run('CREATE TABLE a (i INTEGER)');
const stmt = con.prepare('INSERT INTO a VALUES (?)');
for (let i = 0; i < 10; i++) {
  stmt.run(i);
}
stmt.finalize();
con.all('SELECT * FROM a', function(err, res) {
  if (err) {
    console.warn(err);
  } else {
    console.log(res)
  }
});

prepare() can also take a callback which gets the prepared statement as an argument:

const stmt = con.prepare('SELECT ?::INTEGER AS fortytwo', function(err, stmt) {
  stmt.all(42, function(err, res) {
    if (err) {
      console.warn(err);
    } else {
      console.log(res[0].fortytwo)
    }
  });
});

Inserting Data via Arrow

[Apache Arrow]({% link docs/guides/python/sql_on_arrow.md %}) can be used to insert data into DuckDB without making a copy:

const arrow = require('apache-arrow');
const db = new duckdb.Database(':memory:');

const jsonData = [
  {"userId":1,"id":1,"title":"delectus aut autem","completed":false},
  {"userId":1,"id":2,"title":"quis ut nam facilis et officia qui","completed":false}
];

// Note: this doesn't work on Windows yet.
db.exec(`INSTALL arrow; LOAD arrow;`, (err) => {
    if (err) {
        console.warn(err);
        return;
    }

    const arrowTable = arrow.tableFromJSON(jsonData);
    db.register_buffer("jsonDataTable", [arrow.tableToIPC(arrowTable)], true, (err, res) => {
        if (err) {
            console.warn(err);
            return;
        }

        // `SELECT * FROM jsonDataTable` would return the entries in `jsonData`
    });
});

Loading Unsigned Extensions

To load [unsigned extensions]({% link docs/extensions/overview.md %}#unsigned-extensions), instantiate the database as follows:

db = new duckdb.Database(':memory:', {"allow_unsigned_extensions": "true"});

layout: docu title: Node.js API redirect_from:

  • /docs/api/nodejs/reference
  • /docs/api/nodejs/reference/

Modules

duckdb

Typedefs

ColumnInfo : object
TypeInfo : object
DuckDbError : object
HTTPError : object

duckdb

Summary: DuckDB is an embeddable SQL OLAP Database Management System

duckdb~Connection

Kind: inner class of duckdb

connection.run(sql, ...params, callback) ⇒ void

Run a SQL statement and trigger a callback when done

Kind: instance method of Connection

Param Type
sql
...params *
callback

connection.all(sql, ...params, callback) ⇒ void

Runs a SQL query and triggers the callback once for all result rows

Kind: instance method of Connection

Param Type
sql
...params *
callback

connection.arrowIPCAll(sql, ...params, callback) ⇒ void

Run a SQL query and serialize the result into the Apache Arrow IPC format (requires arrow extension to be loaded)

Kind: instance method of Connection

Param Type
sql
...params *
callback

connection.arrowIPCStream(sql, ...params, callback) ⇒

Runs a SQL query and returns an IpcResultStreamIterator that allows streaming the result in the Apache Arrow IPC format (requires the arrow extension to be loaded)

Kind: instance method of Connection
Returns: Promise

Param Type
sql
...params *
callback

connection.each(sql, ...params, callback) ⇒ void

Runs a SQL query and triggers the callback for each result row

Kind: instance method of Connection

Param Type
sql
...params *
callback

connection.stream(sql, ...params)

Kind: instance method of Connection

Param Type
sql
...params *

connection.register_udf(name, return_type, fun) ⇒ void

Register a User Defined Function

Kind: instance method of Connection
Note: this follows the wasm udfs somewhat but is simpler because we can pass data much more cleanly

Param
name
return_type
fun

connection.prepare(sql, ...params, callback) ⇒ Statement

Prepare a SQL query for execution

Kind: instance method of Connection

Param Type
sql
...params *
callback

connection.exec(sql, ...params, callback) ⇒ void

Execute a SQL query

Kind: instance method of Connection

Param Type
sql
...params *
callback

connection.register_udf_bulk(name, return_type, callback) ⇒ void

Register a User Defined Function

Kind: instance method of Connection

Param
name
return_type
callback

connection.unregister_udf(name, return_type, callback) ⇒ void

Unregister a User Defined Function

Kind: instance method of Connection

Param
name
return_type
callback

connection.register_buffer(name, array, force, callback) ⇒ void

Register a Buffer to be scanned using the Apache Arrow IPC scanner (requires arrow extension to be loaded)

Kind: instance method of Connection

Param
name
array
force
callback

connection.unregister_buffer(name, callback) ⇒ void

Unregister the Buffer

Kind: instance method of Connection

Param
name
callback

connection.close(callback) ⇒ void

Closes connection

Kind: instance method of Connection

Param
callback

duckdb~Statement

Kind: inner class of duckdb

statement.sql ⇒

Kind: instance property of Statement
Returns: sql contained in statement
Field:

statement.get()

Not implemented

Kind: instance method of Statement

statement.run(sql, ...params, callback) ⇒ void

Kind: instance method of Statement

Param Type
sql
...params *
callback

statement.all(sql, ...params, callback) ⇒ void

Kind: instance method of Statement

Param Type
sql
...params *
callback

statement.arrowIPCAll(sql, ...params, callback) ⇒ void

Kind: instance method of Statement

Param Type
sql
...params *
callback

statement.each(sql, ...params, callback) ⇒ void

Kind: instance method of Statement

Param Type
sql
...params *
callback

statement.finalize(sql, ...params, callback) ⇒ void

Kind: instance method of Statement

Param Type
sql
...params *
callback

statement.stream(sql, ...params)

Kind: instance method of Statement

Param Type
sql
...params *

statement.columns() ⇒ Array.<ColumnInfo>

Kind: instance method of Statement
Returns: Array.<ColumnInfo> – array of column names and types

duckdb~QueryResult

Kind: inner class of duckdb

queryResult.nextChunk() ⇒

Kind: instance method of QueryResult
Returns: data chunk

queryResult.nextIpcBuffer() ⇒

Fetches the next result blob of an Arrow IPC stream in a zero-copy way (requires the arrow extension to be loaded)

Kind: instance method of QueryResult
Returns: data chunk

queryResult.asyncIterator()

Kind: instance method of QueryResult

duckdb~Database

Main database interface

Kind: inner property of duckdb

Param Description
path path to database file or :memory: for in-memory database
access_mode access mode
config the configuration object
callback callback function

database.close(callback) ⇒ void

Closes database instance

Kind: instance method of Database

Param
callback

database.close_internal(callback) ⇒ void

Internal method. Do not use; call Connection#close instead

Kind: instance method of Database

Param
callback

database.wait(callback) ⇒ void

Triggers callback when all scheduled database tasks have completed.

Kind: instance method of Database

Param
callback

database.serialize(callback) ⇒ void

Currently a no-op. Provided for SQLite compatibility

Kind: instance method of Database

Param
callback

database.parallelize(callback) ⇒ void

Currently a no-op. Provided for SQLite compatibility

Kind: instance method of Database

Param
callback

database.connect(path) ⇒ Connection

Create a new database connection

Kind: instance method of Database

Param Description
path the database to connect to, either a file path, or :memory:

database.interrupt(callback) ⇒ void

Supposed to interrupt queries, but currently does not do anything.

Kind: instance method of Database

Param
callback

database.prepare(sql) ⇒ Statement

Prepare a SQL query for execution

Kind: instance method of Database

Param
sql

database.run(sql, ...params, callback) ⇒ void

Convenience method for Connection#run using a built-in default connection

Kind: instance method of Database

Param Type
sql
...params *
callback

database.scanArrowIpc(sql, ...params, callback) ⇒ void

Convenience method for Connection#scanArrowIpc using a built-in default connection

Kind: instance method of Database

Param Type
sql
...params *
callback

database.each(sql, ...params, callback) ⇒ void

Kind: instance method of Database

Param Type
sql
...params *
callback

database.stream(sql, ...params)

Kind: instance method of Database

Param Type
sql
...params *

database.all(sql, ...params, callback) ⇒ void

Convenience method for Connection#apply using a built-in default connection

Kind: instance method of Database

Param Type
sql
...params *
callback

database.arrowIPCAll(sql, ...params, callback) ⇒ void

Convenience method for Connection#arrowIPCAll using a built-in default connection

Kind: instance method of Database

Param Type
sql
...params *
callback

database.arrowIPCStream(sql, ...params, callback) ⇒ void

Convenience method for Connection#arrowIPCStream using a built-in default connection

Kind: instance method of Database

Param Type
sql
...params *
callback

database.exec(sql, ...params, callback) ⇒ void

Kind: instance method of Database

Param Type
sql
...params *
callback

database.register_udf(name, return_type, fun) ⇒ this

Register a User Defined Function

Convenience method for Connection#register_udf

Kind: instance method of Database

Param
name
return_type
fun

database.register_buffer(name) ⇒ this

Register a buffer containing serialized data to be scanned from DuckDB.

Convenience method for Connection#unregister_buffer

Kind: instance method of Database

Param
name

database.unregister_buffer(name) ⇒ this

Unregister a Buffer

Convenience method for Connection#unregister_buffer

Kind: instance method of Database

Param
name

database.unregister_udf(name) ⇒ this

Unregister a UDF

Convenience method for Connection#unregister_udf

Kind: instance method of Database

Param
name

database.registerReplacementScan(fun) ⇒ this

Register a table replacement scan function

Kind: instance method of Database

Param Description
fun Replacement scan function

database.tokenize(text) ⇒ ScriptTokens

Return positions and types of tokens in given text

Kind: instance method of Database

Param
text

database.get()

Not implemented

Kind: instance method of Database

duckdb~TokenType

Types of tokens returned by tokenize.

Kind: inner property of duckdb

duckdb~ERROR : number

Check that the errno attribute equals this value to detect a DuckDB error

Kind: inner constant of duckdb

duckdb~OPEN_READONLY : number

Open database in readonly mode

Kind: inner constant of duckdb

duckdb~OPEN_READWRITE : number

Currently ignored

Kind: inner constant of duckdb

duckdb~OPEN_CREATE : number

Currently ignored

Kind: inner constant of duckdb

duckdb~OPEN_FULLMUTEX : number

Currently ignored

Kind: inner constant of duckdb

duckdb~OPEN_SHAREDCACHE : number

Currently ignored

Kind: inner constant of duckdb

duckdb~OPEN_PRIVATECACHE : number

Currently ignored

Kind: inner constant of duckdb

ColumnInfo : object

Kind: global typedef
Properties

Name Type Description
name string Column name
type TypeInfo Column type

TypeInfo : object

Kind: global typedef
Properties

Name Type Description
id string Type ID
[alias] string SQL type alias
sql_type string SQL type name

DuckDbError : object

Kind: global typedef
Properties

Name Type Description
errno number -1 for DuckDB errors
message string Error message
code string 'DUCKDB_NODEJS_ERROR' for DuckDB errors
errorType string DuckDB error type code (e.g., HTTP, IO, Catalog)

HTTPError : object

Kind: global typedef
Extends: DuckDbError
Properties

Name Type Description
statusCode number HTTP response status code
reason string HTTP response reason
response string HTTP response body
headers object HTTP headers

layout: docu title: Node.js Client (Neo) redirect_from:

  • /docs/api/node_neo/overview
  • /docs/api/node_neo/overview/

An API for using [DuckDB]({% link index.html %}) in Node.js.

The primary package, @duckdb/node-api, is a high-level API meant for applications. It depends on low-level bindings that adhere closely to [DuckDB's C API]({% link docs/clients/c/overview.md %}), available separately as @duckdb/node-bindings.

Features

Main Differences from duckdb-node

  • Native support for Promises; no need for separate duckdb-async wrapper.
  • DuckDB-specific API; not based on the SQLite Node API.
  • Lossless & efficient support for values of all [DuckDB data types]({% link docs/sql/data_types/overview.md %}).
  • Wraps released DuckDB binaries instead of rebuilding DuckDB.
  • Built on [DuckDB's C API]({% link docs/clients/c/overview.md %}); exposes more functionality.

Roadmap

Some features are not yet complete:

  • Appending and binding advanced data types. (Additional DuckDB C API support needed.)
  • Writing to data chunk vectors. (Needs special handling in Node.)
  • User-defined types & functions. (Support for this was added to the DuckDB C API in v1.1.0.)
  • Profiling info. (Added in v1.1.0)
  • Table description. (Added in v1.1.0)
  • APIs for Arrow. (This part of the DuckDB C API is deprecated.)

Supported Platforms

  • Linux ARM64 (experimental)
  • Linux AMD64
  • macOS (Darwin) ARM64 (Apple Silicon)
  • macOS (Darwin) AMD64 (Intel)
  • Windows (Win32) AMD64

Examples

Get Basic Information

import duckdb from '@duckdb/node-api';

console.log(duckdb.version());

console.log(duckdb.configurationOptionDescriptions());

Create Instance

import { DuckDBInstance } from '@duckdb/node-api';

Create with an in-memory database:

const instance = await DuckDBInstance.create(':memory:');

Equivalent to the above:

const instance = await DuckDBInstance.create();

Read from and write to a database file, which is created if needed:

const instance = await DuckDBInstance.create('my_duckdb.db');

Set configuration options:

const instance = await DuckDBInstance.create('my_duckdb.db', {
  threads: '4'
});

Connect

const connection = await instance.connect();

Run SQL

const result = await connection.run('from test_all_types()');

Parameterize SQL

const prepared = await connection.prepare('select $1, $2');
prepared.bindVarchar(1, 'duck');
prepared.bindInteger(2, 42);
const result = await prepared.run();

Inspect Result

Get column names and types:

const columnNames = result.columnNames();
const columnTypes = result.columnTypes();

Fetch all chunks:

const chunks = await result.fetchAllChunks();

Fetch one chunk at a time:

const chunks = [];
while (true) {
  const chunk = await result.fetchChunk();
  // Last chunk will have zero rows.
  if (!chunk || chunk.rowCount === 0) {
    break;
  }
  chunks.push(chunk);
}

Read chunk data (column-major):

// array of columns, each as an array of values
const columns = chunk.getColumns(); 

Read chunk data (row-major):

// array of rows, each as an array of values
const rows = chunk.getRows(); 

Read chunk data (one value at a time):

const columns = [];
const columnCount = chunk.columnCount;
for (let columnIndex = 0; columnIndex < columnCount; columnIndex++) {
  const columnValues = [];
  const columnVector = chunk.getColumnVector(columnIndex);
  const itemCount = columnVector.itemCount;
  for (let itemIndex = 0; itemIndex < itemCount; itemIndex++) {
    const value = columnVector.getItem(itemIndex);
    columnValues.push(value);
  }
  columns.push(columnValues);
}

Result Reader

Run and read all data:

const reader = await connection.runAndReadAll('FROM test_all_types()');
const rows = reader.getRows();
// OR: const columns = reader.getColumns();

Run and read until (at least) a given number of rows have been fetched:

const reader = await connection.runAndReadUntil('FROM range(5000)', 1000);
const rows = reader.getRows();
// rows.length === 2048. (Rows are read in chunks of 2048.)

Read rows incrementally:

const reader = await connection.runAndRead('FROM range(5000)');
reader.readUntil(2000);
// reader.currentRowCount === 2048 (Rows are read in chunks of 2048.)
// reader.done === false
reader.readUntil(4000);
// reader.currentRowCount === 4096
// reader.done === false
reader.readUntil(6000);
// reader.currentRowCount === 5000
// reader.done === true

Inspect Data Types

import { DuckDBTypeId } from '@duckdb/node-api';

if (columnType.typeId === DuckDBTypeId.ARRAY) {
  const arrayValueType = columnType.valueType;
  const arrayLength = columnType.length;
}

if (columnType.typeId === DuckDBTypeId.DECIMAL) {
  const decimalWidth = columnType.width;
  const decimalScale = columnType.scale;
}

if (columnType.typeId === DuckDBTypeId.ENUM) {
  const enumValues = columnType.values;
}

if (columnType.typeId === DuckDBTypeId.LIST) {
  const listValueType = columnType.valueType;
}

if (columnType.typeId === DuckDBTypeId.MAP) {
  const mapKeyType = columnType.keyType;
  const mapValueType = columnType.valueType;
}

if (columnType.typeId === DuckDBTypeId.STRUCT) {
  const structEntryNames = columnType.names;
  const structEntryTypes = columnType.valueTypes;
}

if (columnType.typeId === DuckDBTypeId.UNION) {
  const unionMemberTags = columnType.memberTags;
  const unionMemberTypes = columnType.memberTypes;
}

// For the JSON type (https://duckdb.org/docs/data/json/json_type)
if (columnType.alias === 'JSON') {
  const json = JSON.parse(columnValue);
}

Every type implements toString. The result is both human-friendly and readable by DuckDB in an appropriate expression.

const typeString = columnType.toString();

Inspect Data Values

import { DuckDBTypeId } from '@duckdb/node-api';

if (columnType.typeId === DuckDBTypeId.ARRAY) {
  const arrayItems = columnValue.items; // array of values
  const arrayString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.BIT) {
  const bools = columnValue.toBools(); // array of booleans
  const bits = columnValue.toBits(); // array of 0s and 1s
  const bitString = columnValue.toString(); // string of '0's and '1's
}

if (columnType.typeId === DuckDBTypeId.BLOB) {
  const blobBytes = columnValue.bytes; // Uint8Array
  const blobString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.DATE) {
  const dateDays = columnValue.days;
  const dateString = columnValue.toString();
  const { year, month, day } = columnValue.toParts();
}

if (columnType.typeId === DuckDBTypeId.DECIMAL) {
  const decimalWidth = columnValue.width;
  const decimalScale = columnValue.scale;
  // Scaled-up value. Represented number is value/(10^scale).
  const decimalValue = columnValue.value; // bigint
  const decimalString = columnValue.toString();
  const decimalDouble = columnValue.toDouble();
}

if (columnType.typeId === DuckDBTypeId.INTERVAL) {
  const intervalMonths = columnValue.months;
  const intervalDays = columnValue.days;
  const intervalMicros = columnValue.micros; // bigint
  const intervalString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.LIST) {
  const listItems = columnValue.items; // array of values
  const listString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.MAP) {
  const mapEntries = columnValue.entries; // array of { key, value }
  const mapString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.STRUCT) {
  // { name1: value1, name2: value2, ... }
  const structEntries = columnValue.entries;
  const structString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.TIMESTAMP_MS) {
  const timestampMillis = columnValue.milliseconds; // bigint
  const timestampMillisString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.TIMESTAMP_NS) {
  const timestampNanos = columnValue.nanoseconds; // bigint
  const timestampNanosString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.TIMESTAMP_S) {
  const timestampSecs = columnValue.seconds; // bigint
  const timestampSecsString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.TIMESTAMP_TZ) {
  const timestampTZMicros = columnValue.micros; // bigint
  const timestampTZString = columnValue.toString();
  const {
    date: { year, month, day },
    time: { hour, min, sec, micros },
  } = columnValue.toParts();
}

if (columnType.typeId === DuckDBTypeId.TIMESTAMP) {
  const timestampMicros = columnValue.micros; // bigint
  const timestampString = columnValue.toString();
  const {
    date: { year, month, day },
    time: { hour, min, sec, micros },
  } = columnValue.toParts();
}

if (columnType.typeId === DuckDBTypeId.TIME_TZ) {
  const timeTZMicros = columnValue.micros; // bigint
  const timeTZOffset = columnValue.offset;
  const timeTZString = columnValue.toString();
  const {
    time: { hour, min, sec, micros },
    offset,
  } = columnValue.toParts();
}

if (columnType.typeId === DuckDBTypeId.TIME) {
  const timeMicros = columnValue.micros; // bigint
  const timeString = columnValue.toString();
  const { hour, min, sec, micros } = columnValue.toParts();
}

if (columnType.typeId === DuckDBTypeId.UNION) {
  const unionTag = columnValue.tag;
  const unionValue = columnValue.value;
  const unionValueString = columnValue.toString();
}

if (columnType.typeId === DuckDBTypeId.UUID) {
  const uuidHugeint = columnValue.hugeint; // bigint
  const uuidString = columnValue.toString();
}

// other possible values are: null, boolean, number, bigint, or string

Append To Table

await connection.run(
  `create or replace table target_table(i integer, v varchar)`
);

const appender = await connection.createAppender('main', 'target_table');

appender.appendInteger(42);
appender.appendVarchar('duck');
appender.endRow();

appender.appendInteger(123);
appender.appendVarchar('mallard');
appender.endRow();

appender.flush();

appender.appendInteger(17);
appender.appendVarchar('goose');
appender.endRow();

appender.close(); // also flushes

Extract Statements

const extractedStatements = await connection.extractStatements(`
  create or replace table numbers as from range(?);
  from numbers where range < ?;
  drop table numbers;
`);
const parameterValues = [10, 7];
const statementCount = extractedStatements.count;
for (let stmtIndex = 0; stmtIndex < statementCount; stmtIndex++) {
  const prepared = await extractedStatements.prepare(stmtIndex);
  let parameterCount = prepared.parameterCount;
  for (let paramIndex = 1; paramIndex <= parameterCount; paramIndex++) {
    prepared.bindInteger(paramIndex, parameterValues.shift());
  }
  const result = await prepared.run();
  // ...
}

Control Evaluation of Tasks

import { DuckDBPendingResultState } from '@duckdb/node-api';

async function sleep(ms) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}

const prepared = await connection.prepare('FROM range(10_000_000)');
const pending = prepared.start();
while (pending.runTask() !== DuckDBPendingResultState.RESULT_READY) {
  console.log('not ready');
  await sleep(1);
}
console.log('ready');
const result = await pending.getResult();
// ...

layout: docu title: Go Client github_repository: https://github.com/marcboeker/go-duckdb redirect_from:

  • /docs/api/go
  • /docs/api/go/

The DuckDB Go driver, go-duckdb, allows using DuckDB via the database/sql interface. For examples on how to use this interface, see the official documentation and tutorial.

The go-duckdb project, hosted at https://github.com/marcboeker/go-duckdb, is the official DuckDB Go client.

Installation

To install the go-duckdb client, run:

go get github.com/marcboeker/go-duckdb

Importing

To import the DuckDB Go package, add the following entries to your imports:

import (
	"database/sql"
	_ "github.com/marcboeker/go-duckdb"
)

Appender

The DuckDB Go client supports the [DuckDB Appender API]({% link docs/data/appender.md %}) for bulk inserts. You can obtain a new Appender by supplying a DuckDB connection to NewAppenderFromConn(). For example:

connector, err := duckdb.NewConnector("test.db", nil)
if err != nil {
  ...
}
conn, err := connector.Connect(context.Background())
if err != nil {
  ...
}
defer conn.Close()

// Retrieve appender from connection (note that you have to create the table 'test' beforehand).
appender, err := duckdb.NewAppenderFromConn(conn, "", "test")
if err != nil {
  ...
}
defer appender.Close()

err = appender.AppendRow(...)
if err != nil {
  ...
}

// Optional, if you want to access the appended rows immediately.
err = appender.Flush()
if err != nil {
  ...
}

Examples

Simple Example

An example for using the Go API is as follows:

package main

import (
	"database/sql"
	"errors"
	"fmt"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	_, err = db.Exec(`CREATE TABLE people (id INTEGER, name VARCHAR)`)
	if err != nil {
		log.Fatal(err)
	}
	_, err = db.Exec(`INSERT INTO people VALUES (42, 'John')`)
	if err != nil {
		log.Fatal(err)
	}

	var (
		id   int
		name string
	)
	row := db.QueryRow(`SELECT id, name FROM people`)
	err = row.Scan(&id, &name)
	if errors.Is(err, sql.ErrNoRows) {
		log.Println("no rows")
	} else if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("id: %d, name: %s\n", id, name)
}

More Examples

For more examples, see the examples in the go-duckdb repository.

layout: docu title: Client Overview redirect_from:

  • /docs/clients
  • /docs/clients/
  • /docs/api/overview
  • /docs/api/overview/

DuckDB is an in-process database system and offers client APIs (also known as “drivers”) for several languages.

Client API Maintainer Support tier Latest version
[C]({% link docs/clients/c/overview.md %}) The DuckDB team Primary [{{ site.currentduckdbversion }}]({% link docs/installation/index.html %}?version=stable&environment=cplusplus)
[Command Line Interface (CLI)]({% link docs/clients/cli/overview.md %}) The DuckDB team Primary [{{ site.currentduckdbversion }}]({% link docs/installation/index.html %}?version=stable&environment=cli)
[Java (JDBC)]({% link docs/clients/java.md %}) The DuckDB team Primary {{ site.currentjavaversion }}
[Go]({% link docs/clients/go.md %}) Marc Boeker and the DuckDB team Primary 1.1.3
[Node.js (node-neo)]({% link docs/clients/node_neo/overview.md %}) Jeff Raymakers and Antony Courtney (MotherDuck) Primary 1.2.0
[Python]({% link docs/clients/python/overview.md %}) The DuckDB team Primary {{ site.currentduckdbversion }}
[R]({% link docs/clients/r.md %}) Kirill Müller and the DuckDB team Primary 1.2.0
[WebAssembly (Wasm)]({% link docs/clients/wasm/overview.md %}) The DuckDB team Primary 1.2.0
[ADBC (Arrow)]({% link docs/clients/adbc.md %}) The DuckDB team Secondary [{{ site.currentduckdbversion }}]({% link docs/extensions/arrow.md %})
C# (.NET) Giorgi Secondary 1.2.0
[C++]({% link docs/clients/cpp.md %}) The DuckDB team Secondary [1.2.0]({% link docs/installation/index.html %}?version=stable&environment=cplusplus)
[Dart]({% link docs/clients/dart.md %}) TigerEye Secondary 1.1.3
[Julia]({% link docs/clients/julia.md %}) The DuckDB team Secondary 1.2.0
[Node.js (deprecated)]({% link docs/clients/nodejs/overview.md %}) The DuckDB team Secondary 1.2.0
[ODBC]({% link docs/clients/odbc/overview.md %}) The DuckDB team Secondary [1.1.0]({% link docs/installation/index.html %}?version=stable&environment=odbc)
[Rust]({% link docs/clients/rust.md %}) The DuckDB team Secondary 1.2.0
[Swift]({% link docs/clients/swift.md %}) The DuckDB team Secondary 1.2.0
Common Lisp ak-coram Tertiary
Crystal amauryt Tertiary
Elixir AlexR2D2 Tertiary
Erlang MM Zeeman Tertiary
Ruby suketa Tertiary
Zig karlseguin Tertiary

Support Tiers

Since there is such a wide variety of clients, the DuckDB team focuses their development effort on the most popular clients. To reflect this, we distinguish three tiers of support for clients. Primary clients are the first to receive new features and are covered by community support. Secondary clients receive new features but are not covered by community support. Finally, all tertiary clients are maintained by third parties, so there are no feature or support guarantees for them.

The DuckDB clients listed above are open source, and we welcome community contributions to these libraries. All primary and secondary clients are available under the MIT license. For tertiary clients, please consult the repository for the license.

We report the latest stable version for the clients in the primary and secondary support tiers.

Compatibility

All DuckDB clients support the same DuckDB SQL syntax and use the same on-disk [database format]({% link docs/internals/storage.md %}). [DuckDB extensions]({% link docs/extensions/overview.md %}) are also portable between clients with some exceptions (see [Wasm extensions]({% link docs/clients/wasm/extensions.md %}#list-of-officially-available-extensions)).

layout: docu title: Spark API redirect_from:

  • /docs/api/python/spark_api
  • /docs/api/python/spark_api/

The DuckDB Spark API implements the PySpark API, allowing you to use the familiar Spark API to interact with DuckDB. All statements are translated to DuckDB's internal plans using our [relational API]({% link docs/clients/python/relational_api.md %}) and executed using DuckDB's query engine.

Warning: The DuckDB Spark API is currently experimental and features are still missing. We are very interested in feedback. Please report any functionality that you are missing, either through Discord or on GitHub.

Example

from duckdb.experimental.spark.sql import SparkSession as session
from duckdb.experimental.spark.sql.functions import lit, col
import pandas as pd

spark = session.builder.getOrCreate()

pandas_df = pd.DataFrame({
    'age': [34, 45, 23, 56],
    'name': ['Joan', 'Peter', 'John', 'Bob']
})

df = spark.createDataFrame(pandas_df)
df = df.withColumn(
    'location', lit('Seattle')
)
res = df.select(
    col('age'),
    col('location')
).collect()

print(res)
[
    Row(age=34, location='Seattle'),
    Row(age=45, location='Seattle'),
    Row(age=23, location='Seattle'),
    Row(age=56, location='Seattle')
]

Contribution Guidelines

Contributions to the experimental Spark API are welcome. When making a contribution, please follow these guidelines:

  • Instead of using temporary files, use our pytest testing framework.
  • When adding new functions, ensure that method signatures comply with those in the PySpark API.

layout: docu title: Python API redirect_from:

  • /docs/api/python
  • /docs/api/python/
  • /docs/api/python/overview
  • /docs/api/python/overview/

Installation

The DuckDB Python API can be installed using pip: pip install duckdb. Please see the [installation page]({% link docs/installation/index.html %}?environment=python) for details. It is also possible to install DuckDB using conda: conda install python-duckdb -c conda-forge.

Python version: DuckDB requires Python 3.7 or newer.

Basic API Usage

The most straightforward way to run SQL queries using DuckDB is via the duckdb.sql command.

import duckdb

duckdb.sql("SELECT 42").show()

This will run queries using an in-memory database that is stored globally inside the Python module. The result of the query is returned as a Relation. A relation is a symbolic representation of the query. The query is not executed until the result is fetched or requested to be printed to the screen.

Relations can be referenced in subsequent queries by storing them inside variables, and using them as tables. This way queries can be constructed incrementally.

import duckdb

r1 = duckdb.sql("SELECT 42 AS i")
duckdb.sql("SELECT i * 2 AS k FROM r1").show()

Data Input

DuckDB can ingest data from a wide variety of formats – both on-disk and in-memory. See the [data ingestion page]({% link docs/clients/python/data_ingestion.md %}) for more information.

import duckdb

duckdb.read_csv("example.csv")                # read a CSV file into a Relation
duckdb.read_parquet("example.parquet")        # read a Parquet file into a Relation
duckdb.read_json("example.json")              # read a JSON file into a Relation

duckdb.sql("SELECT * FROM 'example.csv'")     # directly query a CSV file
duckdb.sql("SELECT * FROM 'example.parquet'") # directly query a Parquet file
duckdb.sql("SELECT * FROM 'example.json'")    # directly query a JSON file

DataFrames

DuckDB can directly query Pandas DataFrames, Polars DataFrames and Arrow tables. Note that these are read-only, i.e., editing these tables via [INSERT]({% link docs/sql/statements/insert.md %}) or [UPDATE statements]({% link docs/sql/statements/update.md %}) is not possible.

Pandas

To directly query a Pandas DataFrame, run:

import duckdb
import pandas as pd

pandas_df = pd.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM pandas_df")
┌───────┐
│   a   │
│ int64 │
├───────┤
│    42 │
└───────┘

Polars

To directly query a Polars DataFrame, run:

import duckdb
import polars as pl

polars_df = pl.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM polars_df")
┌───────┐
│   a   │
│ int64 │
├───────┤
│    42 │
└───────┘

PyArrow

To directly query a PyArrow table, run:

import duckdb
import pyarrow as pa

arrow_table = pa.Table.from_pydict({"a": [42]})
duckdb.sql("SELECT * FROM arrow_table")
┌───────┐
│   a   │
│ int64 │
├───────┤
│    42 │
└───────┘

Result Conversion

DuckDB supports converting query results efficiently to a variety of formats. See the [result conversion page]({% link docs/clients/python/conversion.md %}) for more information.

import duckdb

duckdb.sql("SELECT 42").fetchall()   # Python objects
duckdb.sql("SELECT 42").df()         # Pandas DataFrame
duckdb.sql("SELECT 42").pl()         # Polars DataFrame
duckdb.sql("SELECT 42").arrow()      # Arrow Table
duckdb.sql("SELECT 42").fetchnumpy() # NumPy Arrays

Writing Data to Disk

DuckDB supports writing Relation objects directly to disk in a variety of formats. As an alternative, the [COPY statement]({% link docs/sql/statements/copy.md %}) can be used to write data to disk using SQL.

import duckdb

duckdb.sql("SELECT 42").write_parquet("out.parquet") # Write to a Parquet file
duckdb.sql("SELECT 42").write_csv("out.csv")         # Write to a CSV file
duckdb.sql("COPY (SELECT 42) TO 'out.parquet'")      # Copy to a Parquet file

Connection Options

Applications can open a new DuckDB connection via the duckdb.connect() method.

Using an In-Memory Database

When using DuckDB through duckdb.sql(), it operates on an in-memory database, i.e., no tables are persisted on disk. Invoking the duckdb.connect() method without arguments returns a connection, which also uses an in-memory database:

import duckdb

con = duckdb.connect()
con.sql("SELECT 42 AS x").show()

Persistent Storage

The duckdb.connect(dbname) method creates a connection to a persistent database. Any data written to that connection will be persisted and can be reloaded by reconnecting to the same file, both from Python and from other DuckDB clients.

import duckdb

# create a connection to a file called 'file.db'
con = duckdb.connect("file.db")
# create a table and load data into it
con.sql("CREATE TABLE test (i INTEGER)")
con.sql("INSERT INTO test VALUES (42)")
# query the table
con.table("test").show()
# explicitly close the connection
con.close()
# Note: connections are also closed implicitly when they go out of scope

You can also use a context manager to ensure that the connection is closed:

import duckdb

with duckdb.connect("file.db") as con:
    con.sql("CREATE TABLE test (i INTEGER)")
    con.sql("INSERT INTO test VALUES (42)")
    con.table("test").show()
    # the context manager closes the connection automatically

Configuration

The duckdb.connect() method accepts a config dictionary, where [configuration options]({% link docs/configuration/overview.md %}#configuration-reference) can be specified. For example:

import duckdb

con = duckdb.connect(config = {'threads': 1})

Connection Object and Module

The connection object and the duckdb module can be used interchangeably – they support the same methods. The only difference is that when using the duckdb module a global in-memory database is used.

If you are developing a package designed for others to use, and you use DuckDB in the package, it is recommended that you create connection objects instead of using the methods of the duckdb module. The duckdb module uses a shared global database, which can cause hard-to-debug issues if it is used from within multiple different packages.
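
For example, the following sketch (with a hypothetical table name my_tbl) contrasts the two approaches: the module-level call writes to the shared global database, while the connection object keeps its tables local to that connection.

import duckdb

# Module-level methods operate on the shared global in-memory database
duckdb.sql("CREATE TABLE my_tbl AS SELECT 42 AS i")

# A dedicated connection object keeps its catalog separate from the global one
con = duckdb.connect()
con.sql("CREATE TABLE my_tbl AS SELECT 42 AS i")
con.sql("SELECT i FROM my_tbl").show()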

Using Connections in Parallel Python Programs

The DuckDBPyConnection object is not thread-safe. If you would like to write to the same database from multiple threads, create a cursor for each thread with the [DuckDBPyConnection.cursor() method]({% link docs/clients/python/reference/index.md %}#duckdb.DuckDBPyConnection.cursor).
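
A minimal sketch of this pattern (the table name my_table and the use of the standard threading module are illustrative assumptions):

import duckdb
import threading

con = duckdb.connect()
con.sql("CREATE TABLE my_table AS SELECT * FROM range(1000) t(i)")

def run_query(connection):
    # Each thread works on its own cursor derived from the shared connection
    cursor = connection.cursor()
    print(cursor.sql("SELECT sum(i) FROM my_table").fetchall())

threads = [threading.Thread(target=run_query, args=(con,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()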

Loading and Installing Extensions

DuckDB's Python API provides functions for installing and loading [extensions]({% link docs/extensions/overview.md %}), which perform the equivalent operations to running the INSTALL and LOAD SQL commands, respectively. An example that installs and loads the [spatial extension]({% link docs/extensions/spatial/overview.md %}) looks like follows:

import duckdb

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

Community Extensions

To load [community extensions]({% link community_extensions/index.md %}), pass the repository="community" argument to the install_extension method.

For example, install and load the h3 community extension as follows:

import duckdb

con = duckdb.connect()
con.install_extension("h3", repository="community")
con.load_extension("h3")

Unsigned Extensions

To load [unsigned extensions]({% link docs/extensions/overview.md %}#unsigned-extensions), use the config = {"allow_unsigned_extensions": "true"} argument to the duckdb.connect() method.
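
For example:

import duckdb

con = duckdb.connect(config = {"allow_unsigned_extensions": "true"})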

layout: docu title: Expression API redirect_from:

  • /docs/api/python/expression
  • /docs/api/python/expression/

The Expression class represents an instance of an [expression]({% link docs/sql/expressions/overview.md %}).

Why Would I Use the Expression API?

Using this API makes it possible to dynamically build up expressions, which are normally created by the parser from the query string. This allows you to skip the parsing step and gives you more fine-grained control over the expressions that are used.

Below is a list of currently supported expressions that can be created through the API.

Column Expression

This expression references a column by name.

import duckdb
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [True, None, False, True],
    'c': [42, 21, 13, 14]
})

Selecting a single column:

col = duckdb.ColumnExpression('a')
res = duckdb.df(df).select(col).fetchall()
print(res)
[(1,), (2,), (3,), (4,)]

Selecting multiple columns:

col_list = [
        duckdb.ColumnExpression('a') * 10,
        duckdb.ColumnExpression('b').isnull(),
        duckdb.ColumnExpression('c') + 5
    ]
res = duckdb.df(df).select(*col_list).fetchall()
print(res)
[(10, False, 47), (20, True, 26), (30, False, 18), (40, False, 19)]

Star Expression

This expression selects all columns of the input source.

Optionally it's possible to provide an exclude list to filter out columns of the table. This exclude list can contain either strings or Expressions.

import duckdb
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [True, None, False, True],
    'c': [42, 21, 13, 14]
})

star = duckdb.StarExpression(exclude = ['b'])
res = duckdb.df(df).select(star).fetchall()
print(res)
[(1, 42), (2, 21), (3, 13), (4, 14)]

Constant Expression

This expression contains a single value.

import duckdb
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [True, None, False, True],
    'c': [42, 21, 13, 14]
})

const = duckdb.ConstantExpression('hello')
res = duckdb.df(df).select(const).fetchall()
print(res)
[('hello',), ('hello',), ('hello',), ('hello',)]

Case Expression

This expression contains a CASE WHEN (...) THEN (...) ELSE (...) END expression. By default, ELSE is NULL; it can be set using .otherwise(value = ...). Additional WHEN (...) THEN (...) blocks can be added with .when(condition = ..., value = ...).

import duckdb
import pandas as pd
from duckdb import (
    ConstantExpression,
    ColumnExpression,
    CaseExpression
)

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [True, None, False, True],
    'c': [42, 21, 13, 14]
})

hello = ConstantExpression('hello')
world = ConstantExpression('world')

case = \
    CaseExpression(condition = ColumnExpression('b') == False, value = world) \
    .otherwise(hello)
res = duckdb.df(df).select(case).fetchall()
print(res)
[('hello',), ('hello',), ('world',), ('hello',)]

Function Expression

This expression contains a function call. It can be constructed by providing the function name and an arbitrary number of Expressions as arguments.

import duckdb
import pandas as pd
from duckdb import (
    ConstantExpression,
    ColumnExpression,
    FunctionExpression
)

df = pd.DataFrame({
    'a': [
        'test',
        'pest',
        'text',
        'rest',
    ]
})

ends_with = FunctionExpression('ends_with', ColumnExpression('a'), ConstantExpression('est'))
res = duckdb.df(df).select(ends_with).fetchall()
print(res)
[(True,), (True,), (False,), (True,)]

Common Operations

The Expression class also contains many operations that can be applied to any Expression type.

Operation Description
.alias(name: str) Applies an alias to the expression
.cast(type: DuckDBPyType) Applies a cast to the provided type on the expression
.isin(*exprs: Expression) Creates an [IN expression]({% link docs/sql/expressions/in.md %}#in) against the provided expressions as the list
.isnotin(*exprs: Expression) Creates a [NOT IN expression]({% link docs/sql/expressions/in.md %}#not-in) against the provided expressions as the list
.isnotnull() Checks whether the expression is not NULL
.isnull() Checks whether the expression is NULL
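
As a brief illustration of a few of these operations, the following sketch (reusing a small Pandas DataFrame as in the examples above) casts a column, aliases the result, and builds an IN expression:

import duckdb
import pandas as pd
from duckdb import ColumnExpression, ConstantExpression
from duckdb.typing import VARCHAR

df = pd.DataFrame({
    'a': [1, 2, 3, 4]
})

exprs = [
    # cast column 'a' to VARCHAR and alias the result
    ColumnExpression('a').cast(VARCHAR).alias('a_str'),
    # check whether 'a' is contained in the given list of expressions
    ColumnExpression('a').isin(ConstantExpression(2), ConstantExpression(3))
]
res = duckdb.df(df).select(*exprs).fetchall()
print(res)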

Order Operations

When expressions are provided to DuckDBPyRelation.order(), the following order operations can be applied.

Operation Description
.asc() Indicates that this expression should be sorted in ascending order
.desc() Indicates that this expression should be sorted in descending order
.nulls_first() Indicates that the nulls in this expression should precede the non-null values
.nulls_last() Indicates that the nulls in this expression should come after the non-null values
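
For example, the following sketch (assuming the relational API used in the examples above) sorts a small DataFrame in descending order with NULLs last:

import duckdb
import pandas as pd
from duckdb import ColumnExpression

df = pd.DataFrame({
    'a': [3, 1, None, 2]
})

# sort by 'a' in descending order, placing NULL values after the non-NULL values
order_expr = ColumnExpression('a').desc().nulls_last()
res = duckdb.df(df).order(order_expr).fetchall()
print(res)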

layout: docu title: Python Function API redirect_from:

  • /docs/api/python/function
  • /docs/api/python/function/

You can create a DuckDB user-defined function (UDF) from a Python function so it can be used in SQL queries. Similarly to regular [functions]({% link docs/sql/functions/overview.md %}), they need to have a name, a return type and parameter types.

Here is an example using a Python function that calls a third-party library.

import duckdb
from duckdb.typing import *
from faker import Faker

def generate_random_name():
    fake = Faker()
    return fake.name()

duckdb.create_function("random_name", generate_random_name, [], VARCHAR)
res = duckdb.sql("SELECT random_name()").fetchall()
print(res)
[('Gerald Ashley',)]

Creating Functions

To register a Python UDF, use the create_function method from a DuckDB connection. Here is the syntax:

import duckdb
con = duckdb.connect()
con.create_function(name, function, parameters, return_type)

The create_function method takes the following parameters:

  1. name A string representing the unique name of the UDF within the connection catalog.
  2. function The Python function you wish to register as a UDF.
  3. parameters Scalar functions can operate on one or more columns. This parameter takes a list of column types used as input.
  4. return_type Scalar functions return one element per row. This parameter specifies the return type of the function.
  5. type (optional): DuckDB supports both built-in Python types and PyArrow Tables. By default, built-in types are assumed, but you can specify type = 'arrow' to use PyArrow Tables.
  6. null_handling (optional): By default, NULL values are automatically handled as NULL-in NULL-out. Users can specify a desired behavior for NULL values by setting null_handling = 'special'.
  7. exception_handling (optional): By default, when an exception is thrown from the Python function, it will be re-thrown in Python. Users can disable this behavior, and instead return NULL, by setting this parameter to 'return_null'
  8. side_effects (optional): By default, functions are expected to produce the same result for the same input. If the result of a function is impacted by any type of randomness, side_effects must be set to True.

To unregister a UDF, you can call the remove_function method with the UDF name:

con.remove_function(name)

Type Annotation

When the function has type annotations, it's often possible to leave out all of the optional parameters. Using DuckDBPyType, we can implicitly convert many known types to DuckDB's type system. For example:

import duckdb

def my_function(x: int) -> str:
    return x

duckdb.create_function("my_func", my_function)
print(duckdb.sql("SELECT my_func(42)"))
┌─────────────┐
│ my_func(42) │
│   varchar   │
├─────────────┤
│ 42          │
└─────────────┘

If only the parameter list types can be inferred, you'll need to pass in None as parameters.

NULL Handling

By default, when a function receives a NULL value, it instantly returns NULL as part of the default NULL handling. When this is not desired, you need to explicitly set the null_handling parameter to "special".

import duckdb
from duckdb.typing import *

def dont_intercept_null(x):
    return 5

duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT)
res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
[(None,)]

With null_handling="special":

import duckdb
from duckdb.typing import *

def dont_intercept_null(x):
    return 5

duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT, null_handling="special")
res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
[(5,)]

Exception Handling

By default, when an exception is thrown from the Python function, it is forwarded (re-thrown). If you want to disable this behavior and instead return NULL, set the exception_handling parameter to "return_null".

import duckdb
from duckdb.typing import *

def will_throw():
    raise ValueError("ERROR")

duckdb.create_function("throws", will_throw, [], BIGINT)
try:
    res = duckdb.sql("SELECT throws()").fetchall()
except duckdb.InvalidInputException as e:
    print(e)

duckdb.create_function("doesnt_throw", will_throw, [], BIGINT, exception_handling="return_null")
res = duckdb.sql("SELECT doesnt_throw()").fetchall()
print(res)
Invalid Input Error: Python exception occurred while executing the UDF: ValueError: ERROR

At:
  ...(5): will_throw
  ...(9): <module>
[(None,)]

Side Effects

By default, DuckDB assumes that the created function is a pure function, meaning it produces the same output when given the same input. If your function does not follow that rule, for example because it makes use of randomness, then you need to mark the function as having side_effects.

For example, this function will produce a new count for every invocation.

def count() -> int:
    old = count.counter
    count.counter += 1
    return old

count.counter = 0

If we create this function without marking it as having side effects, the result will be the following:

con = duckdb.connect()
con.create_function("my_counter", count, side_effects=False)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
[(0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,)]

This is obviously not the desired result. When we add side_effects=True, the result is as we would expect:

con.remove_function("my_counter")
count.counter = 0
con.create_function("my_counter", count, side_effects=True)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]

Python Function Types

Currently, two function types are supported, native (default) and arrow.

Arrow

If the function is expected to receive arrow arrays, set the type parameter to 'arrow'.

This lets the system know to provide Arrow arrays of up to STANDARD_VECTOR_SIZE tuples to the function, and to expect an array with the same number of tuples to be returned from the function.
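
A minimal sketch of such a vectorized function, assuming PyArrow is installed and using pyarrow.compute for the actual work:

import duckdb
import pyarrow.compute as pc
from duckdb.typing import VARCHAR

# Receives a PyArrow array of up to STANDARD_VECTOR_SIZE values
# and returns an array with the same number of values
def arrow_upper(names):
    return pc.utf8_upper(names)

duckdb.create_function("arrow_upper", arrow_upper, [VARCHAR], VARCHAR, type="arrow")
res = duckdb.sql("SELECT arrow_upper('quack') AS s").fetchall()
print(res)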

Native

When the function type is set to native, the function is provided with a single tuple at a time and is expected to return only a single value. This can be useful to interact with Python libraries that don't operate on Arrow, such as faker:

import duckdb

from duckdb.typing import *
from faker import Faker

def random_date():
    fake = Faker()
    return fake.date_between()

duckdb.create_function("random_date", random_date, [], DATE, type="native")
res = duckdb.sql("SELECT random_date()").fetchall()
print(res)
[(datetime.date(2019, 5, 15),)]

layout: docu title: Python DB API redirect_from:

  • /docs/api/python/dbapi
  • /docs/api/python/dbapi/

The standard DuckDB Python API provides a SQL interface compliant with the DB-API 2.0 specification described by PEP 249, similar to the SQLite Python API.

Connection

To use the module, you must first create a DuckDBPyConnection object that represents a connection to a database. This is done through the [duckdb.connect]({% link docs/clients/python/reference/index.md %}#duckdb.connect) method.

The 'config' keyword argument can be used to provide a dict that contains key->value pairs referencing [settings]({% link docs/configuration/overview.md %}#configuration-reference) understood by DuckDB.

In-Memory Connection

The special value :memory: can be used to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the Python process).

Named In-Memory Connections

The special value :memory: can also be postfixed with a name, for example: :memory:conn3. When a name is provided, subsequent duckdb.connect calls will create a new connection to the same database, sharing the catalogs (views, tables, macros, etc.).

Using :memory: without a name will always create a new and separate database instance.
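A minimal sketch, reusing the :memory:conn3 name from above:

import duckdb

con1 = duckdb.connect(":memory:conn3")
con1.execute("CREATE TABLE shared_tbl AS SELECT 42 AS x")

# a second connection to the same named in-memory database sees the same catalog
con2 = duckdb.connect(":memory:conn3")
print(con2.sql("SELECT * FROM shared_tbl").fetchall())
# [(42,)]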

Default Connection

By default, we create an (unnamed) in-memory database that lives inside the duckdb module. Every method of DuckDBPyConnection is also available on the duckdb module; this default connection is what's used by these methods.

The special value :default: can be used to get this default connection.

File-Based Connection

If the database is a file path, a connection to a persistent database is established. If the file does not exist, it will be created (the extension of the file is irrelevant and can be .db, .duckdb, or anything else).

read_only Connections

If you would like to connect in read-only mode, you can set the read_only flag to True. If the file does not exist, it is not created when connecting in read-only mode. Read-only mode is required if multiple Python processes want to access the same database file at the same time.

import duckdb

duckdb.execute("CREATE TABLE tbl AS SELECT 42 a")
con = duckdb.connect(":default:")
con.sql("SELECT * FROM tbl")
# or
duckdb.default_connection.sql("SELECT * FROM tbl")
┌───────┐
│   a   │
│ int32 │
├───────┤
│    42 │
└───────┘
import duckdb

# to start an in-memory database
con = duckdb.connect(database = ":memory:")
# to use a database file (not shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = False)
# to use a database file (shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = True)
# to explicitly get the default connection
con = duckdb.connect(database = ":default:")

If you want to create a second connection to an existing database, you can use the cursor() method. This might be useful for example to allow parallel threads running queries independently. A single connection is thread-safe but is locked for the duration of the queries, effectively serializing database access in this case.
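A minimal sketch of creating an additional connection via cursor():

import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE tbl AS SELECT 42 AS x")

# cur is a second connection to the same database, e.g., for use in another thread
cur = con.cursor()
print(cur.sql("SELECT * FROM tbl").fetchall())
# [(42,)]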

Connections are closed implicitly when they go out of scope or if they are explicitly closed using close(). Once the last connection to a database instance is closed, the database instance is closed as well.
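For example, a connection can be closed explicitly as follows (sketch):

import duckdb

con = duckdb.connect("my-db.duckdb")
try:
    con.execute("CREATE TABLE IF NOT EXISTS tbl (x INTEGER)")
finally:
    # explicitly close the connection; once the last connection to the
    # database instance is closed, the instance itself is closed as well
    con.close()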

Querying

SQL queries can be sent to DuckDB using the execute() method of connections. Once a query has been executed, results can be retrieved using the fetchone and fetchall methods on the connection. fetchall will retrieve all results and complete the transaction. fetchone will retrieve a single row of results each time that it is invoked until no more results are available. The transaction will only close once fetchone is called and there are no more results remaining (the return value will be None). As an example, in the case of a query only returning a single row, fetchone should be called once to retrieve the results and a second time to close the transaction. Below are some short examples:

# create a table
con.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
con.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")

# retrieve the items again
con.execute("SELECT * FROM items")
print(con.fetchall())
# [('jeans', Decimal('20.00'), 1), ('hammer', Decimal('42.20'), 2)]

# retrieve the items one at a time
con.execute("SELECT * FROM items")
print(con.fetchone())
# ('jeans', Decimal('20.00'), 1)
print(con.fetchone())
# ('hammer', Decimal('42.20'), 2)
print(con.fetchone()) # This closes the transaction. Any subsequent calls to .fetchone will return None
# None

The description property of the connection object contains the column names as per the standard.
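For example, continuing with the items table from above (a minimal sketch):

con.execute("SELECT * FROM items")
print([col[0] for col in con.description])
# ['item', 'value', 'count']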

Prepared Statements

DuckDB also supports [prepared statements]({% link docs/sql/query_syntax/prepared_statements.md %}) in the API with the execute and executemany methods. The values may be passed as an additional parameter after a query that contains ? or $1 (dollar symbol and a number) placeholders. Using the ? notation adds the values in the same sequence as passed within the Python parameter. Using the $ notation allows for values to be reused within the SQL statement based on the number and index of the value found within the Python parameter. Values are converted according to the [conversion rules]({% link docs/clients/python/conversion.md %}#object-conversion-python-object-to-duckdb).

Here are some examples. First, insert a row using a [prepared statement]({% link docs/sql/query_syntax/prepared_statements.md %}):

con.execute("INSERT INTO items VALUES (?, ?, ?)", ["laptop", 2000, 1])

Second, insert several rows using a [prepared statement]({% link docs/sql/query_syntax/prepared_statements.md %}):

con.executemany("INSERT INTO items VALUES (?, ?, ?)", [["chainsaw", 500, 10], ["iphone", 300, 2]] )

Query the database using a [prepared statement]({% link docs/sql/query_syntax/prepared_statements.md %}):

con.execute("SELECT item FROM items WHERE value > ?", [400])
print(con.fetchall())
[('laptop',), ('chainsaw',)]

Query using the $ notation for a [prepared statement]({% link docs/sql/query_syntax/prepared_statements.md %}) and reused values:

con.execute("SELECT $1, $1, $2", ["duck", "goose"])
print(con.fetchall())
[('duck', 'duck', 'goose')]

Warning Do not use executemany to insert large amounts of data into DuckDB. See the [data ingestion page]({% link docs/clients/python/data_ingestion.md %}) for better options.

Named Parameters

Besides the standard unnamed parameters, like $1, $2 etc., it's also possible to supply named parameters, like $my_parameter. When using named parameters, you have to provide a dictionary mapping of str to value in the parameters argument. An example use is the following:

import duckdb

res = duckdb.execute("""
    SELECT
        $my_param,
        $other_param,
        $also_param
    """,
    {
        "my_param": 5,
        "other_param": "DuckDB",
        "also_param": [42]
    }
).fetchall()
print(res)
[(5, 'DuckDB', [42])]

layout: docu title: Types API redirect_from:

  • /docs/api/python/types
  • /docs/api/python/types/

The DuckDBPyType class represents a type instance of our [data types]({% link docs/sql/data_types/overview.md %}).

Converting from Other Types

To make the API as easy to use as possible, we have added implicit conversions from existing type objects to a DuckDBPyType instance. This means that wherever a DuckDBPyType object is expected, it is also possible to provide any of the options listed below.

Python Built-ins

The table below shows the mapping of Python Built-in types to DuckDB type.

Built-in types DuckDB type
bool BOOLEAN
bytearray BLOB
bytes BLOB
float DOUBLE
int BIGINT
str VARCHAR

Numpy DTypes

The table below shows the mapping of Numpy DType to DuckDB type.

Type DuckDB type
bool BOOLEAN
float32 FLOAT
float64 DOUBLE
int16 SMALLINT
int32 INTEGER
int64 BIGINT
int8 TINYINT
uint16 USMALLINT
uint32 UINTEGER
uint64 UBIGINT
uint8 UTINYINT

Nested Types

list[child_type]

list type objects map to a LIST type of the child type, which can also be arbitrarily nested.

import duckdb
from typing import Union

duckdb.typing.DuckDBPyType(list[dict[Union[str, int], str]])
MAP(UNION(u1 VARCHAR, u2 BIGINT), VARCHAR)[]

dict[key_type, value_type]

dict type objects map to a MAP type of the key type and the value type.

import duckdb

print(duckdb.typing.DuckDBPyType(dict[str, int]))
MAP(VARCHAR, BIGINT)

{'a': field_one, 'b': field_two, .., 'n': field_n}

dict objects map to a STRUCT composed of the keys and values of the dict.

import duckdb

print(duckdb.typing.DuckDBPyType({'a': str, 'b': int}))
STRUCT(a VARCHAR, b BIGINT)

Union[⟨type_1⟩, ... ⟨type_n⟩]

typing.Union objects map to a UNION type of the provided types.

import duckdb
from typing import Union

print(duckdb.typing.DuckDBPyType(Union[int, str, bool, bytearray]))
UNION(u1 BIGINT, u2 VARCHAR, u3 BOOLEAN, u4 BLOB)

Creation Functions

For the built-in types, you can use the constants defined in duckdb.typing:

DuckDB type
BIGINT
BIT
BLOB
BOOLEAN
DATE
DOUBLE
FLOAT
HUGEINT
INTEGER
INTERVAL
SMALLINT
SQLNULL
TIME_TZ
TIME
TIMESTAMP_MS
TIMESTAMP_NS
TIMESTAMP_S
TIMESTAMP_TZ
TIMESTAMP
TINYINT
UBIGINT
UHUGEINT
UINTEGER
USMALLINT
UTINYINT
UUID
VARCHAR

For the complex types there are methods available on the DuckDBPyConnection object or the duckdb module. Anywhere a DuckDBPyType is accepted, we will also accept one of the type objects that can implicitly convert to a DuckDBPyType.

list_type | array_type

Parameters:

  • child_type: DuckDBPyType

struct_type | row_type

Parameters:

  • fields: Union[list[DuckDBPyType], dict[str, DuckDBPyType]]

map_type

Parameters:

  • key_type: DuckDBPyType
  • value_type: DuckDBPyType

decimal_type

Parameters:

  • width: int
  • scale: int

union_type

Parameters:

  • members: Union[list[DuckDBPyType], dict[str, DuckDBPyType]]

string_type

Parameters:

  • collation: Optional[str]
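As a sketch, these creation functions can be combined to build nested types. They are shown here on the duckdb module; the same methods exist on a DuckDBPyConnection:

import duckdb
from duckdb.typing import BIGINT, VARCHAR

# STRUCT(name VARCHAR, scores BIGINT[])
print(duckdb.struct_type({"name": VARCHAR, "scores": duckdb.list_type(BIGINT)}))

# MAP(VARCHAR, DECIMAL(18,3))
print(duckdb.map_type(duckdb.string_type(), duckdb.decimal_type(18, 3)))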

layout: docu title: Known Python Issues redirect_from:

  • /docs/api/python/known_issues
  • /docs/api/python/known_issues/

Unfortunately, there are some issues that are either beyond our control or very elusive / hard to track down. Below is a list of issues that you might need to be aware of, depending on your workflow.

Numpy Import Multithreading

When making use of multithreading and fetching results either directly as NumPy arrays or indirectly through a Pandas DataFrame, it might be necessary to ensure that numpy.core.multiarray is imported. If this module has not been imported from the main thread, and a different thread attempts to import it during execution, this causes either a deadlock or a crash.

To avoid this, it's recommended to import numpy.core.multiarray before starting up threads.
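A minimal sketch of the workaround, assuming pandas is installed:

import numpy.core.multiarray  # ensure this module is loaded from the main thread

import threading
import duckdb

def fetch_df():
    # each thread uses its own connection
    con = duckdb.connect()
    print(con.sql("SELECT 42 AS x").df())

t = threading.Thread(target=fetch_df)
t.start()
t.join()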

DESCRIBE and SUMMARIZE Return Empty Tables in Jupyter

The DESCRIBE and SUMMARIZE statements return an empty table:

%sql
CREATE OR REPLACE TABLE tbl AS (SELECT 42 AS x);
DESCRIBE tbl;

To work around this, wrap them into a subquery:

%sql
CREATE OR REPLACE TABLE tbl AS (SELECT 42 AS x);
FROM (DESCRIBE tbl);

Protobuf Error for JupySQL in IPython

Loading the JupySQL extension in IPython fails:

In [1]: %load_ext sql
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (unknown location)

The solution is to fix the protobuf package. This may require uninstalling conflicting packages, e.g.:

%pip uninstall tensorflow
%pip install protobuf

Running EXPLAIN Renders Newlines

In Python, the output of the [EXPLAIN statement]({% link docs/guides/meta/explain.md %}) contains hard line breaks (\n):

In [1]: import duckdb
   ...: duckdb.sql("EXPLAIN SELECT 42 AS x")
Out[1]:
┌───────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│  explain_key  │                                                   explain_value                                                   │
│    varchar    │                                                      varchar                                                      │
├───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ physical_plan │ ┌───────────────────────────┐\n│         PROJECTION        │\n│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │\n│             x   …  │
└───────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

To work around this, print the output of the explain() function:

In [2]: print(duckdb.sql("SELECT 42 AS x").explain())
Out[2]:
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             x             │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         DUMMY_SCAN        │
└───────────────────────────┘

Please also check out the [Jupyter guide]({% link docs/guides/python/jupyter.md %}) for tips on using Jupyter with JupySQL.

Error When Importing the DuckDB Python Package on Windows

When importing DuckDB on Windows, the Python runtime may return the following error:

import duckdb
ImportError: DLL load failed while importing duckdb: The specified module could not be found.

The solution is to install the Microsoft Visual C++ Redistributable package.

layout: docu title: Relational API redirect_from:

  • /docs/api/python/relational_api
  • /docs/api/python/relational_api/

The Relational API is an alternative API that can be used to incrementally construct queries. The API is centered around DuckDBPyRelation nodes. The relations can be seen as symbolic representations of SQL queries. They do not hold any data – and nothing is executed – until a method that triggers execution is called.

Constructing Relations

Relations can be created from SQL queries using the duckdb.sql method. Alternatively, they can be created from the various data ingestion methods (read_parquet, read_csv, read_json).

For example, here we create a relation from a SQL query:

import duckdb

rel = duckdb.sql("SELECT * FROM range(10_000_000_000) tbl(id)")
rel.show()
┌────────────────────────┐
│           id           │
│         int64          │
├────────────────────────┤
│                      0 │
│                      1 │
│                      2 │
│                      3 │
│                      4 │
│                      5 │
│                      6 │
│                      7 │
│                      8 │
│                      9 │
│                      · │
│                      · │
│                      · │
│                   9990 │
│                   9991 │
│                   9992 │
│                   9993 │
│                   9994 │
│                   9995 │
│                   9996 │
│                   9997 │
│                   9998 │
│                   9999 │
├────────────────────────┤
│         ? rows         │
│ (>9999 rows, 20 shown) │
└────────────────────────┘

Note how we are constructing a relation that computes an immense amount of data (10B rows or 74 GB of data). The relation is constructed instantly – and we can even print the relation instantly.

When printing a relation using show or displaying it in the terminal, the first 10K rows are fetched. If there are more than 10K rows, the output window will show >9999 rows (as the number of rows in the relation is unknown).

Data Ingestion

Outside of SQL queries, the following methods are provided to construct relation objects from external data.

  • from_arrow
  • from_df
  • read_csv
  • read_json
  • read_parquet

SQL Queries

Relation objects can be queried through SQL using [replacement scans]({% link docs/clients/c/replacement_scans.md %}). If you have a relation object stored in a variable, you can refer to that variable as if it were a SQL table (in the FROM clause). This allows you to incrementally build queries using relation objects.

import duckdb

rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
duckdb.sql("SELECT sum(id) FROM rel").show()
┌──────────────┐
│   sum(id)    │
│    int128    │
├──────────────┤
│ 499999500000 │
└──────────────┘

Operations

There are a number of operations that can be performed on relations. These are all shorthand for running the corresponding SQL queries – and they return relations themselves, so they can be chained.

aggregate(expr, groups = {})

Apply an (optionally grouped) aggregate over the relation. The system will automatically group by any columns that are not aggregates.

import duckdb

rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.aggregate("id % 2 AS g, sum(id), min(id), max(id)")
┌───────┬──────────────┬─────────┬─────────┐
│   g   │   sum(id)    │ min(id) │ max(id) │
│ int64 │    int128    │  int64  │  int64  │
├───────┼──────────────┼─────────┼─────────┤
│     0 │ 249999500000 │       0 │  999998 │
│     1 │ 250000000000 │       1 │  999999 │
└───────┴──────────────┴─────────┴─────────┘

except_(rel)

Select all rows in the first relation that do not occur in the second relation. The relations must have the same number of columns.

import duckdb

r1 = duckdb.sql("SELECT * FROM range(10) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r1.except_(r2).show()
┌───────┐
│  id   │
│ int64 │
├───────┤
│     5 │
│     6 │
│     7 │
│     8 │
│     9 │
└───────┘

filter(condition)

Apply the given condition to the relation, filtering any rows that do not satisfy the condition.

import duckdb

rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.filter("id > 5").limit(3).show()
┌───────┐
│  id   │
│ int64 │
├───────┤
│     6 │
│     7 │
│     8 │
└───────┘

intersect(rel)

Select the intersection of two relations – returning all rows that occur in both relations. The relations must have the same number of columns.

import duckdb

r1 = duckdb.sql("SELECT * FROM range(10) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r1.intersect(r2).show()
┌───────┐
│  id   │
│ int64 │
├───────┤
│     0 │
│     1 │
│     2 │
│     3 │
│     4 │
└───────┘

join(rel, condition, type = "inner")

Combine two relations, joining them based on the provided condition.

import duckdb

r1 = duckdb.sql("SELECT * FROM range(5) tbl(id)").set_alias("r1")
r2 = duckdb.sql("SELECT * FROM range(10, 15) tbl(id)").set_alias("r2")
r1.join(r2, "r1.id + 10 = r2.id").show()
┌───────┬───────┐
│  id   │  id   │
│ int64 │ int64 │
├───────┼───────┤
│     0 │    10 │
│     1 │    11 │
│     2 │    12 │
│     3 │    13 │
│     4 │    14 │
└───────┴───────┘

limit(n, offset = 0)

Select the first n rows, optionally offset by offset.

import duckdb

rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.limit(3).show()
┌───────┐
│  id   │
│ int64 │
├───────┤
│     0 │
│     1 │
│     2 │
└───────┘

order(expr)

Sort the relation by the given set of expressions.

import duckdb

rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.order("id DESC").limit(3).show()
┌────────┐
│   id   │
│ int64  │
├────────┤
│ 999999 │
│ 999998 │
│ 999997 │
└────────┘

project(expr)

Apply the given expression to each row in the relation.

import duckdb

rel = duckdb.sql("SELECT * FROM range(1_000_000) tbl(id)")
rel.project("id + 10 AS id_plus_ten").limit(3).show()
┌─────────────┐
│ id_plus_ten │
│    int64    │
├─────────────┤
│          10 │
│          11 │
│          12 │
└─────────────┘

union(rel)

Combine two relations, returning all rows in r1 followed by all rows in r2. The relations must have the same number of columns.

import duckdb

r1 = duckdb.sql("SELECT * FROM range(5) tbl(id)")
r2 = duckdb.sql("SELECT * FROM range(10, 15) tbl(id)")
r1.union(r2).show()
┌───────┐
│  id   │
│ int64 │
├───────┤
│     0 │
│     1 │
│     2 │
│     3 │
│     4 │
│    10 │
│    11 │
│    12 │
│    13 │
│    14 │
└───────┘

Result Output

The result of relations can be converted to various types of Python structures, see the [result conversion page]({% link docs/clients/python/conversion.md %}) for more information.

The result of relations can also be directly written to files using the below methods.

  • [write_csv]({% link docs/clients/python/reference/index.md %}#duckdb.DuckDBPyRelation.write_csv)
  • [write_parquet]({% link docs/clients/python/reference/index.md %}#duckdb.DuckDBPyRelation.write_parquet)
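A minimal sketch of writing a relation's result directly to files:

import duckdb

rel = duckdb.sql("SELECT * FROM range(5) tbl(id)")
rel.write_csv("out.csv")          # write the result as a CSV file
rel.write_parquet("out.parquet")  # write the result as a Parquet file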

layout: docu title: Conversion between DuckDB and Python redirect_from:

  • /docs/api/python/conversion
  • /docs/api/python/conversion/
  • /docs/api/python/result_conversion
  • /docs/api/python/result_conversion/

This page documents the rules for converting Python objects to DuckDB and DuckDB results to Python.

Object Conversion: Python Object to DuckDB

This is a mapping of Python object types to DuckDB [Logical Types]({% link docs/sql/data_types/overview.md %}):

  • None → NULL
  • bool → BOOLEAN
  • datetime.timedelta → INTERVAL
  • str → VARCHAR
  • bytearray → BLOB
  • memoryview → BLOB
  • decimal.Decimal → DECIMAL / DOUBLE
  • uuid.UUID → UUID

The rest of the conversion rules are as follows.

int

Since integers can be of arbitrary size in Python, a one-to-one conversion is not possible for ints. Instead, we try these casts in order until one succeeds (as sketched below):

  • BIGINT
  • INTEGER
  • UBIGINT
  • UINTEGER
  • DOUBLE
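A small sketch of this fallback behavior; typeof shows the type chosen for each bound parameter:

import duckdb

res = duckdb.execute(
    "SELECT typeof(?) AS small, typeof(?) AS huge",
    [42, 2**70]  # 2**70 does not fit any DuckDB integer type, so it falls through to DOUBLE
).fetchall()
print(res)
# expected: [('BIGINT', 'DOUBLE')]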

When using the DuckDB Value class, it's possible to set a target type, which will influence the conversion.

float

These casts are tried in order until one succeeds:

  • DOUBLE
  • FLOAT

datetime.datetime

For datetime we will check pandas.isnull if it's available and return NULL if it returns true. We check against datetime.datetime.min and datetime.datetime.max to convert to -inf and +inf respectively.

If the datetime has tzinfo, we will use TIMESTAMPTZ, otherwise it becomes TIMESTAMP.

datetime.time

If the time has tzinfo, we will use TIMETZ, otherwise it becomes TIME.

datetime.date

date converts to the DATE type. We check against datetime.date.min and datetime.date.max to convert to -inf and +inf respectively.

bytes

bytes converts to BLOB by default; when it's used to construct a Value object of type BITSTRING, it maps to BITSTRING instead.

list

list becomes a LIST type of the “most permissive” type of its children, for example:

my_list_value = [
    12345,
    "test"
]

This will become VARCHAR[], because 12345 can convert to VARCHAR but "test" cannot convert to INTEGER:

[12345, test]

dict

The dict object can convert to either STRUCT(...) or MAP(..., ...) depending on its structure. If the dict has a structure similar to:

my_map_dict = {
    "key": [
        1, 2, 3
    ],
    "value": [
        "one", "two", "three"
    ]
}

Then we'll convert it to a MAP of key-value pairs of the two lists zipped together. The example above becomes a MAP(INTEGER, VARCHAR):

{1=one, 2=two, 3=three}

The names of the fields matter, and the two lists need to have the same size.

Otherwise we'll try to convert it to a STRUCT.

my_struct_dict = {
    1: "one",
    "2": 2,
    "three": [1, 2, 3],
    False: True
}

Becomes:

{'1': one, '2': 2, 'three': [1, 2, 3], 'False': true}

Every key of the dictionary is converted to a string.

tuple

tuple converts to LIST by default; when it's used to construct a Value object of type STRUCT, it will convert to STRUCT instead.

numpy.ndarray and numpy.datetime64

ndarray and datetime64 are converted by calling tolist() and converting the result of that.

Result Conversion: DuckDB Results to Python

DuckDB's Python client provides multiple additional methods that can be used to efficiently retrieve data.

NumPy

  • fetchnumpy() fetches the data as a dictionary of NumPy arrays

Pandas

  • df() fetches the data as a Pandas DataFrame
  • fetchdf() is an alias of df()
  • fetch_df() is an alias of df()
  • fetch_df_chunk(vector_multiple) fetches a portion of the results into a DataFrame. The number of rows returned in each chunk is the vector size (2048 by default) * vector_multiple (1 by default).

Apache Arrow

  • arrow() fetches the data as an Arrow table
  • fetch_arrow_table() is an alias of arrow()
  • fetch_record_batch(chunk_size) returns an Arrow record batch reader with chunk_size rows per batch

Polars

  • pl() fetches the data as a Polars DataFrame

Examples

Below are some examples using this functionality. See the [Python guides]({% link docs/guides/overview.md %}#python-client) for more examples.

Fetch as Pandas DataFrame:

df = con.execute("SELECT * FROM items").fetchdf()
print(df)
       item   value  count
0     jeans    20.0      1
1    hammer    42.2      2
2    laptop  2000.0      1
3  chainsaw   500.0     10
4    iphone   300.0      2

Fetch as dictionary of NumPy arrays:

arr = con.execute("SELECT * FROM items").fetchnumpy()
print(arr)
{'item': masked_array(data=['jeans', 'hammer', 'laptop', 'chainsaw', 'iphone'],
             mask=[False, False, False, False, False],
       fill_value='?',
            dtype=object), 'value': masked_array(data=[20.0, 42.2, 2000.0, 500.0, 300.0],
             mask=[False, False, False, False, False],
       fill_value=1e+20), 'count': masked_array(data=[1, 2, 1, 10, 2],
             mask=[False, False, False, False, False],
       fill_value=999999,
            dtype=int32)}

Fetch as an Arrow table. Converting to Pandas afterwards just for pretty printing:

tbl = con.execute("SELECT * FROM items").fetch_arrow_table()
print(tbl.to_pandas())
       item    value  count
0     jeans    20.00      1
1    hammer    42.20      2
2    laptop  2000.00      1
3  chainsaw   500.00     10
4    iphone   300.00      2

layout: docu title: Data Ingestion redirect_from:

  • /docs/api/python/data_ingestion
  • /docs/api/python/data_ingestion/

This page contains examples for data ingestion to Python using DuckDB. First, import the DuckDB package:

import duckdb

Then, proceed with any of the following sections.

CSV Files

CSV files can be read using the read_csv function, called either from within Python or directly from within SQL. By default, the read_csv function attempts to auto-detect the CSV settings by sampling from the provided file.

Read from a file using fully auto-detected settings:

duckdb.read_csv("example.csv")

Read multiple CSV files from a folder:

duckdb.read_csv("folder/*.csv")

Specify options on how the CSV is formatted internally:

duckdb.read_csv("example.csv", header = False, sep = ",")

Override types of the first two columns:

duckdb.read_csv("example.csv", dtype = ["int", "varchar"])

Directly read a CSV file from within SQL:

duckdb.sql("SELECT * FROM 'example.csv'")

Call read_csv from within SQL:

duckdb.sql("SELECT * FROM read_csv('example.csv')")

See the [CSV Import]({% link docs/data/csv/overview.md %}) page for more information.

Parquet Files

Parquet files can be read using the read_parquet function, called either from within Python or directly from within SQL.

Read from a single Parquet file:

duckdb.read_parquet("example.parquet")

Read multiple Parquet files from a folder:

duckdb.read_parquet("folder/*.parquet")

Read a Parquet file over [https]({% link docs/extensions/httpfs/overview.md %}):

duckdb.read_parquet("https://some.url/some_file.parquet")

Read a list of Parquet files:

duckdb.read_parquet(["file1.parquet", "file2.parquet", "file3.parquet"])

Directly read a Parquet file from within SQL:

duckdb.sql("SELECT * FROM 'example.parquet'")

Call read_parquet from within SQL:

duckdb.sql("SELECT * FROM read_parquet('example.parquet')")

See the [Parquet Loading]({% link docs/data/parquet/overview.md %}) page for more information.

JSON Files

JSON files can be read using the read_json function, called either from within Python or directly from within SQL. By default, the read_json function will automatically detect if a file contains newline-delimited JSON or regular JSON, and will detect the schema of the objects stored within the JSON file.

Read from a single JSON file:

duckdb.read_json("example.json")

Read multiple JSON files from a folder:

duckdb.read_json("folder/*.json")

Directly read a JSON file from within SQL:

duckdb.sql("SELECT * FROM 'example.json'")

Call read_json from within SQL:

duckdb.sql("SELECT * FROM read_json_auto('example.json')")

Directly Accessing DataFrames and Arrow Objects

DuckDB is automatically able to query certain Python variables by referring to their variable name (as if it was a table). These types include the following: Pandas DataFrame, Polars DataFrame, Polars LazyFrame, NumPy arrays, [relations]({% link docs/clients/python/relational_api.md %}), and Arrow objects.

Only variables that are visible to Python code at the location of the sql() or execute() call can be used in this manner. Accessing these variables is made possible by [replacement scans]({% link docs/clients/c/replacement_scans.md %}). To disable replacement scans entirely, use:

SET python_enable_replacements = false;

DuckDB supports querying multiple types of Apache Arrow objects including tables, datasets, RecordBatchReaders, and scanners. See the Python [guides]({% link docs/guides/overview.md %}#python-client) for more examples.

import duckdb
import pandas as pd

test_df = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
print(duckdb.sql("SELECT * FROM test_df").fetchall())
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

DuckDB also supports “registering” a DataFrame or Arrow object as a virtual table, comparable to a SQL VIEW. This is useful when querying a DataFrame/Arrow object that is stored in another way (as a class variable, or a value in a dictionary). Below is an example of manually registering a Pandas DataFrame that is stored inside a dictionary:

import duckdb
import pandas as pd

my_dictionary = {}
my_dictionary["test_df"] = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
duckdb.register("test_df_view", my_dictionary["test_df"])
print(duckdb.sql("SELECT * FROM test_df_view").fetchall())
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

You can also create a persistent table in DuckDB from the contents of the DataFrame (or the view):

# create a new table from the contents of a DataFrame
con.execute("CREATE TABLE test_df_table AS SELECT * FROM test_df")
# insert into an existing table from the contents of a DataFrame
con.execute("INSERT INTO test_df_table SELECT * FROM test_df")

Pandas DataFrames – object Columns

pandas.DataFrame columns of an object dtype require some special care, since this dtype can store values of arbitrary type. To convert these columns to DuckDB, we first go through an analyze phase before converting the values. In this analyze phase, a sample of the column's rows is analyzed to determine the target type. The sample size is set to 1000 by default. If the type picked during the analyze step is incorrect, this will result in a "Failed to cast value:" error, in which case you will need to increase the sample size. The sample size can be changed by setting the pandas_analyze_sample config option.

# example setting the sample size to 100k
duckdb.execute("SET GLOBAL pandas_analyze_sample = 100_000")

Registering Objects

You can register Python objects as DuckDB tables using the [DuckDBPyConnection.register() function]({% link docs/clients/python/reference/index.md %}#duckdb.DuckDBPyConnection.register).

The precedence of objects with the same name is as follows:

  • Objects explicitly registered via DuckDBPyConnection.register()
  • Native DuckDB tables and views
  • [Replacement scans]({% link docs/clients/c/replacement_scans.md %})
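A minimal sketch illustrating this precedence: an explicitly registered object shadows a replacement scan on a local variable with the same name (the names here are purely illustrative):

import duckdb
import pandas as pd

con = duckdb.connect()
items = pd.DataFrame({"x": [1]})                 # only reachable via a replacement scan
con.register("items", pd.DataFrame({"x": [3]}))  # explicitly registered under the same name

print(con.sql("SELECT * FROM items").fetchall())
# [(3,)] (the explicitly registered object wins)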

layout: docu title: Rust Client redirect_from:

  • /docs/api/rust
  • /docs/api/rust/

Installation

The DuckDB Rust client can be installed from crates.io. Please see the docs.rs for details.

Basic API Usage

duckdb-rs is an ergonomic wrapper based on the DuckDB C API; please refer to the README for details.

Startup & Shutdown

To use duckdb, you must first initialize a Connection handle using Connection::open(). Connection::open() takes as a parameter the database file to read and write from. If the database file does not exist, it will be created (the file extension may be .db, .duckdb, or anything else). You can also use Connection::open_in_memory() to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process).

use duckdb::{params, Connection, Result};
let conn = Connection::open_in_memory()?;

The Connection will automatically close the underlying database connection for you when it goes out of scope (via Drop). You can also explicitly close the Connection with conn.close(). There is not much difference between these in the typical case, but if there is an error, the explicit close gives you the chance to handle it.

Querying

SQL queries can be sent to DuckDB using the execute() method of connections, or we can prepare a statement and then query on that.

#[derive(Debug)]
struct Person {
    id: i32,
    name: String,
    data: Option<Vec<u8>>,
}

// Create the table and a sample row to insert first.
// (This setup is assumed by the example; it mirrors the duckdb-rs README.)
conn.execute_batch(
    r"CREATE SEQUENCE seq;
      CREATE TABLE person (
          id   INTEGER PRIMARY KEY DEFAULT NEXTVAL('seq'),
          name TEXT NOT NULL,
          data BLOB
      );",
)?;

let me = Person {
    id: 0,
    name: "Steven".to_string(),
    data: None,
};

conn.execute(
    "INSERT INTO person (name, data) VALUES (?, ?)",
    params![me.name, me.data],
)?;

let mut stmt = conn.prepare("SELECT id, name, data FROM person")?;
let person_iter = stmt.query_map([], |row| {
    Ok(Person {
        id: row.get(0)?,
        name: row.get(1)?,
        data: row.get(2)?,
    })
})?;

for person in person_iter {
    println!("Found person {:?}", person.unwrap());
}

Appender

The Rust client supports the [DuckDB Appender API]({% link docs/data/appender.md %}) for bulk inserts. For example:

fn insert_rows(conn: &Connection) -> Result<()> {
    let mut app = conn.appender("foo")?;
    app.append_rows([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])?;
    Ok(())
}

layout: docu title: DuckDB Wasm github_repository: https://github.com/duckdb/duckdb-wasm redirect_from:

  • /docs/api/wasm
  • /docs/api/wasm/
  • /docs/api/wasm/overview
  • /docs/api/wasm/overview/

DuckDB has been compiled to WebAssembly, so it can run inside any browser on any device.

{% include iframe.html src="https://shell.duckdb.org" %}

DuckDB-Wasm offers a layered API; it can be embedded as a JavaScript + WebAssembly library, as a Web shell, or built from source according to your needs.

Getting Started with DuckDB-Wasm

A great starting point is to read the [DuckDB-Wasm launch blog post]({% post_url 2021-10-29-duckdb-wasm %})!

Another great resource is the GitHub repository.

For details, see the full DuckDB-Wasm API Documentation.

Limitations


layout: docu title: Extensions redirect_from:

  • /docs/api/wasm/extensions
  • /docs/api/wasm/extensions/

DuckDB-Wasm's (dynamic) extension loading is modeled after the regular DuckDB's extension loading, with a few relevant differences due to the difference in platform.

Format

Extensions in DuckDB are binaries to be dynamically loaded via dlopen. A cryptographic signature is appended to the binary. An extension in DuckDB-Wasm is a regular Wasm file to be dynamically loaded via Emscripten's dlopen. A cryptographic signature is appended to the Wasm file as a WebAssembly custom section called duckdb_signature. This ensures the file remains a valid WebAssembly file.

Currently, we require this custom section to be the last one, but this can be potentially relaxed in the future.

INSTALL and LOAD

The INSTALL semantic in native embeddings of DuckDB is to fetch the extension, decompress it from gzip, and store the data on local disk. The LOAD semantic in native embeddings of DuckDB is to (optionally) perform signature checks and dynamically load the binary together with the main DuckDB binary.

In DuckDB-Wasm, INSTALL is a no-op given there is no durable cross-session storage. The LOAD operation will fetch (and decompress on the fly), perform signature checks and dynamically load via the Emscripten implementation of dlopen.

Autoloading

[Autoloading]({% link docs/extensions/overview.md %}), i.e., the possibility for DuckDB to add extension functionality on-the-fly, is enabled by default in DuckDB-Wasm.

List of Officially Available Extensions

Extension name Description Aliases
[autocomplete]({% link docs/extensions/autocomplete.md %}) Adds support for autocomplete in the shell
[excel]({% link docs/extensions/excel.md %}) Adds support for Excel-like format strings
[fts]({% link docs/extensions/full_text_search.md %}) Adds support for Full-Text Search Indexes
[icu]({% link docs/extensions/icu.md %}) Adds support for time zones and collations using the ICU library
[inet]({% link docs/extensions/inet.md %}) Adds support for IP-related data types and functions
[json]({% link docs/data/json/overview.md %}) Adds support for JSON operations
[parquet]({% link docs/data/parquet/overview.md %}) Adds support for reading and writing Parquet files
[sqlite]({% link docs/extensions/sqlite.md %}) Adds support for reading SQLite database files sqlite, sqlite3
[sqlsmith]({% link docs/extensions/sqlsmith.md %})
[tpcds]({% link docs/extensions/tpcds.md %}) Adds TPC-DS data generation and query support
[tpch]({% link docs/extensions/tpch.md %}) Adds TPC-H data generation and query support

WebAssembly is essentially an additional platform, and there might be platform-specific limitations that prevent some extensions from matching their native capabilities or force them to work differently. We will document relevant differences for DuckDB-hosted extensions here.

HTTPFS

The HTTPFS extension is, at the moment, not available in DuckDB-Wasm. HTTPS protocol capabilities need to go through an additional layer, the browser, which adds both differences and some restrictions compared to what is doable from native code.

Instead, DuckDB-Wasm has a separate implementation that for most purposes is interchangeable, but does not support all use cases (as it must follow security rules imposed by the browser, such as CORS). Due to this CORS restriction, any requests for data made using the HTTPFS extension must be to websites that allow (using CORS headers) the website hosting the DuckDB-Wasm instance to access that data. The MDN website is a great resource for more information regarding CORS.

Extension Signing

As with regular DuckDB extensions, DuckDB-Wasm extensions are by default checked on LOAD to verify the signature and confirm that the extension has not been tampered with. Extension signature verification can be disabled via a configuration option. Signing is a property of the binary itself, so copying a DuckDB extension (say, to serve it from a different location) will still keep a valid signature (e.g., for local development).

Fetching DuckDB-Wasm Extensions

Official DuckDB extensions are served at extensions.duckdb.org, and this is also the default value for the default_extension_repository option. When installing extensions, a relevant URL will be built that will look like extensions.duckdb.org/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.gz.

DuckDB-Wasm extensions are fetched only on LOAD, and the URL will look like: extensions.duckdb.org/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm.

Note that an additional duckdb-wasm is added to the folder structure, and the file is served as a .wasm file.

DuckDB-Wasm extensions are served pre-compressed using Brotli compression. When fetched from a browser, extensions are transparently decompressed. If you want to fetch the duckdb-wasm extension manually, you can use curl --compressed extensions.duckdb.org/<...>/icu.duckdb_extension.wasm.

Serving Extensions from a Third-Party Repository

As with regular DuckDB, if you use SET custom_extension_repository = some.url.com, subsequent loads will be attempted at some.url.com/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm.

Note that GET requests on the extensions need to be CORS-enabled for a browser to allow the connection.

Tooling

Both DuckDB-Wasm and its extensions have been compiled using the latest packaged Emscripten toolchain.

{% include iframe.html src="https://shell.duckdb.org" %}

layout: docu title: ADBC Client redirect_from:

  • /docs/api/adbc
  • /docs/api/adbc/

Arrow Database Connectivity (ADBC), similarly to ODBC and JDBC, is a C-style API that enables code portability between different database systems. This allows developers to effortlessly build applications that communicate with database systems without using code specific to that system. The main difference between ADBC and ODBC/JDBC is that ADBC uses Arrow to transfer data between the database system and the application. DuckDB has an ADBC driver, which takes advantage of the [zero-copy integration between DuckDB and Arrow]({% post_url 2021-12-03-duck-arrow %}) to efficiently transfer data.

DuckDB's ADBC driver currently supports version 0.7 of ADBC.

Please refer to the ADBC documentation page for a more extensive discussion on ADBC and a detailed API explanation.

Implemented Functionality

The DuckDB-ADBC driver implements the full ADBC specification, with the exception of the ConnectionReadPartition and StatementExecutePartitions functions. Both of these functions exist to support systems that internally partition the query results, which does not apply to DuckDB. In this section, we will describe the main functions that exist in ADBC, along with the arguments they take and provide examples for each function.

Database

Set of functions that operate on a database.

Function name Description Arguments Example
DatabaseNew Allocate a new (but uninitialized) database. (AdbcDatabase *database, AdbcError *error) AdbcDatabaseNew(&adbc_database, &adbc_error)
DatabaseSetOption Set a char* option. (AdbcDatabase *database, const char *key, const char *value, AdbcError *error) AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error)
DatabaseInit Finish setting options and initialize the database. (AdbcDatabase *database, AdbcError *error) AdbcDatabaseInit(&adbc_database, &adbc_error)
DatabaseRelease Destroy the database. (AdbcDatabase *database, AdbcError *error) AdbcDatabaseRelease(&adbc_database, &adbc_error)

Connection

A set of functions that create and destroy a connection to interact with a database.

Function name Description Arguments Example
ConnectionNew Allocate a new (but uninitialized) connection. (AdbcConnection*, AdbcError*) AdbcConnectionNew(&adbc_connection, &adbc_error)
ConnectionSetOption Options may be set before ConnectionInit. (AdbcConnection*, const char*, const char*, AdbcError*) AdbcConnectionSetOption(&adbc_connection, ADBC_CONNECTION_OPTION_AUTOCOMMIT, ADBC_OPTION_VALUE_DISABLED, &adbc_error)
ConnectionInit Finish setting options and initialize the connection. (AdbcConnection*, AdbcDatabase*, AdbcError*) AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error)
ConnectionRelease Destroy this connection. (AdbcConnection*, AdbcError*) AdbcConnectionRelease(&adbc_connection, &adbc_error)

A set of functions that retrieve metadata about the database. In general, these functions will return Arrow objects, specifically an ArrowArrayStream.

Function name Description Arguments Example
ConnectionGetObjects Get a hierarchical view of all catalogs, database schemas, tables, and columns. (AdbcConnection*, int, const char*, const char*, const char*, const char**, const char*, ArrowArrayStream*, AdbcError*) AdbcDatabaseInit(&adbc_database, &adbc_error)
ConnectionGetTableSchema Get the Arrow schema of a table. (AdbcConnection*, const char*, const char*, const char*, ArrowSchema*, AdbcError*) AdbcDatabaseRelease(&adbc_database, &adbc_error)
ConnectionGetTableTypes Get a list of table types in the database. (AdbcConnection*, ArrowArrayStream*, AdbcError*) AdbcDatabaseNew(&adbc_database, &adbc_error)

A set of functions with transaction semantics for the connection. By default, all connections start with auto-commit mode on, but this can be turned off via the ConnectionSetOption function.

Function name Description Arguments Example
ConnectionCommit Commit any pending transactions. (AdbcConnection*, AdbcError*) AdbcConnectionCommit(&adbc_connection, &adbc_error)
ConnectionRollback Rollback any pending transactions. (AdbcConnection*, AdbcError*) AdbcConnectionRollback(&adbc_connection, &adbc_error)

Statement

Statements hold state related to query execution. They represent both one-off queries and prepared statements. They can be reused; however, doing so will invalidate prior result sets from that statement.

The functions used to create, destroy, and set options for a statement:

Function name Description Arguments Example
StatementNew Create a new statement for a given connection. (AdbcConnection*, AdbcStatement*, AdbcError*) AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error)
StatementRelease Destroy a statement. (AdbcStatement*, AdbcError*) AdbcStatementRelease(&adbc_statement, &adbc_error)
StatementSetOption Set a string option on a statement. (AdbcStatement*, const char*, const char*, AdbcError*) StatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "TABLE_NAME", &adbc_error)

Functions related to query execution:

Function name Description Arguments Example
StatementSetSqlQuery Set the SQL query to execute. The query can then be executed with StatementExecuteQuery. (AdbcStatement*, const char*, AdbcError*) AdbcStatementSetSqlQuery(&adbc_statement, "SELECT * FROM TABLE", &adbc_error)
StatementSetSubstraitPlan Set a substrait plan to execute. The query can then be executed with StatementExecuteQuery. (AdbcStatement*, const uint8_t*, size_t, AdbcError*) AdbcStatementSetSubstraitPlan(&adbc_statement, substrait_plan, length, &adbc_error)
StatementExecuteQuery Execute a statement and get the results. (AdbcStatement*, ArrowArrayStream*, int64_t*, AdbcError*) AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error)
StatementPrepare Turn this statement into a prepared statement to be executed multiple times. (AdbcStatement*, AdbcError*) AdbcStatementPrepare(&adbc_statement, &adbc_error)

Functions related to binding, used for bulk insertion or in prepared statements.

Function name Description Arguments Example
StatementBindStream Bind Arrow Stream. This can be used for bulk inserts or prepared statements. (AdbcStatement*, ArrowArrayStream*, AdbcError*) StatementBindStream(&adbc_statement, &input_data, &adbc_error)

Examples

Regardless of the programming language being used, there are two database options which will be required to utilize ADBC with DuckDB. The first one is the driver, which takes a path to the DuckDB library. The second option is the entrypoint, which is an exported function from the DuckDB-ADBC driver that initializes all the ADBC functions. Once we have configured these two options, we can optionally set the path option, providing a path on disk to store our DuckDB database. If not set, an in-memory database is created. After configuring all the necessary options, we can proceed to initialize our database. Below is how you can do so with various different language environments.

C++

We begin our C++ example by declaring the essential variables for querying data through ADBC. These variables include Error, Database, Connection, Statement handling, and an Arrow Stream to transfer data between DuckDB and the application.

AdbcError adbc_error;
AdbcDatabase adbc_database;
AdbcConnection adbc_connection;
AdbcStatement adbc_statement;
ArrowArrayStream arrow_stream;

We can then initialize our database variable. Before initializing the database, we need to set the driver and entrypoint options as mentioned above. Then we set the path option and initialize the database. With the example below, the string "path/to/libduckdb.dylib" should be the path to the dynamic library for DuckDB. This will be .dylib on macOS, and .so on Linux.

AdbcDatabaseNew(&adbc_database, &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "driver", "path/to/libduckdb.dylib", &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "entrypoint", "duckdb_adbc_init", &adbc_error);
// By default, we start an in-memory database, but you can optionally define a path to store it on disk.
AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error);
AdbcDatabaseInit(&adbc_database, &adbc_error);

After initializing the database, we must create and initialize a connection to it.

AdbcConnectionNew(&adbc_connection, &adbc_error);
AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error);

We can now initialize our statement and run queries through our connection. After the AdbcStatementExecuteQuery call, the arrow_stream is populated with the result.

AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error);
AdbcStatementSetSqlQuery(&adbc_statement, "SELECT 42", &adbc_error);
int64_t rows_affected;
AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error);
arrow_stream.release(&arrow_stream);

Besides running queries, we can also ingest data via arrow_streams. For this, we need to set an option with the name of the table we want to insert into, bind the stream, and then execute the query.

StatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "AnswerToEverything", &adbc_error);
StatementBindStream(&adbc_statement, &arrow_stream, &adbc_error);
StatementExecuteQuery(&adbc_statement, nullptr, nullptr, &adbc_error);

Python

The first thing to do is to use pip and install the ADBC Driver Manager. You will also need to install pyarrow to directly access Apache Arrow formatted result sets (such as using fetch_arrow_table).

pip install adbc_driver_manager pyarrow

For details on the adbc_driver_manager package, see the adbc_driver_manager package documentation.

As with C++, we need to provide initialization options consisting of the location of the libduckdb shared object and entrypoint function. Notice that the path argument for DuckDB is passed in through the db_kwargs dictionary.

import adbc_driver_duckdb.dbapi

with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
    cur.execute("SELECT 42")
    # fetch a pyarrow table
    tbl = cur.fetch_arrow_table()
    print(tbl)

Alongside fetch_arrow_table, other methods from DBApi are also implemented on the cursor, such as fetchone and fetchall. Data can also be ingested via arrow_streams. We just need to set options on the statement to bind the stream of data and execute the query.

import adbc_driver_duckdb.dbapi
import pyarrow

data = pyarrow.record_batch(
    [[1, 2, 3, 4], ["a", "b", "c", "d"]],
    names = ["ints", "strs"],
)

with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
    cur.adbc_ingest("AnswerToEverything", data)

Go

Make sure to download the libduckdb library first (i.e., the .so on Linux, .dylib on macOS, or .dll on Windows) from the releases page, and put it on your LD_LIBRARY_PATH before you run the code (if you don't, the error message will explain your options regarding the location of this file).

The following example uses an in-memory DuckDB database to modify in-memory Arrow RecordBatches via SQL queries:

{% raw %}

package main

import (
    "bytes"
    "context"
    "fmt"
    "io"

    "github.com/apache/arrow-adbc/go/adbc"
    "github.com/apache/arrow-adbc/go/adbc/drivermgr"
    "github.com/apache/arrow/go/v17/arrow"
    "github.com/apache/arrow/go/v17/arrow/array"
    "github.com/apache/arrow/go/v17/arrow/ipc"
    "github.com/apache/arrow/go/v17/arrow/memory"
)

func _makeSampleArrowRecord() arrow.Record {
    b := array.NewFloat64Builder(memory.DefaultAllocator)
    b.AppendValues([]float64{1, 2, 3}, nil)
    col := b.NewArray()

    defer col.Release()
    defer b.Release()

    schema := arrow.NewSchema([]arrow.Field{{Name: "column1", Type: arrow.PrimitiveTypes.Float64}}, nil)
    return array.NewRecord(schema, []arrow.Array{col}, int64(col.Len()))
}

type DuckDBSQLRunner struct {
    ctx  context.Context
    conn adbc.Connection
    db   adbc.Database
}

func NewDuckDBSQLRunner(ctx context.Context) (*DuckDBSQLRunner, error) {
    var drv drivermgr.Driver
    db, err := drv.NewDatabase(map[string]string{
        "driver":     "duckdb",
        "entrypoint": "duckdb_adbc_init",
        "path":       ":memory:",
    })
    if err != nil {
        return nil, fmt.Errorf("failed to create new in-memory DuckDB database: %w", err)
    }
    conn, err := db.Open(ctx)
    if err != nil {
        return nil, fmt.Errorf("failed to open connection to new in-memory DuckDB database: %w", err)
    }
    return &DuckDBSQLRunner{ctx: ctx, conn: conn, db: db}, nil
}

func serializeRecord(record arrow.Record) (io.Reader, error) {
    buf := new(bytes.Buffer)
    wr := ipc.NewWriter(buf, ipc.WithSchema(record.Schema()))
    if err := wr.Write(record); err != nil {
        return nil, fmt.Errorf("failed to write record: %w", err)
    }
    if err := wr.Close(); err != nil {
        return nil, fmt.Errorf("failed to close writer: %w", err)
    }
    return buf, nil
}

func (r *DuckDBSQLRunner) importRecord(sr io.Reader) error {
    rdr, err := ipc.NewReader(sr)
    if err != nil {
        return fmt.Errorf("failed to create IPC reader: %w", err)
    }
    defer rdr.Release()
    stmt, err := r.conn.NewStatement()
    if err != nil {
        return fmt.Errorf("failed to create new statement: %w", err)
    }
    if err := stmt.SetOption(adbc.OptionKeyIngestMode, adbc.OptionValueIngestModeCreate); err != nil {
        return fmt.Errorf("failed to set ingest mode: %w", err)
    }
    if err := stmt.SetOption(adbc.OptionKeyIngestTargetTable, "temp_table"); err != nil {
        return fmt.Errorf("failed to set ingest target table: %w", err)
    }
    if err := stmt.BindStream(r.ctx, rdr); err != nil {
        return fmt.Errorf("failed to bind stream: %w", err)
    }
    if _, err := stmt.ExecuteUpdate(r.ctx); err != nil {
        return fmt.Errorf("failed to execute update: %w", err)
    }
    return stmt.Close()
}

func (r *DuckDBSQLRunner) runSQL(sql string) ([]arrow.Record, error) {
    stmt, err := r.conn.NewStatement()
    if err != nil {
        return nil, fmt.Errorf("failed to create new statement: %w", err)
    }
    defer stmt.Close()

    if err := stmt.SetSqlQuery(sql); err != nil {
        return nil, fmt.Errorf("failed to set SQL query: %w", err)
    }
    out, n, err := stmt.ExecuteQuery(r.ctx)
    if err != nil {
        return nil, fmt.Errorf("failed to execute query: %w", err)
    }
    defer out.Release()

    result := make([]arrow.Record, 0, n)
    for out.Next() {
        rec := out.Record()
        rec.Retain() // .Next() will release the record, so we need to retain it
        result = append(result, rec)
    }
    if out.Err() != nil {
        return nil, out.Err()
    }
    return result, nil
}

func (r *DuckDBSQLRunner) RunSQLOnRecord(record arrow.Record, sql string) ([]arrow.Record, error) {
    serializedRecord, err := serializeRecord(record)
    if err != nil {
        return nil, fmt.Errorf("failed to serialize record: %w", err)
    }
    if err := r.importRecord(serializedRecord); err != nil {
        return nil, fmt.Errorf("failed to import record: %w", err)
    }
    result, err := r.runSQL(sql)
    if err != nil {
        return nil, fmt.Errorf("failed to run SQL: %w", err)
    }

    if _, err := r.runSQL("DROP TABLE temp_table"); err != nil {
        return nil, fmt.Errorf("failed to drop temp table after running query: %w", err)
    }
    return result, nil
}

func (r *DuckDBSQLRunner) Close() {
    r.conn.Close()
    r.db.Close()
}

func main() {
    rec := _makeSampleArrowRecord()
    fmt.Println(rec)

    runner, err := NewDuckDBSQLRunner(context.Background())
    if err != nil {
        panic(err)
    }
    defer runner.Close()

    resultRecords, err := runner.RunSQLOnRecord(rec, "SELECT column1+1 FROM temp_table")
    if err != nil {
        panic(err)
    }

    for _, resultRecord := range resultRecords {
        fmt.Println(resultRecord)
        resultRecord.Release()
    }
}

{% endraw %}

Running it produces the following output:

record:
  schema:
  fields: 1
    - column1: type=float64
  rows: 3
  col[0][column1]: [1 2 3]

record:
  schema:
  fields: 1
    - (column1 + 1): type=float64, nullable
  rows: 3
  col[0][(column1 + 1)]: [2 3 4]

layout: docu title: Instantiation redirect_from:

  • /docs/api/wasm/instantiation
  • /docs/api/wasm/instantiation/

DuckDB-Wasm has multiple ways to be instantiated depending on the use case.

cdn(jsdelivr)

import * as duckdb from '@duckdb/duckdb-wasm';

const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();

// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);

const worker_url = URL.createObjectURL(
  new Blob([`importScripts("${bundle.mainWorker!}");`], {type: 'text/javascript'})
);

// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(worker_url);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
URL.revokeObjectURL(worker_url);

webpack

import * as duckdb from '@duckdb/duckdb-wasm';
import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm';
import duckdb_wasm_next from '@duckdb/duckdb-wasm/dist/duckdb-eh.wasm';
const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
    mvp: {
        mainModule: duckdb_wasm,
        mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js', import.meta.url).toString(),
    },
    eh: {
        mainModule: duckdb_wasm_next,
        mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js', import.meta.url).toString(),
    },
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

vite

import * as duckdb from '@duckdb/duckdb-wasm';
import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm?url';
import mvp_worker from '@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js?url';
import duckdb_wasm_eh from '@duckdb/duckdb-wasm/dist/duckdb-eh.wasm?url';
import eh_worker from '@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js?url';

const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
    mvp: {
        mainModule: duckdb_wasm,
        mainWorker: mvp_worker,
    },
    eh: {
        mainModule: duckdb_wasm_eh,
        mainWorker: eh_worker,
    },
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

Statically Served

It is possible to manually download the files from https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm/dist/.

import * as duckdb from '@duckdb/duckdb-wasm';

const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
    mvp: {
        mainModule: 'change/me/../duckdb-mvp.wasm',
        mainWorker: 'change/me/../duckdb-browser-mvp.worker.js',
    },
    eh: {
        mainModule: 'change/me/../duckdb-eh.wasm',
        mainWorker: 'change/me/../duckdb-browser-eh.worker.js',
    },
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

layout: docu title: Data Ingestion redirect_from:

  • /docs/api/wasm/data_ingestion
  • /docs/api/wasm/data_ingestion/

DuckDB-Wasm has multiple ways to import data, depending on the format of the data.

There are two steps to import data into DuckDB.

First, the data file is registered with DuckDB-Wasm's local (virtual) file system using the register functions (registerEmptyFileBuffer, registerFileBuffer, registerFileHandle, registerFileText, registerFileURL).

Then, the data is imported into DuckDB using the insert functions (insertArrowFromIPCStream, insertArrowTable, insertCSVFromPath, insertJSONFromPath), or directly via SQL queries (e.g., using extensions like Parquet or the Wasm-flavored httpfs).

[Insert statements]({% link docs/data/insert.md %}) can also be used to import data.

Data Import

Open & Close Connection

// Create a new connection
const c = await db.connect();

// ... import data

// Close the connection to release memory
await c.close();

Apache Arrow

// Data can be inserted from an existing arrow.Table
// More examples: https://arrow.apache.org/docs/js/
import { tableFromArrays } from 'apache-arrow';

// EOS signal according to Arrow IPC streaming format
// See https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
const EOS = new Uint8Array([255, 255, 255, 255, 0, 0, 0, 0]);

const arrowTable = tableFromArrays({
  id: [1, 2, 3],
  name: ['John', 'Jane', 'Jack'],
  age: [20, 21, 22],
});

await c.insertArrowTable(arrowTable, { name: 'arrow_table' });
// Write EOS
await c.insertArrowTable(EOS, { name: 'arrow_table' });

// ..., from a raw Arrow IPC stream
const streamResponse = await fetch(`someapi`);
const streamReader = streamResponse.body.getReader();
const streamInserts = [];
while (true) {
    const { value, done } = await streamReader.read();
    if (done) break;
    streamInserts.push(c.insertArrowFromIPCStream(value, { name: 'streamed' }));
}

// Write EOS
streamInserts.push(c.insertArrowFromIPCStream(EOS, { name: 'streamed' }));

await Promise.all(streamInserts);

CSV

// ..., from CSV files
// (interchangeable: registerFile{Text,Buffer,URL,Handle})
const csvContent = '1|foo\n2|bar\n';
await db.registerFileText(`data.csv`, csvContent);
// ... with typed insert options
await c.insertCSVFromPath('data.csv', {
    schema: 'main',
    name: 'foo',
    detect: false,
    header: false,
    delimiter: '|',
    columns: {
        col1: new arrow.Int32(),
        col2: new arrow.Utf8(),
    },
});

JSON

// ..., from JSON documents in row-major format
const jsonRowContent = [
    { "col1": 1, "col2": "foo" },
    { "col1": 2, "col2": "bar" },
];
await db.registerFileText(
    'rows.json',
    JSON.stringify(jsonRowContent),
);
await c.insertJSONFromPath('rows.json', { name: 'rows' });

// ... or column-major format
const jsonColContent = {
    "col1": [1, 2],
    "col2": ["foo", "bar"]
};
await db.registerFileText(
    'columns.json',
    JSON.stringify(jsonColContent),
);
await c.insertJSONFromPath('columns.json', { name: 'columns' });

// From API
const streamResponse = await fetch(`someapi/content.json`);
await db.registerFileBuffer('file.json', new Uint8Array(await streamResponse.arrayBuffer()));
await c.insertJSONFromPath('file.json', { name: 'JSONContent' });

Parquet

// from Parquet files
// ...Local
const pickedFile: File = letUserPickFile();
await db.registerFileHandle('local.parquet', pickedFile, DuckDBDataProtocol.BROWSER_FILEREADER, true);
// ...Remote
await db.registerFileURL('remote.parquet', 'https://origin/remote.parquet', DuckDBDataProtocol.HTTP, false);
// ... Using Fetch
const res = await fetch('https://origin/remote.parquet');
await db.registerFileBuffer('buffer.parquet', new Uint8Array(await res.arrayBuffer()));

// ..., by specifying URLs in the SQL text
await c.query(`
    CREATE TABLE direct AS
        SELECT * FROM 'https://origin/remote.parquet'
`);
// ..., or by executing raw insert statements
await c.query(`
    INSERT INTO existing_table
    VALUES (1, 'foo'), (2, 'bar')`);

httpfs (Wasm-flavored)

// ..., by specifying URLs in the SQL text
await c.query(`
    CREATE TABLE direct AS
        SELECT * FROM 'https://origin/remote.parquet'
`);

Tip If you encounter a Network Error (Failed to execute 'send' on 'XMLHttpRequest') when you try to query files from S3, configure the CORS policy in the S3 bucket's permissions. For example:

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "HEAD"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [],
        "MaxAgeSeconds": 3000
    }
]

Insert Statement

// ..., or by executing raw insert statements
await c.query(`
    INSERT INTO existing_table
    VALUES (1, 'foo'), (2, 'bar')`);

layout: docu title: Query redirect_from:

  • /docs/api/wasm/query
  • /docs/api/wasm/query/

DuckDB-Wasm provides functions for querying data. Queries are run sequentially.

First, a connection needs to be created by calling connect. Then, queries can be run by calling query or send.

Query Execution

// Create a new connection
const conn = await db.connect();

// Either materialize the query result
await conn.query<{ v: arrow.Int }>(`
    SELECT * FROM generate_series(1, 100) t(v)
`);
// ..., or fetch the result chunks lazily
for await (const batch of await conn.send<{ v: arrow.Int }>(`
    SELECT * FROM generate_series(1, 100) t(v)
`)) {
    // ...
}

// Close the connection to release memory
await conn.close();

Prepared Statements

// Create a new connection
const conn = await db.connect();
// Prepare query
const stmt = await conn.prepare(`SELECT v + ? FROM generate_series(0, 10_000) t(v);`);
// ... and run the query with materialized results
await stmt.query(234);
// ... or result chunks
for await (const batch of await stmt.send(234)) {
    // ...
}
// Close the statement to release memory
await stmt.close();
// Closing the connection will release statements as well
await conn.close();

Arrow Table to JSON

// Create a new connection
const conn = await db.connect();

// Query
const arrowResult = await conn.query<{ v: arrow.Int }>(`
    SELECT * FROM generate_series(1, 100) t(v)
`);

// Convert arrow table to json
const result = arrowResult.toArray().map((row) => row.toJSON());

// Close the connection to release memory
await conn.close();

Export Parquet

// Create a new connection
const conn = await db.connect();

// Export Parquet
await conn.send(`COPY (SELECT * FROM tbl) TO 'result-snappy.parquet' (FORMAT PARQUET);`);
const parquet_buffer = await db.copyFileToBuffer('result-snappy.parquet');

// Generate a download link
const link = URL.createObjectURL(new Blob([parquet_buffer]));

// Close the connection to release memory
await conn.close();

layout: docu title: Julia Client redirect_from:

  • /docs/api/julia
  • /docs/api/julia/

The DuckDB Julia package provides a high-performance front-end for DuckDB. Much like SQLite, DuckDB runs in-process within the Julia client, and provides a DBInterface front-end.

The package also supports multi-threaded execution. It uses Julia threads/tasks for this purpose. If you wish to run queries in parallel, you must launch Julia with multi-threading support (e.g., by setting the JULIA_NUM_THREADS environment variable).
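
For example, to launch Julia with four threads, set the environment variable before starting Julia (the julia --threads=4 flag achieves the same):

JULIA_NUM_THREADS=4 julia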

Installation

Install DuckDB as follows:

using Pkg
Pkg.add("DuckDB")

Alternatively, enter the package manager using the ] key, and issue the following command:

pkg> add DuckDB

Basics

using DuckDB

# create a new in-memory database
con = DBInterface.connect(DuckDB.DB, ":memory:")

# create a table
DBInterface.execute(con, "CREATE TABLE integers (i INTEGER)")

# insert data by executing a prepared statement
stmt = DBInterface.prepare(con, "INSERT INTO integers VALUES(?)")
DBInterface.execute(stmt, [42])

# query the database
results = DBInterface.execute(con, "SELECT 42 a")
print(results)

Some SQL statements, such as PIVOT and IMPORT DATABASE, are executed as multiple prepared statements and will error when run with DuckDB.execute(). They can instead be run with DuckDB.query(), which always returns a materialized result.
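
For example, a PIVOT statement can be run through DuckDB.query(). A minimal sketch, assuming a table named sales with columns year and amount already exists:

# PIVOT expands to multiple prepared statements, so use DuckDB.query() instead of DuckDB.execute()
results = DuckDB.query(con, "PIVOT sales ON year USING sum(amount)")
print(results)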

Scanning DataFrames

The DuckDB Julia package also provides support for querying Julia DataFrames. Note that the DataFrames are directly read by DuckDB – they are not inserted or copied into the database itself.

If you wish to load data from a DataFrame into a DuckDB table you can run a CREATE TABLE ... AS or INSERT INTO query.

using DuckDB
using DataFrames

# create a new in-memory database
con = DBInterface.connect(DuckDB.DB)

# create a DataFrame
df = DataFrame(a = [1, 2, 3], b = [42, 84, 42])

# register it as a view in the database
DuckDB.register_data_frame(con, df, "my_df")

# run a SQL query over the DataFrame
results = DBInterface.execute(con, "SELECT * FROM my_df")
print(results)
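
As mentioned above, a registered DataFrame can also be materialized into a DuckDB table using a CREATE TABLE ... AS query over the view:

# copy the contents of the DataFrame into an actual DuckDB table
DBInterface.execute(con, "CREATE TABLE my_df_table AS SELECT * FROM my_df")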

Appender API

The DuckDB Julia package also supports the [Appender API]({% link docs/data/appender.md %}), which is much faster than using prepared statements or individual INSERT INTO statements. Appends are made in row-wise format: for every column, an append() call should be made, after which the row is finished by calling end_row(). After all rows have been appended, close() should be used to finalize the Appender and clean up the resulting memory.

using DuckDB, DataFrames, Dates
db = DuckDB.DB()
# create a table
DBInterface.execute(db,
    "CREATE OR REPLACE TABLE data(id INTEGER PRIMARY KEY, value FLOAT, timestamp TIMESTAMP, date DATE)")
# create data to insert
len = 100
df = DataFrames.DataFrame(
        id = collect(1:len),
        value = rand(len),
        timestamp = Dates.now() .+ Dates.Second.(1:len),
        date = Dates.today() .+ Dates.Day.(1:len)
    )
# append data by row
appender = DuckDB.Appender(db, "data")
for i in eachrow(df)
    for j in i
        DuckDB.append(appender, j)
    end
    DuckDB.end_row(appender)
end
# close the appender after all rows
DuckDB.close(appender)

Concurrency

Within a Julia process, tasks are able to concurrently read and write to the database, as long as each task maintains its own connection to the database. In the example below, a single task is spawned to periodically read the database and many tasks are spawned to write to the database using both [INSERT statements]({% link docs/sql/statements/insert.md %}) as well as the [Appender API]({% link docs/data/appender.md %}).

using Dates, DataFrames, DuckDB
db = DuckDB.DB()
DBInterface.connect(db)
DBInterface.execute(db, "CREATE OR REPLACE TABLE data (date TIMESTAMP, id INTEGER)")

function run_reader(db)
    # create a DuckDB connection specifically for this task
    conn = DBInterface.connect(db)
    while true
        println(DBInterface.execute(conn,
                "SELECT id, count(date) AS count, max(date) AS max_date
                FROM data GROUP BY id ORDER BY id") |> DataFrames.DataFrame)
        Threads.sleep(1)
    end
    DBInterface.close(conn)
end
# spawn one reader task
Threads.@spawn run_reader(db)

function run_inserter(db, id)
    # create a DuckDB connection specifically for this task
    conn = DBInterface.connect(db)
    for i in 1:1000
        Threads.sleep(0.01)
        DuckDB.execute(conn, "INSERT INTO data VALUES (current_timestamp, ?)"; id);
    end
    DBInterface.close(conn)
end
# spawn many insert tasks
for i in 1:100
    Threads.@spawn run_inserter(db, 1)
end

function run_appender(db, id)
    # create a DuckDB connection specifically for this task
    appender = DuckDB.Appender(db, "data")
    for i in 1:1000
        Threads.sleep(0.01)
        row = (Dates.now(Dates.UTC), id)
        for j in row
            DuckDB.append(appender, j);
        end
        DuckDB.end_row(appender);
    end
    DuckDB.close(appender);
end
# spawn many appender tasks
for i in 1:100
    Threads.@spawn run_appender(db, 2)
end

Original Julia Connector

Credits to kimmolinna for the original DuckDB Julia connector.

layout: docu title: Safe Mode

The DuckDB CLI client supports “safe mode”. In safe mode, the CLI is prevented from accessing external files other than the database file that it was initially connected to and prevented from interacting with the host file system.

This has the following effects:

  • The following [dot commands]({% link docs/clients/cli/dot_commands.md %}) are disabled:
    • .cd
    • .excel
    • .import
    • .log
    • .once
    • .open
    • .output
    • .read
    • .sh
    • .system
  • Auto-complete no longer scans the file system for files to suggest as auto-complete targets.
  • The [getenv function]({% link docs/clients/cli/overview.md %}#reading-environment-variables) is disabled.
  • The [enable_external_access option]({% link docs/configuration/overview.md %}#configuration-reference) is set to false. This implies that:
    • ATTACH cannot attach a database from an on-disk file.
    • COPY cannot read to/write from files.
    • read_csv, read_parquet, read_json, etc. cannot read from disk.

Once safe mode is activated, it cannot be deactivated in the same DuckDB CLI session.
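
Safe mode can be activated from within a session using the .safe_mode [dot command]({% link docs/clients/cli/dot_commands.md %}):

.safe_mode

After this, file-system-related commands such as .cd or .read are rejected for the rest of the session.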

layout: docu title: CLI API redirect_from:

  • /docs/api/cli
  • /docs/api/cli/
  • /docs/clients/cli
  • /docs/clients/cli/
  • /docs/api/cli/overview
  • /docs/api/cli/overview/

Installation

The DuckDB CLI (Command Line Interface) is a single, dependency-free executable. It is precompiled for Windows, Mac, and Linux for both the stable version and for nightly builds produced by GitHub Actions. Please see the [installation page]({% link docs/installation/index.html %}) under the CLI tab for download links.

The DuckDB CLI is based on the SQLite command line shell, so CLI-client-specific functionality is similar to what is described in the SQLite documentation (although DuckDB's SQL syntax follows PostgreSQL conventions with a [few exceptions]({% link docs/sql/dialect/postgresql_compatibility.md %})).

DuckDB has a tldr page, which summarizes the most common uses of the CLI client. If you have tldr installed, you can display it by running tldr duckdb.

Getting Started

Once the CLI executable has been downloaded, unzip it and save it to any directory. Navigate to that directory in a terminal and enter the command duckdb to run the executable. If in a PowerShell or POSIX shell environment, use the command ./duckdb instead.

Usage

The typical usage of the duckdb command is the following:

duckdb [OPTIONS] [FILENAME]

Options

The [OPTIONS] part encodes [arguments for the CLI client]({% link docs/clients/cli/arguments.md %}). Common options include:

  • -csv: sets the output mode to CSV
  • -json: sets the output mode to JSON
  • -readonly: open the database in read-only mode (see [concurrency in DuckDB]({% link docs/connect/concurrency.md %}#handling-concurrency))

For a full list of options, see the [command line arguments page]({% link docs/clients/cli/arguments.md %}).

In-Memory vs. Persistent Database

When no [FILENAME] argument is provided, the DuckDB CLI will open a temporary [in-memory database]({% link docs/connect/overview.md %}#in-memory-database). You will see DuckDB's version number, the information on the connection and a prompt starting with a D.

duckdb
v{{ site.currentduckdbversion }} {{ site.currentduckdbhash }}
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D

To open or create a [persistent database]({% link docs/connect/overview.md %}#persistent-database), simply include a path as a command line argument:

duckdb my_database.duckdb

Running SQL Statements in the CLI

Once the CLI has been opened, enter a SQL statement followed by a semicolon, then hit enter and it will be executed. Results will be displayed in a table in the terminal. If a semicolon is omitted, hitting enter will allow for multi-line SQL statements to be entered.

SELECT 'quack' AS my_column;
my_column
quack

The CLI supports all of DuckDB's rich [SQL syntax]({% link docs/sql/introduction.md %}) including SELECT, CREATE, and ALTER statements.

Editor Features

The CLI supports [autocompletion]({% link docs/clients/cli/autocomplete.md %}), and has sophisticated [editor features]({% link docs/clients/cli/editing.md %}) and [syntax highlighting]({% link docs/clients/cli/syntax_highlighting.md %}) on certain platforms.

Exiting the CLI

To exit the CLI, press Ctrl+D if your platform supports it. Otherwise, press Ctrl+C or use the .exit command. If a persistent database was used, DuckDB will automatically checkpoint (save the latest edits to disk) and close. This will remove the .wal file (the write-ahead log) and consolidate all of your data into the single-file database.

Dot Commands

In addition to SQL syntax, special [dot commands]({% link docs/clients/cli/dot_commands.md %}) may be entered into the CLI client. To use one of these commands, begin the line with a period (.) immediately followed by the name of the command you wish to execute. Additional arguments to the command are entered, space separated, after the command. If an argument must contain a space, either single or double quotes may be used to wrap that parameter. Dot commands must be entered on a single line, and no whitespace may occur before the period. No semicolon is required at the end of the line.

Frequently-used configurations can be stored in the file ~/.duckdbrc, which will be loaded when starting the CLI client. See the Configuring the CLI section below for further information on these options.

Tip To prevent the DuckDB CLI client from reading the ~/.duckdbrc file, start it as follows:

duckdb -init /dev/null

Below, we summarize a few important dot commands. To see all available commands, see the [dot commands page]({% link docs/clients/cli/dot_commands.md %}) or use the .help command.

Opening Database Files

In addition to connecting to a database when opening the CLI, a new database connection can be made by using the .open command. If no additional parameters are supplied, a new in-memory database connection is created. This database will not be persisted when the CLI connection is closed.

.open

The .open command optionally accepts several options, but the final parameter can be used to indicate a path to a persistent database (or where one should be created). The special string :memory: can also be used to open a temporary in-memory database.

.open persistent.duckdb

Warning .open closes the current database. To keep the current database, while adding a new database, use the [ATTACH statement]({% link docs/sql/statements/attach.md %}).

One important option accepted by .open is the --readonly flag. This disallows any editing of the database. To open in read-only mode, the database must already exist. This also means that a new in-memory database can't be opened in read-only mode, since in-memory databases are created upon connection.

.open --readonly preexisting.duckdb

Output Formats

The .mode [dot command]({% link docs/clients/cli/dot_commands.md %}#mode) may be used to change the appearance of the tables returned in the terminal output. These include the default duckbox mode, csv and json mode for ingestion by other tools, markdown and latex for documents, and insert mode for generating SQL statements.
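
For example, switching to json mode renders results as a JSON array:

.mode json
SELECT 42 AS answer;
[{"answer":42}]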

Writing Results to a File

By default, the DuckDB CLI sends results to the terminal's standard output. However, this can be modified using either the .output or .once commands. For details, see the documentation for the [output dot command]({% link docs/clients/cli/dot_commands.md %}#output-writing-results-to-a-file).

Reading SQL from a File

The DuckDB CLI can read both SQL commands and dot commands from an external file instead of the terminal using the .read command. This allows for a number of commands to be run in sequence and allows command sequences to be saved and reused.

The .read command requires only one argument: the path to the file containing the SQL and/or commands to execute. After running the commands in the file, control will revert back to the terminal. Output from the execution of that file is governed by the same .output and .once commands that have been discussed previously. This allows the output to be displayed back to the terminal, as in the first example below, or out to another file, as in the second example.

In this example, the file select_example.sql is located in the same directory as duckdb.exe and contains the following SQL statement:

SELECT *
FROM generate_series(5);

To execute it from the CLI, the .read command is used.

.read select_example.sql

The output below is returned to the terminal by default. The formatting of the table can be adjusted using the .output or .once commands.

| generate_series |
|----------------:|
| 0               |
| 1               |
| 2               |
| 3               |
| 4               |
| 5               |

Multiple commands, including both SQL and dot commands, can also be run in a single .read command. In this example, the file write_markdown_to_file.sql is located in the same directory as duckdb.exe and contains the following commands:

.mode markdown
.output series.md
SELECT *
FROM generate_series(5);

To execute it from the CLI, the .read command is used as before.

.read write_markdown_to_file.sql

In this case, no output is returned to the terminal. Instead, the file series.md is created (or replaced if it already existed) with the markdown-formatted results shown here:

| generate_series |
|----------------:|
| 0               |
| 1               |
| 2               |
| 3               |
| 4               |
| 5               |

Configuring the CLI

Several dot commands can be used to configure the CLI. On startup, the CLI reads and executes all commands in the file ~/.duckdbrc, including dot commands and SQL statements. This allows you to store the configuration state of the CLI. You may also point to a different initialization file using the -init flag.

Setting a Custom Prompt

As an example, a file in the same directory as the DuckDB CLI named prompt.sql will change the DuckDB prompt to be a duck head and run a SQL statement. Note that the duck head is built with Unicode characters and does not work in all terminal environments (e.g., in Windows, unless running with WSL and using the Windows Terminal).

.prompt '⚫◗ '

To invoke that file on initialization, use this command:

duckdb -init prompt.sql

This outputs:

-- Loading resources from prompt.sql
v⟨version⟩ ⟨git hash⟩
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
⚫◗

Non-Interactive Usage

To read/process a file and exit immediately, pipe the file contents in to duckdb:

duckdb < select_example.sql

To execute a command with SQL text passed in directly from the command line, call duckdb with two arguments: the database location (or :memory:), and a string with the SQL statement to execute.

duckdb :memory: "SELECT 42 AS the_answer"

Loading Extensions

To load extensions, use DuckDB's SQL INSTALL and LOAD commands as you would other SQL statements.

INSTALL fts;
LOAD fts;

For details, see the [Extension docs]({% link docs/extensions/overview.md %}).

Reading from stdin and Writing to stdout

When in a Unix environment, it can be useful to pipe data between multiple commands. DuckDB is able to read data from stdin as well as write to stdout using the file location of stdin (/dev/stdin) and stdout (/dev/stdout) within SQL commands, as pipes act very similarly to file handles.

This command will create an example CSV:

COPY (SELECT 42 AS woot UNION ALL SELECT 43 AS woot) TO 'test.csv' (HEADER);

First, read a file and pipe it to the duckdb CLI executable. As arguments to the DuckDB CLI, pass in the location of the database to open, in this case, an in-memory database, and a SQL command that utilizes /dev/stdin as a file location.

cat test.csv | duckdb -c "SELECT * FROM read_csv('/dev/stdin')"
woot
42
43

To write back to stdout, the copy command can be used with the /dev/stdout file location.

cat test.csv | \
    duckdb -c "COPY (SELECT * FROM read_csv('/dev/stdin')) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)"
woot
42
43

Reading Environment Variables

The getenv function can read environment variables.

Examples

To retrieve the home directory's path from the HOME environment variable, use:

SELECT getenv('HOME') AS home;
home
/Users/user_name

The output of the getenv function can be used to set [configuration options]({% link docs/configuration/overview.md %}). For example, to set the NULL order based on the environment variable DEFAULT_NULL_ORDER, use:

SET default_null_order = getenv('DEFAULT_NULL_ORDER');

Restrictions for Reading Environment Variables

The getenv function can only be run when the [enable_external_access]({% link docs/configuration/overview.md %}#configuration-reference) option is set to true (the default setting). It is only available in the CLI client and is not supported in other DuckDB clients.

Prepared Statements

The DuckDB CLI supports executing [prepared statements]({% link docs/sql/query_syntax/prepared_statements.md %}) in addition to regular SELECT statements. To create and execute a prepared statement in the CLI client, use the PREPARE clause and the EXECUTE statement.
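
For example, the following defines a prepared statement with a single positional parameter and then executes it:

PREPARE my_query AS
    SELECT *
    FROM range(10) AS t(i)
    WHERE i < ?;

EXECUTE my_query(5);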

layout: docu title: Autocomplete redirect_from:

  • /docs/api/cli/autocomplete
  • /docs/api/cli/autocomplete/

The shell offers context-aware autocomplete of SQL queries through the [autocomplete extension]({% link docs/extensions/autocomplete.md %}). Autocomplete is triggered by pressing Tab.

Multiple autocomplete suggestions can be present. You can cycle forwards through the suggestions by repeatedly pressing Tab, or Shift+Tab to cycle backwards. Autocompletion can be reverted by pressing ESC twice.

The shell autocompletes four different groups:

  • Keywords
  • Table names and table functions
  • Column names and scalar functions
  • File names

The shell looks at the position in the SQL statement to determine which of these autocompletions to trigger. For example:

SELECT s
student_id
SELECT student_id F
FROM
SELECT student_id FROM g
grades
SELECT student_id FROM 'd
'data/
SELECT student_id FROM 'data/
'data/grades.csv

layout: docu title: Syntax Highlighting redirect_from:

  • /docs/api/cli/syntax_highlighting
  • /docs/api/cli/syntax_highlighting/

Syntax highlighting in the CLI is currently only available for macOS and Linux.

SQL queries written in the shell are automatically syntax-highlighted.

Image showing syntax highlighting in the shell

There are several components of a query that are highlighted in different colors. The colors can be configured using [dot commands]({% link docs/clients/cli/dot_commands.md %}). Syntax highlighting can also be disabled entirely using the .highlight off command.

Below is a list of components that can be configured.

| Type                     | Command   | Default color |
|--------------------------|-----------|---------------|
| Keywords                 | .keyword  | green         |
| Constants and literals   | .constant | yellow        |
| Comments                 | .comment  | brightblack   |
| Errors                   | .error    | red           |
| Continuation             | .cont     | brightblack   |
| Continuation (selected)  | .cont_sel | green         |

The components can be configured using either a supported color name (e.g., .keyword red), or by directly providing a terminal code to use for rendering (e.g., .keywordcode \033[31m). Below is a list of supported color names and their corresponding terminal codes.

Color Terminal code
red \033[31m
green \033[32m
yellow \033[33m
blue \033[34m
magenta \033[35m
cyan \033[36m
white \033[37m
brightblack \033[90m
brightred \033[91m
brightgreen \033[92m
brightyellow \033[93m
brightblue \033[94m
brightmagenta \033[95m
brightcyan \033[96m
brightwhite \033[97m

For example, here is an alternative set of syntax highlighting colors:

.keyword brightred
.constant brightwhite
.comment cyan
.error yellow
.cont blue
.cont_sel brightblue

If you wish to start up the CLI with a different set of colors every time, you can place these commands in the ~/.duckdbrc file that is loaded on start-up of the CLI.

Error Highlighting

The shell has support for highlighting certain errors. In particular, mismatched brackets and unclosed quotes are highlighted in red (or another color if specified). This highlighting is automatically disabled for large queries. In addition, it can be disabled manually using the .render_errors off command.

layout: docu title: Editing redirect_from:

  • /docs/api/cli/editing
  • /docs/api/cli/editing/

The linenoise-based CLI editor is currently only available for macOS and Linux.

DuckDB's CLI uses a line-editing library based on linenoise, which has shortcuts based on the Emacs mode of readline. Below is a list of available commands.

Moving

Key Action
Left Move back a character
Right Move forward a character
Up Move up a line. When on the first line, move to previous history entry
Down Move down a line. When on last line, move to next history entry
Home Move to beginning of buffer
End Move to end of buffer
Ctrl+Left Move back a word
Ctrl+Right Move forward a word
Ctrl+A Move to beginning of buffer
Ctrl+B Move back a character
Ctrl+E Move to end of buffer
Ctrl+F Move forward a character
Alt+Left Move back a word
Alt+Right Move forward a word

History

Key Action
Ctrl+P Move to previous history entry
Ctrl+N Move to next history entry
Ctrl+R Search the history
Ctrl+S Search the history
Alt+< Move to first history entry
Alt+> Move to last history entry
Alt+N Search the history
Alt+P Search the history

Changing Text

Key Action
Backspace Delete previous character
Delete Delete next character
Ctrl+D Delete next character. When buffer is empty, end editing
Ctrl+H Delete previous character
Ctrl+K Delete everything after the cursor
Ctrl+T Swap current and next character
Ctrl+U Delete all text
Ctrl+W Delete previous word
Alt+C Convert next word to titlecase
Alt+D Delete next word
Alt+L Convert next word to lowercase
Alt+R Delete all text
Alt+T Swap current and next word
Alt+U Convert next word to uppercase
Alt+Backspace Delete previous word
Alt+\ Delete spaces around cursor

Completing

Key Action
Tab Autocomplete. When autocompleting, cycle to next entry
Shift+Tab When autocompleting, cycle to previous entry
Esc+Esc When autocompleting, revert autocompletion

Miscellaneous

Key Action
Enter Execute query. If query is not complete, insert a newline at the end of the buffer
Ctrl+J Execute query. If query is not complete, insert a newline at the end of the buffer
Ctrl+C Cancel editing of current query
Ctrl+G Cancel editing of current query
Ctrl+L Clear screen
Ctrl+O Cancel editing of current query
Ctrl+X Insert a newline after the cursor
Ctrl+Z Suspend CLI and return to shell, use fg to re-open

Using Read-Line

If you prefer, you can use rlwrap to use read-line directly with the shell. Then, use Shift+Enter to insert a newline and Enter to execute the query:

rlwrap --substitute-prompt="D " duckdb -batch

layout: docu title: Output Formats redirect_from:

  • /docs/api/cli/output-formats
  • /docs/api/cli/output-formats/
  • /docs/api/cli/output_formats
  • /docs/api/cli/output_formats/

The .mode [dot command]({% link docs/clients/cli/dot_commands.md %}) may be used to change the appearance of the tables returned in the terminal output. In addition to customizing the appearance, these modes have additional benefits. This can be useful for presenting DuckDB output elsewhere by redirecting the terminal [output to a file]({% link docs/clients/cli/dot_commands.md %}#output-writing-results-to-a-file). Using the insert mode will build a series of SQL statements that can be used to insert the data at a later point. The markdown mode is particularly useful for building documentation and the latex mode is useful for writing academic papers.

Mode Description
ascii Columns/rows delimited by 0x1F and 0x1E
box Tables using unicode box-drawing characters
csv Comma-separated values
column Output in columns (See .width)
duckbox Tables with extensive features (default)
html HTML <table> code
insert SQL insert statements for TABLE
json Results in a JSON array
jsonlines Results in a NDJSON
latex LaTeX tabular environment code
line One value per line
list Values delimited by "|"
markdown Markdown table format
quote Escape answers as for SQL
table ASCII-art table
tabs Tab-separated values
tcl TCL list elements
trash No output

Use .mode directly to query the appearance currently in use.

.mode
current output mode: duckbox
.mode markdown
SELECT 'quacking intensifies' AS incoming_ducks;
|    incoming_ducks    |
|----------------------|
| quacking intensifies |

The output appearance can also be adjusted with the .separator command. If using an export mode that relies on a separator (csv or tabs for example), the separator will be reset when the mode is changed. For example, .mode csv will set the separator to a comma (,). Using .separator "|" will then convert the output to be pipe-separated.

.mode csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col1, 20 AS col_2;
col_1,col_2
1,2
10,20
.separator "|"
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col1, 20 AS col_2;
col_1|col_2
1|2
10|20

layout: docu title: Command Line Arguments redirect_from:

  • /docs/cli/arguments
  • /docs/cli/arguments/

The table below summarizes DuckDB's command line options. To list all command line options, use the command:

duckdb -help

For a list of dot commands available in the CLI shell, see the [Dot Commands page]({% link docs/clients/cli/dot_commands.md %}).

Argument Description
-append Append the database to the end of the file
-ascii Set [output mode]({% link docs/clients/cli/output_formats.md %}) to ascii
-bail Stop after hitting an error
-batch Force batch I/O
-box Set [output mode]({% link docs/clients/cli/output_formats.md %}) to box
-column Set [output mode]({% link docs/clients/cli/output_formats.md %}) to column
-cmd COMMAND Run COMMAND before reading stdin
-c COMMAND Run COMMAND and exit
-csv Set [output mode]({% link docs/clients/cli/output_formats.md %}) to csv
-echo Print commands before execution
-f FILENAME Run the script in FILENAME and exit. Note that the ~/.duckdbrc is read and executed first (if it exists)
-init FILENAME Run the script in FILENAME upon startup (instead of ~/.duckdbrc)
-header Turn headers on
-help Show this message
-html Set [output mode]({% link docs/clients/cli/output_formats.md %}) to HTML
-interactive Force interactive I/O
-json Set [output mode]({% link docs/clients/cli/output_formats.md %}) to json
-line Set [output mode]({% link docs/clients/cli/output_formats.md %}) to line
-list Set [output mode]({% link docs/clients/cli/output_formats.md %}) to list
-markdown Set [output mode]({% link docs/clients/cli/output_formats.md %}) to markdown
-newline SEP Set output row separator. Default: \n
-nofollow Refuse to open symbolic links to database files
-noheader Turn headers off
-no-stdin Exit after processing options instead of reading stdin
-nullvalue TEXT Set text string for NULL values. Default: empty string
-quote Set [output mode]({% link docs/clients/cli/output_formats.md %}) to quote
-readonly Open the database read-only
-s COMMAND Run COMMAND and exit
-separator SEP Set output column separator to SEP. Default: |
-stats Print memory stats before each finalize
-table Set [output mode]({% link docs/clients/cli/output_formats.md %}) to table
-unsigned Allow loading of [unsigned extensions]({% link docs/extensions/overview.md %}#unsigned-extensions). This option is intended to be used for developing extensions. Consult the [Securing DuckDB page]({% link docs/operations_manual/securing_duckdb/securing_extensions.md %}) for guidelines on how to set up DuckDB in a secure manner
-version Show DuckDB version

Passing a Sequence of Arguments

Note that the CLI arguments are processed in order, similarly to the behavior of the SQLite CLI. For example:

duckdb -csv -c 'SELECT 42 AS hello' -json -c 'SELECT 84 AS world'

Returns the following:

hello
42
[{"world":84}]

layout: docu title: Dot Commands redirect_from:

  • /docs/api/cli/dot-commands
  • /docs/api/cli/dot-commands/
  • /docs/api/cli/dot_commands
  • /docs/api/cli/dot_commands/

Dot commands are available in the DuckDB CLI client. To use one of these commands, begin the line with a period (.) immediately followed by the name of the command you wish to execute. Additional arguments to the command are entered, space separated, after the command. If an argument must contain a space, either single or double quotes may be used to wrap that parameter. Dot commands must be entered on a single line, and no whitespace may occur before the period. No semicolon is required at the end of the line. To see available commands, use the .help command.

Dot Commands

Command Description
.bail on|off Stop after hitting an error
.binary on|off Turn binary output on or off
.cd DIRECTORY Change the working directory to DIRECTORY
.changes on|off Show the number of rows changed by SQL
.check GLOB Fail if output since .testcase does not match
.columns Column-wise rendering of query results
.constant ?COLOR? Sets the syntax highlighting color used for constant values
.constantcode ?CODE? Sets the syntax highlighting terminal code used for constant values
.databases List names and files of attached databases
.echo on|off Turn echoing of commands on or off
.excel Display the output of next command in spreadsheet
.exit ?CODE? Exit this program with return-code CODE
.explain ?on|off|auto? Change the EXPLAIN formatting mode
.fullschema ?--indent? Show schema and the content of sqlite_stat tables
.headers on|off Turn display of headers on or off
.help ?-all? ?PATTERN? Show help text for PATTERN
.highlight [on|off] Toggle syntax highlighting in the shell
.import FILE TABLE Import data from FILE into TABLE
.indexes ?TABLE? Show names of indexes
.keyword ?COLOR? Sets the syntax highlighting color used for keywords
.keywordcode ?CODE? Sets the syntax highlighting terminal code used for keywords
.large_number_rendering all|footer|off Configure readable rendering of large numbers (duckbox mode only)
.lint OPTIONS Report potential schema issues
.log FILE|off Turn logging on or off
.maxrows COUNT Sets the maximum number of rows for display. Only for [duckbox mode]({% link docs/clients/cli/output_formats.md %})
.maxwidth COUNT Sets the maximum width in characters. 0 defaults to terminal width. Only for [duckbox mode]({% link docs/clients/cli/output_formats.md %})
.mode MODE ?TABLE? Set [output mode]({% link docs/clients/cli/output_formats.md %})
.multiline Set multi-line mode (default)
.nullvalue STRING Use STRING in place of NULL values
.once ?OPTIONS? ?FILE? Output for the next SQL command only to FILE
.open ?OPTIONS? ?FILE? Close existing database and reopen FILE
.output ?FILE? Send output to FILE or stdout if FILE is omitted
.parameter CMD ... Manage SQL parameter bindings
.print STRING... Print literal STRING
.prompt MAIN CONTINUE Replace the standard prompts
.quit Exit this program
.read FILE Read input from FILE
.rows Row-wise rendering of query results (default)
.safe_mode Activates [safe mode]({% link docs/clients/cli/safe_mode.md %})
.schema ?PATTERN? Show the CREATE statements matching PATTERN
.separator COL ?ROW? Change the column and row separators
.shell CMD ARGS... Run CMD ARGS... in a system shell
.show Show the current values for various settings
.singleline Set single-line mode
.system CMD ARGS... Run CMD ARGS... in a system shell
.tables ?TABLE? List names of tables [matching LIKE pattern]({% link docs/sql/functions/pattern_matching.md %}) TABLE
.testcase NAME Begin redirecting output to NAME
.timer on|off Turn the SQL timer on or off
.width NUM1 NUM2 ... Set minimum column widths for columnar output

Using the .help Command

The .help text may be filtered by passing in a text string as the second argument.

.help m
.maxrows COUNT      Sets the maximum number of rows for display (default: 40). Only for duckbox mode.
.maxwidth COUNT     Sets the maximum width in characters. 0 defaults to terminal width. Only for duckbox mode.
.mode MODE ?TABLE?  Set output mode

.output: Writing Results to a File

By default, the DuckDB CLI sends results to the terminal's standard output. However, this can be modified using either the .output or .once commands. Pass in the desired output file location as a parameter. The .once command will only output the next set of results and then revert to standard out, but .output will redirect all subsequent output to that file location. Note that each result will overwrite the entire file at that destination. To revert back to standard output, enter .output with no file parameter.

In this example, the output format is changed to markdown, the destination is identified as a Markdown file, and then DuckDB will write the output of the SQL statement to that file. Output is then reverted to standard output using .output with no parameter.

.mode markdown
.output my_results.md
SELECT 'taking flight' AS output_column;
.output
SELECT 'back to the terminal' AS displayed_column;

The file my_results.md will then contain:

| output_column |
|---------------|
| taking flight |

The terminal will then display:

|   displayed_column   |
|----------------------|
| back to the terminal |

A common output format is CSV, or comma separated values. DuckDB supports [SQL syntax to export data as CSV or Parquet]({% link docs/sql/statements/copy.md %}#copy-to), but the CLI-specific commands may be used to write a CSV instead if desired.

.mode csv
.once my_output_file.csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col1, 20 AS col_2;

The file my_output_file.csv will then contain:

col_1,col_2
1,2
10,20

By passing special options (flags) to the .once command, query results can also be sent to a temporary file and automatically opened in the user's default program. Use either the -e flag for a text file (opened in the default text editor), or the -x flag for a CSV file (opened in the default spreadsheet editor). This is useful for more detailed inspection of query results, especially if there is a relatively large result set. The .excel command is equivalent to .once -x.

.once -e
SELECT 'quack' AS hello;

The results then open in the default text file editor of the system, for example:

cli_docs_output_to_text_editor

Tip macOS users can copy query results to the clipboard by using .once to pipe the output to pbcopy: .once |pbcopy

Combining this with the .headers off and .mode line options can be particularly effective.
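
For example, the following copies the result of a query to the clipboard:

.headers off
.mode line
.once |pbcopy
SELECT 'quack' AS greeting;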

Querying the Database Schema

All DuckDB clients support [querying the database schema with SQL]({% link docs/sql/meta/information_schema.md %}), but the CLI has additional [dot commands]({% link docs/clients/cli/dot_commands.md %}) that can make it easier to understand the contents of a database. The .tables command will return a list of tables in the database. It has an optional argument that will filter the results according to a [LIKE pattern]({% link docs/sql/functions/pattern_matching.md %}#like).

CREATE TABLE swimmers AS SELECT 'duck' AS animal;
CREATE TABLE fliers AS SELECT 'duck' AS animal;
CREATE TABLE walkers AS SELECT 'duck' AS animal;
.tables
fliers    swimmers  walkers

For example, to filter to only tables that contain an l, use the LIKE pattern %l%.

.tables %l%
fliers   walkers

The .schema command will show all of the SQL statements used to define the schema of the database.

.schema
CREATE TABLE fliers (animal VARCHAR);
CREATE TABLE swimmers (animal VARCHAR);
CREATE TABLE walkers (animal VARCHAR);

Configuring the Syntax Highlighter

By default the shell includes support for syntax highlighting. The CLI's syntax highlighter can be configured using the following commands.

To turn off the highlighter:

.highlight off

To turn on the highlighter:

.highlight on

To configure the color used to highlight constants:

.constant [red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta|brightcyan|brightwhite]
.constantcode [terminal_code]

To configure the color used to highlight keywords:

.keyword [red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta|brightcyan|brightwhite]
.keywordcode [terminal_code]

Shorthands

DuckDB's CLI allows using shorthands for dot commands. Once a sequence of characters can be unambiguously completed to a dot command or an argument, the CLI (silently) autocompletes it. For example:

.mo ma

Is equivalent to:

.mode markdown

Tip Avoid using shorthands in SQL scripts to improve readability and to ensure that the scripts are future-proof.

Importing Data from CSV

Deprecated This feature is only included for compatibility reasons and may be removed in the future. Use the [read_csv function or the COPY statement]({% link docs/data/csv/overview.md %}) to load CSV files.

DuckDB supports [SQL syntax to directly query or import CSV files]({% link docs/data/csv/overview.md %}), but the CLI-specific commands may be used to import a CSV instead if desired. The .import command takes two arguments and also supports several options. The first argument is the path to the CSV file, and the second is the name of the DuckDB table to create. Since DuckDB requires stricter typing than SQLite (upon which the DuckDB CLI is based), the destination table must be created before using the .import command. To automatically detect the schema and create a table from a CSV, see the [read_csv examples in the import docs]({% link docs/data/csv/overview.md %}).

In this example, a CSV file is generated by changing to CSV mode and setting an output file location:

.mode csv
.output import_example.csv
SELECT 1 AS col_1, 2 AS col_2 UNION ALL SELECT 10 AS col1, 20 AS col_2;

Now that the CSV has been written, a table can be created with the desired schema and the CSV can be imported. The output is reset to the terminal to avoid continuing to edit the output file specified above. The --skip N option is used to ignore the first row of data since it is a header row and the table has already been created with the correct column names.

.mode csv
.output
CREATE TABLE test_table (col_1 INTEGER, col_2 INTEGER);
.import import_example.csv test_table --skip 1

Note that the .import command utilizes the current .mode and .separator settings when identifying the structure of the data to import. The --csv option can be used to override that behavior.

.import import_example.csv test_table --skip 1 --csv

layout: docu title: R Client github_repository: https://github.com/duckdb/duckdb-r redirect_from:

  • /docs/api/r
  • /docs/api/r/

Installation

duckdb: R Client

The DuckDB R client can be installed using the following command:

install.packages("duckdb")

Please see the [installation page]({% link docs/installation/index.html %}?environment=r) for details.

duckplyr: dplyr Client

DuckDB offers a dplyr-compatible API via the duckplyr package. It can be installed using install.packages("duckplyr"). For details, see the duckplyr documentation.

Reference Manual

The reference manual for the DuckDB R client is available at r.duckdb.org.

Basic Client Usage

The standard DuckDB R client implements the DBI interface for R. If you are not familiar with DBI yet, see the Using DBI page for an introduction.

Startup & Shutdown

To use DuckDB, you must first create a connection object that represents the database. The connection object takes as a parameter the database file to read from and write to. If the database file does not exist, it will be created (the file extension may be .db, .duckdb, or anything else). The special value :memory: (the default) can be used to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the R process). If you would like to connect to an existing database in read-only mode, set the read_only flag to TRUE. Read-only mode is required if multiple R processes want to access the same database file at the same time.

library("duckdb")
# to start an in-memory database
con <- dbConnect(duckdb())
# or
con <- dbConnect(duckdb(), dbdir = ":memory:")
# to use a database file (not shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = FALSE)
# to use a database file (shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = TRUE)

Connections are closed implicitly when they go out of scope or if they are explicitly closed using dbDisconnect(). To shut down the database instance associated with the connection, use dbDisconnect(con, shutdown = TRUE).

Querying

DuckDB supports the standard DBI methods to send queries and retrieve result sets. dbExecute() is meant for queries where no results are expected, such as CREATE TABLE or UPDATE, and dbGetQuery() is meant to be used for queries that produce results (e.g., SELECT). Below is an example.

# create a table
dbExecute(con, "CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
dbExecute(con, "INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")

# retrieve the items again
res <- dbGetQuery(con, "SELECT * FROM items")
print(res)
#     item value count
# 1  jeans  20.0     1
# 2 hammer  42.2     2

DuckDB also supports prepared statements in the R client with the dbExecute and dbGetQuery methods. Here is an example:

# prepared statement parameters are given as a list
dbExecute(con, "INSERT INTO items VALUES (?, ?, ?)", list('laptop', 2000, 1))

# if you want to reuse a prepared statement multiple times, use dbSendStatement() and dbBind()
stmt <- dbSendStatement(con, "INSERT INTO items VALUES (?, ?, ?)")
dbBind(stmt, list('iphone', 300, 2))
dbBind(stmt, list('android', 3.5, 1))
dbClearResult(stmt)

# query the database using a prepared statement
res <- dbGetQuery(con, "SELECT item FROM items WHERE value > ?", list(400))
print(res)
#       item
# 1 laptop

Warning Do not use prepared statements to insert large amounts of data into DuckDB. See below for better options.

Efficient Transfer

To write an R data frame into DuckDB, use the standard DBI function dbWriteTable(). This creates a table in DuckDB and populates it with the data frame contents. For example:

dbWriteTable(con, "iris_table", iris)
res <- dbGetQuery(con, "SELECT * FROM iris_table LIMIT 1")
print(res)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa

It is also possible to “register” an R data frame as a virtual table, comparable to a SQL VIEW. This does not actually transfer data into DuckDB yet. Below is an example:

duckdb_register(con, "iris_view", iris)
res <- dbGetQuery(con, "SELECT * FROM iris_view LIMIT 1")
print(res)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa

DuckDB keeps a reference to the R data frame after registration. This prevents the data frame from being garbage-collected. The reference is cleared when the connection is closed, but can also be cleared manually using the duckdb_unregister() method.
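
For example, to remove the registration created above:

duckdb_unregister(con, "iris_view")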

Also refer to the [data import documentation]({% link docs/data/overview.md %}) for more options of efficiently importing data.

dbplyr

DuckDB also plays well with the dbplyr / dplyr packages for programmatic query construction from R. Here is an example:

library("duckdb")
library("dplyr")
con <- dbConnect(duckdb())
duckdb_register(con, "flights", nycflights13::flights)

tbl(con, "flights") |>
  group_by(dest) |>
  summarise(delay = mean(dep_time, na.rm = TRUE)) |>
  collect()

When using dbplyr, CSV and Parquet files can be read using the dplyr::tbl function.

# Establish a CSV for the sake of this example
write.csv(mtcars, "mtcars.csv")

# Summarize the dataset in DuckDB to avoid reading the entire CSV into R's memory
tbl(con, "mtcars.csv") |>
  group_by(cyl) |>
  summarise(across(disp:wt, .fns = mean)) |>
  collect()
# Establish a set of Parquet files
dbExecute(con, "COPY flights TO 'dataset' (FORMAT PARQUET, PARTITION_BY (year, month))")

# Summarize the dataset in DuckDB to avoid reading 12 Parquet files into R's memory
tbl(con, "read_parquet('dataset/**/*.parquet', hive_partitioning = true)") |>
  filter(month == "3") |>
  summarise(delay = mean(dep_time, na.rm = TRUE)) |>
  collect()

Memory Limit

You can use the [memory_limit configuration option]({% link docs/configuration/pragmas.md %}) to limit the memory use of DuckDB, e.g.:

SET memory_limit = '2GB';

Note that this limit is only applied to the memory DuckDB uses and it does not affect the memory use of other R libraries. Therefore, the total memory used by the R process may be higher than the configured memory_limit.

Troubleshooting

Warning When Installing on macOS

On macOS, installing DuckDB may result in a warning unable to load shared object '.../R_X11.so':

Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
  dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 0x0006): Library not loaded: /opt/X11/lib/libSM.6.dylib
  Referenced from: <31EADEB5-0A17-3546-9944-9B3747071FE8> /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/modules/R_X11.so
  Reason: tried: '/opt/X11/lib/libSM.6.dylib' (no such file) ...

Note that this is just a warning, so the simplest solution is to ignore it. Alternatively, you can install DuckDB from the R-universe:

install.packages("duckdb", repos = c("https://duckdb.r-universe.dev", "https://cloud.r-project.org"))

You may also install the optional xquartz dependency via Homebrew.

layout: docu title: Java JDBC Client github_repository: https://github.com/duckdb/duckdb-java redirect_from:

  • /docs/api/java
  • /docs/api/java/
  • /docs/api/scala
  • /docs/api/scala/

Installation

The DuckDB Java JDBC API can be installed from Maven Central. Please see the [installation page]({% link docs/installation/index.html %}?environment=java) for details.
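
For Maven-based projects, the JDBC driver can be declared as a dependency. A minimal sketch, assuming the org.duckdb:duckdb_jdbc coordinates (check Maven Central for the current version):

<dependency>
    <groupId>org.duckdb</groupId>
    <artifactId>duckdb_jdbc</artifactId>
    <version>⟨version⟩</version>
</dependency>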

Basic API Usage

DuckDB's JDBC API implements the main parts of the standard Java Database Connectivity (JDBC) API, version 4.1. Describing JDBC is beyond the scope of this page; see the official documentation for details. Below we focus on the DuckDB-specific parts.

Refer to the externally hosted API Reference for more information about our extensions to the JDBC specification, or the Arrow Methods below.

Startup & Shutdown

In JDBC, database connections are created through the standard java.sql.DriverManager class. The driver should auto-register in the DriverManager. If that does not work for some reason, you can enforce registration using the following statement:

Class.forName("org.duckdb.DuckDBDriver");

To create a DuckDB connection, call DriverManager with the jdbc:duckdb: JDBC URL prefix, like so:

import java.sql.Connection;
import java.sql.DriverManager;

Connection conn = DriverManager.getConnection("jdbc:duckdb:");

To use DuckDB-specific features such as the Appender, cast the object to a DuckDBConnection:

import java.sql.DriverManager;
import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");

When using the jdbc:duckdb: URL alone, an in-memory database is created. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the Java program). If you would like to access or create a persistent database, append its file name to the jdbc:duckdb: prefix. For example, if your database is stored in /tmp/my_database, use the JDBC URL jdbc:duckdb:/tmp/my_database to create a connection to it.
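
For example, the following sketch connects to (and, if necessary, creates) a persistent database file; the path /tmp/my_database is only an illustration:

import java.sql.Connection;
import java.sql.DriverManager;

// the file is created if it does not exist yet
Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database");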

It is possible to open a DuckDB database file in read-only mode. This is for example useful if multiple Java processes want to read the same database file at the same time. To open an existing database file in read-only mode, set the connection property duckdb.read_only like so:

Properties readOnlyProperty = new Properties();
readOnlyProperty.setProperty("duckdb.read_only", "true");
Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database", readOnlyProperty);

Additional connections can be created using the DriverManager. A more efficient mechanism is to call the DuckDBConnection#duplicate() method:

Connection conn2 = ((DuckDBConnection) conn).duplicate();

Multiple connections are allowed, but mixing read-write and read-only connections is unsupported.

Configuring Connections

Configuration options can be provided to change different settings of the database system. Note that many of these settings can be changed later on using [PRAGMA statements]({% link docs/configuration/pragmas.md %}) as well.

Properties connectionProperties = new Properties();
connectionProperties.setProperty("temp_directory", "/path/to/temp/dir/");
Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database", connectionProperties);

Querying

DuckDB supports the standard JDBC methods to send queries and retrieve result sets. First, a Statement object has to be created from the Connection; this object can then be used to send queries using execute and executeQuery. execute() is meant for queries where no results are expected, such as CREATE TABLE or UPDATE, while executeQuery() is meant for queries that produce results (e.g., SELECT). Below are two examples. See also the JDBC Statement and ResultSet documentation.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

Connection conn = DriverManager.getConnection("jdbc:duckdb:");

// create a table
Statement stmt = conn.createStatement();
stmt.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)");
// insert two items into the table
stmt.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)");

try (ResultSet rs = stmt.executeQuery("SELECT * FROM items")) {
    while (rs.next()) {
        System.out.println(rs.getString(1));
        System.out.println(rs.getInt(3));
    }
}
stmt.close();

The output is:

jeans
1
hammer
2

DuckDB also supports prepared statements as per the JDBC API:

import java.sql.PreparedStatement;

try (PreparedStatement stmt = conn.prepareStatement("INSERT INTO items VALUES (?, ?, ?);")) {
    stmt.setString(1, "chainsaw");
    stmt.setDouble(2, 500.0);
    stmt.setInt(3, 42);
    stmt.execute();
    // more calls to execute() possible
}

Warning Do not use prepared statements to insert large amounts of data into DuckDB. See the [data import documentation]({% link docs/data/overview.md %}) for better options.

Arrow Methods

Refer to the API Reference for type signatures

Arrow Export

The following example demonstrates exporting an Arrow stream and consuming it using the Java Arrow bindings:

import java.sql.DriverManager;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.duckdb.DuckDBResultSet;

try (var conn = DriverManager.getConnection("jdbc:duckdb:");
    var stmt = conn.prepareStatement("SELECT * FROM generate_series(2000)");
    var resultset = (DuckDBResultSet) stmt.executeQuery();
    var allocator = new RootAllocator()) {
    try (var reader = (ArrowReader) resultset.arrowExportStream(allocator, 256)) {
        while (reader.loadNextBatch()) {
            System.out.println(reader.getVectorSchemaRoot().getVector("generate_series"));
        }
    }
    stmt.close();
}

Arrow Import

The following demonstrates consuming an Arrow stream from the Java Arrow bindings.

import java.sql.DriverManager;
import org.apache.arrow.c.ArrowArrayStream;
import org.apache.arrow.c.Data;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.duckdb.DuckDBConnection;
import org.duckdb.DuckDBResultSet;

// Arrow binding
try (var allocator = new RootAllocator();
     ArrowStreamReader reader = null; // should not be null of course
     var arrow_array_stream = ArrowArrayStream.allocateNew(allocator)) {
    Data.exportArrayStream(allocator, reader, arrow_array_stream);

    // DuckDB setup
    try (var conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:")) {
        conn.registerArrowStream("asdf", arrow_array_stream);

        // run a query
        try (var stmt = conn.createStatement();
             var rs = (DuckDBResultSet) stmt.executeQuery("SELECT count(*) FROM asdf")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}

Streaming Results

Result streaming is opt-in in the JDBC driver: set the jdbc_stream_results config to true before running a query. The easiest way to do that is to pass it in the Properties object.

Properties props = new Properties();
props.setProperty(DuckDBDriver.JDBC_STREAM_RESULTS, String.valueOf(true));

Connection conn = DriverManager.getConnection("jdbc:duckdb:", props);
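
With streaming enabled, large result sets can be iterated without being fully materialized first. A minimal sketch (the query is only an illustration):

import java.sql.ResultSet;
import java.sql.Statement;

try (Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT * FROM range(10000000)")) {
    while (rs.next()) {
        // rows are fetched lazily in batches rather than all at once
    }
}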

Appender

The [Appender]({% link docs/data/appender.md %}) is available in the DuckDB JDBC driver via the org.duckdb.DuckDBAppender class. The constructor of the class requires the schema name and the table name it is applied to. The Appender is flushed when the close() method is called.

Example:

import java.sql.DriverManager;
import java.sql.Statement;
import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
try (var stmt = conn.createStatement()) {
    stmt.execute("CREATE TABLE tbl (x BIGINT, y FLOAT, s VARCHAR)");
}

// using try-with-resources to automatically close the appender at the end of the scope
try (var appender = conn.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "tbl")) {
    appender.beginRow();
    appender.append(10);
    appender.append(3.2);
    appender.append("hello");
    appender.endRow();
    appender.beginRow();
    appender.append(20);
    appender.append(-8.1);
    appender.append("world");
    appender.endRow();
}

Batch Writer

The DuckDB JDBC driver offers batch write functionality. The batch writer supports prepared statements to mitigate the overhead of query parsing.

The preferred method for bulk inserts is to use the Appender due to its higher performance. However, when using the Appender is not possible, the batch writer is available as an alternative.

Batch Writer with Prepared Statements

import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
PreparedStatement stmt = conn.prepareStatement("INSERT INTO test (x, y, z) VALUES (?, ?, ?);");

stmt.setObject(1, 1);
stmt.setObject(2, 2);
stmt.setObject(3, 3);
stmt.addBatch();

stmt.setObject(1, 4);
stmt.setObject(2, 5);
stmt.setObject(3, 6);
stmt.addBatch();

stmt.executeBatch();
stmt.close();

Batch Writer with Vanilla Statements

The batch writer also supports vanilla SQL statements:

import java.sql.DriverManager;
import java.sql.Statement;
import org.duckdb.DuckDBConnection;

DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
Statement stmt = conn.createStatement();

stmt.execute("CREATE TABLE test (x INTEGER, y INTEGER, z INTEGER)");

stmt.addBatch("INSERT INTO test (x, y, z) VALUES (1, 2, 3);");
stmt.addBatch("INSERT INTO test (x, y, z) VALUES (4, 5, 6);");

stmt.executeBatch();
stmt.close();

Troubleshooting

Driver Class Not Found

If the Java application is unable to find the DuckDB JDBC driver, it may throw the following error:

Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:duckdb:
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:706)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:252)
    ...

And when trying to load the class manually, it may result in this error:

Exception in thread "main" java.lang.ClassNotFoundException: org.duckdb.DuckDBDriver
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:375)
    ...

These errors stem from the DuckDB Maven/Gradle dependency not being detected. To ensure that it is detected, force refresh the Maven configuration in your IDE.
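
For reference, the Maven dependency declaration looks roughly as follows (the version is a placeholder; use the latest release published on Maven Central):

<dependency>
    <groupId>org.duckdb</groupId>
    <artifactId>duckdb_jdbc</artifactId>
    <version>⟨version⟩</version>
</dependency>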

layout: docu title: ODBC API Overview github_repository: https://github.com/duckdb/duckdb-odbc redirect_from:

  • /docs/api/odbc
  • /docs/api/odbc/
  • /docs/api/odbc/overview
  • /docs/api/odbc/overview/

ODBC (Open Database Connectivity) is a C-style API that provides access to different flavors of database management systems (DBMSs). The ODBC API consists of the Driver Manager (DM) and the ODBC drivers.

The Driver Manager is part of the system library, e.g., unixODBC, and manages the communication between the user applications and the ODBC drivers. Typically, applications are linked against the DM, which uses a Data Source Name (DSN) to look up the correct ODBC driver.

The ODBC driver is a DBMS implementation of the ODBC API, which handles all the internals of that DBMS.

The DM maps user application calls of ODBC functions to the correct ODBC driver that performs the specified function and returns the proper values.

DuckDB ODBC Driver

DuckDB supports ODBC version 3.0 according to the Core Interface Conformance.

The ODBC driver is available for all operating systems. Visit the [installation page]({% link docs/installation/index.html %}) for direct links.

layout: docu title: ODBC API on Linux github_repository: https://github.com/duckdb/duckdb-odbc redirect_from:

  • /docs/api/odbc/linux
  • /docs/api/odbc/linux/

Driver Manager

A driver manager is required to manage communication between applications and the ODBC driver. We tested and support unixODBC, which is a complete ODBC driver manager for Linux. Users can install it from the command line:

On Debian-based distributions (Ubuntu, Mint, etc.), run:

sudo apt-get install unixodbc odbcinst

On Fedora-based distributions (Amazon Linux, RHEL, CentOS, etc.), run:

sudo yum install unixODBC

Setting Up the Driver

  1. Download the ODBC Linux Asset corresponding to your architecture.

  2. The package contains the following files:

    • libduckdb_odbc.so: the DuckDB driver.
    • unixodbc_setup.sh: a setup script to aid the configuration on Linux.

    To extract them, run:

    mkdir duckdb_odbc && unzip duckdb_odbc-linux-amd64.zip -d duckdb_odbc
  3. The unixodbc_setup.sh script performs the configuration of the DuckDB ODBC Driver. It is based on the unixODBC package, which provides commands such as odbcinst and isql to handle the ODBC setup and testing.

    Run the following commands with either option -u or -s to configure DuckDB ODBC.

    The -u option sets up the ODBC init files in the user's home directory.

    ./unixodbc_setup.sh -u

    The -s option changes the system-level files, which are visible to all users; because of that, it requires root privileges.

    sudo ./unixodbc_setup.sh -s

    The --help option prints the usage of unixodbc_setup.sh:

    ./unixodbc_setup.sh --help
    Usage: ./unixodbc_setup.sh <level> [options]
    
    Example: ./unixodbc_setup.sh -u -db ~/database_path -D ~/driver_path/libduckdb_odbc.so
    
    Level:
    -s: System-level, using 'sudo' to configure DuckDB ODBC at the system-level, changing the files: /etc/odbc[inst].ini
    -u: User-level, configuring the DuckDB ODBC at the user-level, changing the files: ~/.odbc[inst].ini.
    
    Options:
    -db database_path>: the DuckDB database file path, the default is ':memory:' if not provided.
    -D driver_path: the driver file path (i.e., the path for libduckdb_odbc.so), the default is using the base script directory
    
  4. The ODBC setup on Linux is based on the .odbc.ini and .odbcinst.ini files.

    These files can be placed in the user's home directory /home/⟨username⟩ or in the system /etc directory. The Driver Manager prioritizes the user configuration files over the system files.

    For the details of the configuration parameters, see the [ODBC configuration page]({% link docs/clients/odbc/configuration.md %}).


layout: docu title: ODBC API on macOS github_repository: https://github.com/duckdb/duckdb-odbc redirect_from:

  • /docs/api/odbc/macos
  • /docs/api/odbc/macos/

  1. A driver manager is required to manage communication between applications and the ODBC driver. DuckDB supports unixODBC, which is a complete ODBC driver manager for macOS and Linux. Users can install it from the command line via Homebrew:

    brew install unixodbc
  2. DuckDB releases a universal [ODBC driver for macOS](https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbodbcversion }}/duckdb_odbc-osx-universal.zip) (supporting both Intel and Apple Silicon CPUs). To download it, run:
    wget https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbodbcversion }}/duckdb_odbc-osx-universal.zip
  3. The archive contains the libduckdb_odbc.dylib artifact. To extract it to a directory, run:

    mkdir duckdb_odbc && unzip duckdb_odbc-osx-universal.zip -d duckdb_odbc
  4. There are two ways to configure the ODBC driver, either by initializing via the configuration files, or by connecting with SQLDriverConnect. A combination of the two is also possible.

    Furthermore, the ODBC driver supports all the [configuration options]({% link docs/configuration/overview.md %}) included in DuckDB.

    If a configuration is set in both the connection string passed to SQLDriverConnect and in the odbc.ini file, the one passed to SQLDriverConnect will take precedence.

    For the details of the configuration parameters, see the [ODBC configuration page]({% link docs/clients/odbc/configuration.md %}).

  5. After the configuration, to validate the installation, it is possible to use an ODBC client. unixODBC uses a command line tool called isql.

    Use the DSN defined in odbc.ini as a parameter of isql.

    isql DuckDB
    +---------------------------------------+
    | Connected!                            |
    |                                       |
    | sql-statement                         |
    | help [tablename]                      |
    | echo [string]                         |
    | quit                                  |
    |                                       |
    +---------------------------------------+
    
    SQL> SELECT 42;
    +------------+
    | 42         |
    +------------+
    | 42         |
    +------------+
    
    SQLRowCount returns -1
    1 rows fetched
    

layout: docu title: ODBC API on Windows github_repository: https://github.com/duckdb/duckdb-odbc redirect_from:

  • /docs/api/odbc/windows
  • /docs/api/odbc/windows/

Using the DuckDB ODBC API on Windows requires the following steps:

  1. Microsoft Windows requires an ODBC Driver Manager to manage communication between applications and the ODBC drivers. The Driver Manager on Windows is provided in the DLL file odbccp32.dll, along with other files and tools. For detailed information, check out the Common ODBC Component Files.

  2. DuckDB releases the ODBC driver as an asset. For Windows, download it from the [Windows ODBC asset (x86_64/AMD64)](https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbodbcversion }}/duckdb_odbc-windows-amd64.zip).
  3. The archive contains the following artifacts:

    • duckdb_odbc.dll: the DuckDB driver compiled for Windows.
    • duckdb_odbc_setup.dll: a setup DLL used by the Windows ODBC Data Source Administrator tool.
    • odbc_install.exe: an installation script to aid the configuration on Windows.

    Decompress the archive to a directory (e.g., duckdb_odbc). For example, run:

    mkdir duckdb_odbc && unzip duckdb_odbc-windows-amd64.zip -d duckdb_odbc
  4. The odbc_install.exe binary performs the configuration of the DuckDB ODBC Driver on Windows. It depends on Odbccp32.dll, which provides functions to configure the ODBC registry entries.

    Inside the permanent directory (e.g., duckdb_odbc), double-click odbc_install.exe.

    Windows administrator privileges are required. If you are not an administrator, a User Account Control prompt will appear.

  5. odbc_install.exe adds a default DSN configuration into the ODBC registries with a default database :memory:.

DSN Windows Setup

After the installation, it is possible to change the default DSN configuration or add a new one using the Windows ODBC Data Source Administrator tool odbcad32.exe.

It can also be launched through the Windows Start menu:

Default DuckDB DSN

The newly installed DSN is visible in the System DSN tab of the Windows ODBC Data Source Administrator tool:

Windows ODBC Config Tool

Changing DuckDB DSN

When selecting the default DSN (i.e., DuckDB) or adding a new configuration, the following setup window will appear:

DuckDB Windows DSN Setup

This window allows you to set the DSN and the database file path associated with that DSN.

More Detailed Windows Setup

There are two ways to configure the ODBC driver, either by altering the registry keys as detailed below, or by connecting with SQLDriverConnect. A combination of the two is also possible.

Furthermore, the ODBC driver supports all the [configuration options]({% link docs/configuration/overview.md %}) included in DuckDB.

If a configuration is set in both the connection string passed to SQLDriverConnect and in the odbc.ini file, the one passed to SQLDriverConnect will take precedence.

For the details of the configuration parameters, see the [ODBC configuration page]({% link docs/clients/odbc/configuration.md %}).

Registry Keys

The ODBC setup on Windows is based on registry keys (see Registry Entries for ODBC Components). The ODBC entries can be placed at the current user registry key (HKCU) or the system registry key (HKLM).

We have tested and used the system entries based on HKLM->SOFTWARE->ODBC. odbc_install.exe changes this entry, which has two subkeys: ODBC.INI and ODBCINST.INI.

The ODBC.INI is where users usually insert DSN registry entries for the drivers.

For example, the DSN registry for DuckDB would look like this:

HKLM->SOFTWARE->ODBC->ODBC.INI->DuckDB

The ODBCINST.INI contains one entry for each ODBC driver and other keys predefined for Windows ODBC configuration.

Updating the ODBC Driver

When a new version of the ODBC driver is released, installing the new version will overwrite the existing one. However, the installer doesn't always update the version number in the registry. To ensure the correct version is used, check that HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBCINST.INI\DuckDB Driver has the most recent version, and HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\DuckDB\Driver has the correct path to the new driver.

layout: docu title: ODBC Configuration github_repository: https://github.com/duckdb/duckdb-odbc redirect_from:

  • /docs/api/odbc/configuration
  • /docs/api/odbc/configuration/

This page documents the ODBC configuration files, odbc.ini and odbcinst.ini. These are placed either in the home directory as dotfiles (.odbc.ini and .odbcinst.ini, respectively) or in a system directory. For platform-specific details, see the pages for [Linux]({% link docs/clients/odbc/linux.md %}), [macOS]({% link docs/clients/odbc/macos.md %}), and [Windows]({% link docs/clients/odbc/windows.md %}).

odbc.ini and .odbc.ini

The odbc.ini file contains the DSNs for the drivers, which can have specific knobs. An example of odbc.ini with DuckDB:

[DuckDB]
Driver = DuckDB Driver
Database = :memory:
access_mode = read_only

The lines correspond to the following parameters:

  • [DuckDB]: the name between the brackets is the DSN for DuckDB.
  • Driver: Describes the driver's name, as well as where to find the configurations in the odbcinst.ini.
  • Database: Describes the database name used by DuckDB, can also be a file path to a .db in the system.
  • access_mode: The mode in which to connect to the database.

odbcinst.ini and .odbcinst.ini

The odbcinst.ini file contains general configurations for the ODBC drivers installed on the system. A driver section starts with the driver name between brackets, followed by the configuration knobs belonging to that driver.

An example of odbcinst.ini with DuckDB:

[ODBC]
Trace = yes
TraceFile = /tmp/odbctrace

[DuckDB Driver]
Driver = /path/to/libduckdb_odbc.dylib

The lines correspond to the following parameters:

  • [ODBC]: The DM configuration section.
  • Trace: Enables the ODBC trace file using the option yes.
  • TraceFile: The absolute system file path for the ODBC trace file.
  • [DuckDB Driver]: The section of the DuckDB installed driver.
  • Driver: The absolute system file path of the DuckDB driver. Change to match your configuration.

layout: docu title: Dart Client github_repository: https://github.com/TigerEyeLabs/duckdb-dart redirect_from:

  • /docs/api/dart
  • /docs/api/dart/

DuckDB.Dart is the native Dart API for DuckDB.

Installation

DuckDB.Dart can be installed from pub.dev. Please see the API Reference for details.

Use This Package as a Library

Depend on It

Add the dependency with Flutter:

flutter pub add dart_duckdb

This will add a line like this to your package's pubspec.yaml (and run an implicit flutter pub get):

dependencies:
  dart_duckdb: ^1.1.3

Alternatively, your editor might support flutter pub get. Check the docs for your editor to learn more.

Import It

Now in your Dart code, you can import it:

import 'package:dart_duckdb/dart_duckdb.dart';

Usage Examples

See the example projects in the duckdb-dart repository:

  • cli: command-line application
  • duckdbexplorer: GUI application which builds for desktop operating systems as well as Android and iOS.

Here are some common code snippets for DuckDB.Dart:

Querying an In-Memory Database

import 'package:dart_duckdb/dart_duckdb.dart';

void main() {
  final db = duckdb.open(":memory:");
  final connection = duckdb.connect(db);

  connection.execute('''
    CREATE TABLE users (id INTEGER, name VARCHAR, age INTEGER);
    INSERT INTO users VALUES (1, 'Alice', 30), (2, 'Bob', 25);
  ''');

  final result = connection.query("SELECT * FROM users WHERE age > 28").fetchAll();

  for (final row in result) {
    print(row);
  }

  connection.dispose();
  db.dispose();
}

Queries on Background Isolates

import 'dart:isolate';

import 'package:dart_duckdb/dart_duckdb.dart';

Future<void> main() async {
  final db = duckdb.open(":memory:");
  final connection = duckdb.connect(db);

  await Isolate.spawn(backgroundTask, db.transferrable);

  connection.dispose();
  db.dispose();
}

void backgroundTask(TransferableDatabase transferableDb) {
  final connection = duckdb.connectWithTransferred(transferableDb);
  // Access database ...
  // fetch is needed to send the data back to the main isolate
}

layout: docu title: C++ API redirect_from:

  • /docs/api/cpp
  • /docs/api/cpp/

Warning DuckDB's C++ API is internal. It is not guaranteed to be stable and can change without notice. If you would like to build an application on DuckDB, we recommend using the [C API]({% link docs/clients/c/overview.md %}).

Installation

The DuckDB C++ API can be installed as part of the libduckdb packages. Please see the [installation page]({% link docs/installation/index.html %}?environment=cplusplus) for details.

Basic API Usage

DuckDB implements a custom C++ API. This is built around the abstractions of a database instance (DuckDB class), multiple Connections to the database instance and QueryResult instances as the result of queries. The header file for the C++ API is duckdb.hpp.

Startup & Shutdown

To use DuckDB, you must first initialize a DuckDB instance using its constructor. DuckDB() takes as parameter the database file to read and write from. The special value nullptr can be used to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process). The second parameter to the DuckDB constructor is an optional DBConfig object. In DBConfig, you can set various database parameters, for example the read/write mode or memory limits. The DuckDB constructor may throw exceptions, for example if the database file is not usable.

With the DuckDB instance, you can create one or many Connection instances using the Connection() constructor. While connections should be thread-safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection if you are in a multithreaded environment.

DuckDB db(nullptr);
Connection con(db);
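
A persistent database works in the same way; note that the constructor may throw if the file cannot be used. A minimal sketch (test.db is an arbitrary file name):

#include "duckdb.hpp"
#include <iostream>

using namespace duckdb;

int main() {
    try {
        DuckDB db("test.db"); // persistent database stored in test.db
        Connection con(db);
        con.Query("CREATE TABLE IF NOT EXISTS t (i INTEGER)");
    } catch (std::exception &ex) {
        std::cerr << ex.what() << std::endl; // e.g., the file is not a valid DuckDB database
    }
    return 0;
}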

Querying

Connections expose the Query() method to send a SQL query string to DuckDB from C++. Query() fully materializes the query result as a MaterializedQueryResult in memory before returning, at which point the query result can be consumed. There is also a streaming API for queries; see further below.

// create a table
con.Query("CREATE TABLE integers (i INTEGER, j INTEGER)");

// insert three rows into the table
con.Query("INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL)");

auto result = con.Query("SELECT * FROM integers");
if (result->HasError()) {
    cerr << result->GetError() << endl;
} else {
    cout << result->ToString() << endl;
}

The MaterializedQueryResult instance contains, first of all, two fields that indicate whether the query was successful. Query will not throw exceptions under normal circumstances. Instead, invalid queries or other issues will lead to the success Boolean field in the query result instance being set to false. In this case, an error message may be available in error as a string. If successful, other fields are set: the type of statement that was just executed (e.g., StatementType::INSERT_STATEMENT) is contained in statement_type. The high-level (“Logical type”/“SQL type”) types of the result set columns are in types. The names of the result columns are in the names string vector. In case multiple result sets are returned, for example because the result contained multiple statements, the result sets can be chained using the next field.
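
As an illustration of these fields, the following sketch prints the column names and the values of the first column (it assumes the integers table created above):

auto result = con.Query("SELECT i, j FROM integers");
if (!result->HasError()) {
    // names and types describe the result set columns
    for (idx_t col = 0; col < result->ColumnCount(); col++) {
        cout << result->names[col] << "\t";
    }
    cout << endl;
    // individual values can be read from the materialized result
    for (idx_t row = 0; row < result->RowCount(); row++) {
        cout << result->GetValue(0, row).ToString() << endl;
    }
}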

DuckDB also supports prepared statements in the C++ API with the Prepare() method. This returns an instance of PreparedStatement. This instance can be used to execute the prepared statement with parameters. Below is an example:

std::unique_ptr<PreparedStatement> prepare = con.Prepare("SELECT count(*) FROM a WHERE i = $1");
std::unique_ptr<QueryResult> result = prepare->Execute(12);

Warning Do not use prepared statements to insert large amounts of data into DuckDB. See the [data import documentation]({% link docs/data/overview.md %}) for better options.

UDF API

The UDF API allows the definition of user-defined functions. It is exposed in duckdb::Connection through the methods CreateScalarFunction(), CreateVectorizedFunction(), and variants. These methods create UDFs in the temporary schema (TEMP_SCHEMA) of the owning connection, which is the only one allowed to use and change them.

CreateScalarFunction

The user can code an ordinary scalar function and invoke CreateScalarFunction() to register it; afterwards, the UDF can be used in a SELECT statement, for instance:

bool bigger_than_four(int value) {
    return value > 4;
}

connection.CreateScalarFunction<bool, int>("bigger_than_four", &bigger_than_four);

connection.Query("SELECT bigger_than_four(i) FROM (VALUES(3), (5)) tbl(i)")->Print();

The CreateScalarFunction() methods automatically create vectorized scalar UDFs, so they are as efficient as built-in functions. There are two variants of this method interface:

1.

template<typename TR, typename... Args>
void CreateScalarFunction(string name, TR (*udf_func)(Args...))
  • template parameters:
    • TR is the return type of the UDF function;
    • Args are the arguments of the UDF function, up to 3 (this method only supports functions up to ternary);
  • name: is the name to register the UDF function;
  • udf_func: is a pointer to the UDF function.

This method automatically discovers from the template typenames the corresponding LogicalTypes:

  • bool → LogicalType::BOOLEAN
  • int8_t → LogicalType::TINYINT
  • int16_t → LogicalType::SMALLINT
  • int32_t → LogicalType::INTEGER
  • int64_t → LogicalType::BIGINT
  • float → LogicalType::FLOAT
  • double → LogicalType::DOUBLE
  • string_t → LogicalType::VARCHAR

In DuckDB, some primitive types map to more than one LogicalType; e.g., int32_t maps to INTEGER, TIME, and DATE. For disambiguation, users can use the following overloaded method.

2.

template<typename TR, typename... Args>
void CreateScalarFunction(string name, vector<LogicalType> args, LogicalType ret_type, TR (*udf_func)(Args...))

An example of use would be:

int32_t udf_date(int32_t a) {
    return a;
}

con.Query("CREATE TABLE dates (d DATE)");
con.Query("INSERT INTO dates VALUES ('1992-01-01')");

con.CreateScalarFunction<int32_t, int32_t>("udf_date", {LogicalType::DATE}, LogicalType::DATE, &udf_date);

con.Query("SELECT udf_date(d) FROM dates")->Print();
  • template parameters:
    • TR is the return type of the UDF function;
    • Args are the arguments of the UDF function, up to 3 (this method only supports functions up to ternary);
  • name: is the name to register the UDF function;
  • args: are the LogicalType arguments that the function uses, which should match with the template Args types;
  • ret_type: is the LogicalType of return of the function, which should match with the template TR type;
  • udf_func: is a pointer to the UDF function.

This function checks the template types against the LogicalTypes passed as arguments; they must match as follows:

  • LogicalTypeId::BOOLEAN → bool
  • LogicalTypeId::TINYINT → int8_t
  • LogicalTypeId::SMALLINT → int16_t
  • LogicalTypeId::DATE, LogicalTypeId::TIME, LogicalTypeId::INTEGER → int32_t
  • LogicalTypeId::BIGINT, LogicalTypeId::TIMESTAMP → int64_t
  • LogicalTypeId::FLOAT, LogicalTypeId::DOUBLE, LogicalTypeId::DECIMAL → double
  • LogicalTypeId::VARCHAR, LogicalTypeId::CHAR, LogicalTypeId::BLOB → string_t
  • LogicalTypeId::VARBINARY → blob_t

CreateVectorizedFunction

The CreateVectorizedFunction() methods register a vectorized UDF such as:

/*
* This vectorized function copies the input values to the result vector
*/
template<typename TYPE>
static void udf_vectorized(DataChunk &args, ExpressionState &state, Vector &result) {
    // set the result vector type
    result.vector_type = VectorType::FLAT_VECTOR;
    // get a raw array from the result
    auto result_data = FlatVector::GetData<TYPE>(result);

    // get the solely input vector
    auto &input = args.data[0];
    // now get an orrified vector
    VectorData vdata;
    input.Orrify(args.size(), vdata);

    // get a raw array from the orrified input
    auto input_data = (TYPE *)vdata.data;

    // handling the data
    for (idx_t i = 0; i < args.size(); i++) {
        auto idx = vdata.sel->get_index(i);
        if ((*vdata.nullmask)[idx]) {
            continue;
        }
        result_data[i] = input_data[idx];
    }
}

con.Query("CREATE TABLE integers (i INTEGER)");
con.Query("INSERT INTO integers VALUES (1), (2), (3), (999)");

con.CreateVectorizedFunction<int, int>("udf_vectorized_int", &udf_vectorized<int>);

con.Query("SELECT udf_vectorized_int(i) FROM integers")->Print();

The Vectorized UDF is a pointer of the type scalar_function_t:

typedef std::function<void(DataChunk &args, ExpressionState &expr, Vector &result)> scalar_function_t;
  • args is a DataChunk that holds a set of input vectors for the UDF that all have the same length;
  • expr is an ExpressionState that provides information to the query's expression state;
  • result: is a Vector to store the result values.

There are different vector types to handle in a Vectorized UDF:

  • ConstantVector;
  • DictionaryVector;
  • FlatVector;
  • ListVector;
  • StringVector;
  • StructVector;
  • SequenceVector.

The general API of the CreateVectorizedFunction() method is as follows:

1.

template<typename TR, typename... Args>
void CreateVectorizedFunction(string name, scalar_function_t udf_func, LogicalType varargs = LogicalType::INVALID)
  • template parameters:
    • TR is the return type of the UDF function;
    • Args are the arguments up to 3 for the UDF function.
  • name is the name to register the UDF function;
  • udf_func is a vectorized UDF function;
  • varargs The type of varargs to support, or LogicalTypeId::INVALID (default value) if the function does not accept variable length arguments.

This method automatically discovers from the template typenames the corresponding LogicalTypes:

  • bool → LogicalType::BOOLEAN
  • int8_t → LogicalType::TINYINT
  • int16_t → LogicalType::SMALLINT
  • int32_t → LogicalType::INTEGER
  • int64_t → LogicalType::BIGINT
  • float → LogicalType::FLOAT
  • double → LogicalType::DOUBLE
  • string_t → LogicalType::VARCHAR

2.

template<typename TR, typename... Args>
void CreateVectorizedFunction(string name, vector<LogicalType> args, LogicalType ret_type, scalar_function_t udf_func, LogicalType varargs = LogicalType::INVALID)

layout: docu title: Overview redirect_from:

  • /docs/api/c
  • /docs/api/c/
  • /docs/api/c/overview
  • /docs/api/c/overview/

DuckDB implements a custom C API modelled somewhat following the SQLite C API. The API is contained in the duckdb.h header. Continue to [Startup & Shutdown]({% link docs/clients/c/connect.md %}) to get started, or check out the [Full API overview]({% link docs/clients/c/api.md %}).

We also provide a SQLite API wrapper, which means that if your application is programmed against the SQLite C API, you can re-link to DuckDB and it should continue working. See the sqlite_api_wrapper folder in our source repository for more information.

Installation

The DuckDB C API can be installed as part of the libduckdb packages. Please see the installation page for details.

layout: docu title: Configuration redirect_from:

  • /docs/api/c/config
  • /docs/api/c/config/

Configuration options can be provided to change different settings of the database system. Note that many of these settings can be changed later on using PRAGMA statements as well. The configuration object should be created, filled with values and passed to duckdb_open_ext.

Example

duckdb_database db;
duckdb_config config;

// create the configuration object
if (duckdb_create_config(&config) == DuckDBError) {
    // handle error
}
// set some configuration options
duckdb_set_config(config, "access_mode", "READ_WRITE"); // or READ_ONLY
duckdb_set_config(config, "threads", "8");
duckdb_set_config(config, "max_memory", "8GB");
duckdb_set_config(config, "default_order", "DESC");

// open the database using the configuration
if (duckdb_open_ext(NULL, &db, config, NULL) == DuckDBError) {
    // handle error
}
// cleanup the configuration object
duckdb_destroy_config(&config);

// run queries...

// cleanup
duckdb_close(&db);

API Reference Overview

duckdb_state duckdb_create_config(duckdb_config *out_config);
size_t duckdb_config_count();
duckdb_state duckdb_get_config_flag(size_t index, const char **out_name, const char **out_description);
duckdb_state duckdb_set_config(duckdb_config config, const char *name, const char *option);
void duckdb_destroy_config(duckdb_config *config);

duckdb_create_config

Initializes an empty configuration object that can be used to provide start-up options for the DuckDB instance through duckdb_open_ext. The duckdb_config must be destroyed using 'duckdb_destroy_config'

This will always succeed unless there is a malloc failure.

Note that duckdb_destroy_config should always be called on the resulting config, even if the function returns DuckDBError.

Syntax
duckdb_state duckdb_create_config(
  duckdb_config *out_config
);
Parameters
  • out_config: The result configuration object.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_config_count

This returns the total amount of configuration options available for usage with duckdb_get_config_flag.

This should not be called in a loop as it internally loops over all the options.

Return Value

The amount of config options available.

Syntax
size_t duckdb_config_count(
  
);

duckdb_get_config_flag

Obtains a human-readable name and description of a specific configuration option. This can be used to e.g. display configuration options. This will succeed unless index is out of range (i.e., >= duckdb_config_count).

The result name or description MUST NOT be freed.

Syntax
duckdb_state duckdb_get_config_flag(
  size_t index,
  const char **out_name,
  const char **out_description
);
Parameters
  • index: The index of the configuration option (between 0 and duckdb_config_count)
  • out_name: A name of the configuration flag.
  • out_description: A description of the configuration flag.
Return Value

DuckDBSuccess on success or DuckDBError on failure.
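
Together with duckdb_config_count, this can be used to enumerate all available options, for example (a minimal sketch):

#include "duckdb.h"
#include <stdio.h>

// print the name and description of every configuration option
void print_config_options() {
    size_t count = duckdb_config_count();
    for (size_t index = 0; index < count; index++) {
        const char *name;
        const char *description;
        if (duckdb_get_config_flag(index, &name, &description) == DuckDBSuccess) {
            printf("%s: %s\n", name, description);
        }
    }
}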


duckdb_set_config

Sets the specified option for the specified configuration. The configuration option is indicated by name. To obtain a list of config options, see duckdb_get_config_flag.

In the source code, configuration options are defined in config.cpp.

This can fail if either the name is invalid, or if the value provided for the option is invalid.

Syntax
duckdb_state duckdb_set_config(
  duckdb_config config,
  const char *name,
  const char *option
);
Parameters
  • config: The configuration object to set the option on.
  • name: The name of the configuration flag to set.
  • option: The value to set the configuration flag to.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_destroy_config

Destroys the specified configuration object and de-allocates all memory allocated for the object.

Syntax
void duckdb_destroy_config(
  duckdb_config *config
);
Parameters
  • config: The configuration object to destroy.

--- layout: docu title: Table Functions redirect_from: - /docs/api/c/table_functions - /docs/api/c/table_functions/ ---

The table function API can be used to define a table function that can then be called from within DuckDB in the FROM clause of a query.
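
Before the reference, here is a rough sketch of the typical structure: a bind callback that declares the result columns, an init callback that sets up per-scan state, and a main callback that fills output chunks. The function name my_function, the column name forty_two, and the produced values are only illustrations, not a prescribed pattern:

#include "duckdb.h"
#include <stdlib.h>

typedef struct { int64_t size; } my_bind_data;
typedef struct { int64_t pos; } my_init_data;

// bind: declare one BIGINT result column and read the single parameter
static void my_bind(duckdb_bind_info info) {
    duckdb_logical_type type = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_bind_add_result_column(info, "forty_two", type);
    duckdb_destroy_logical_type(&type);

    my_bind_data *bind_data = malloc(sizeof(my_bind_data));
    duckdb_value param = duckdb_bind_get_parameter(info, 0);
    bind_data->size = duckdb_get_int64(param);
    duckdb_destroy_value(&param);
    duckdb_bind_set_bind_data(info, bind_data, free);
}

// init: per-scan state, here just a cursor
static void my_init(duckdb_init_info info) {
    my_init_data *init_data = malloc(sizeof(my_init_data));
    init_data->pos = 0;
    duckdb_init_set_init_data(info, init_data, free);
}

// main function: emit rows until the requested size is reached
static void my_function(duckdb_function_info info, duckdb_data_chunk output) {
    my_bind_data *bind_data = (my_bind_data *) duckdb_function_get_bind_data(info);
    my_init_data *init_data = (my_init_data *) duckdb_function_get_init_data(info);
    int64_t *result = (int64_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(output, 0));
    idx_t count = 0;
    while (init_data->pos < bind_data->size && count < duckdb_vector_size()) {
        result[count++] = 42;
        init_data->pos++;
    }
    duckdb_data_chunk_set_size(output, count);
}

// registration: make the table function available on the given connection
static void register_my_function(duckdb_connection con) {
    duckdb_table_function function = duckdb_create_table_function();
    duckdb_table_function_set_name(function, "my_function");
    duckdb_logical_type type = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_table_function_add_parameter(function, type);
    duckdb_destroy_logical_type(&type);
    duckdb_table_function_set_bind(function, my_bind);
    duckdb_table_function_set_init(function, my_init);
    duckdb_table_function_set_function(function, my_function);
    duckdb_register_table_function(con, function);
    duckdb_destroy_table_function(&function);
}

Once registered on a connection via register_my_function, the function can be invoked from SQL, e.g., SELECT * FROM my_function(100). The individual functions used above are documented in the reference that follows.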

API Reference Overview

duckdb_table_function duckdb_create_table_function();
void duckdb_destroy_table_function(duckdb_table_function *table_function);
void duckdb_table_function_set_name(duckdb_table_function table_function, const char *name);
void duckdb_table_function_add_parameter(duckdb_table_function table_function, duckdb_logical_type type);
void duckdb_table_function_add_named_parameter(duckdb_table_function table_function, const char *name, duckdb_logical_type type);
void duckdb_table_function_set_extra_info(duckdb_table_function table_function, void *extra_info, duckdb_delete_callback_t destroy);
void duckdb_table_function_set_bind(duckdb_table_function table_function, duckdb_table_function_bind_t bind);
void duckdb_table_function_set_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_local_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_function(duckdb_table_function table_function, duckdb_table_function_t function);
void duckdb_table_function_supports_projection_pushdown(duckdb_table_function table_function, bool pushdown);
duckdb_state duckdb_register_table_function(duckdb_connection con, duckdb_table_function function);

Table Function Bind

void *duckdb_bind_get_extra_info(duckdb_bind_info info);
void duckdb_bind_add_result_column(duckdb_bind_info info, const char *name, duckdb_logical_type type);
idx_t duckdb_bind_get_parameter_count(duckdb_bind_info info);
duckdb_value duckdb_bind_get_parameter(duckdb_bind_info info, idx_t index);
duckdb_value duckdb_bind_get_named_parameter(duckdb_bind_info info, const char *name);
void duckdb_bind_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t destroy);
void duckdb_bind_set_cardinality(duckdb_bind_info info, idx_t cardinality, bool is_exact);
void duckdb_bind_set_error(duckdb_bind_info info, const char *error);

Table Function Init

void *duckdb_init_get_extra_info(duckdb_init_info info);
void *duckdb_init_get_bind_data(duckdb_init_info info);
void duckdb_init_set_init_data(duckdb_init_info info, void *init_data, duckdb_delete_callback_t destroy);
idx_t duckdb_init_get_column_count(duckdb_init_info info);
idx_t duckdb_init_get_column_index(duckdb_init_info info, idx_t column_index);
void duckdb_init_set_max_threads(duckdb_init_info info, idx_t max_threads);
void duckdb_init_set_error(duckdb_init_info info, const char *error);

Table Function

void *duckdb_function_get_extra_info(duckdb_function_info info);
void *duckdb_function_get_bind_data(duckdb_function_info info);
void *duckdb_function_get_init_data(duckdb_function_info info);
void *duckdb_function_get_local_init_data(duckdb_function_info info);
void duckdb_function_set_error(duckdb_function_info info, const char *error);

duckdb_create_table_function

Creates a new empty table function.

The return value should be destroyed with duckdb_destroy_table_function.

Return Value

The table function object.

Syntax
duckdb_table_function duckdb_create_table_function(
  
);

duckdb_destroy_table_function

Destroys the given table function object.

Syntax
void duckdb_destroy_table_function(
  duckdb_table_function *table_function
);
Parameters
  • table_function: The table function to destroy

duckdb_table_function_set_name

Sets the name of the given table function.

Syntax
void duckdb_table_function_set_name(
  duckdb_table_function table_function,
  const char *name
);
Parameters
  • table_function: The table function
  • name: The name of the table function

duckdb_table_function_add_parameter

Adds a parameter to the table function.

Syntax
void duckdb_table_function_add_parameter(
  duckdb_table_function table_function,
  duckdb_logical_type type
);
Parameters
  • table_function: The table function.
  • type: The parameter type. Cannot contain INVALID.

duckdb_table_function_add_named_parameter

Adds a named parameter to the table function.

Syntax
void duckdb_table_function_add_named_parameter(
  duckdb_table_function table_function,
  const char *name,
  duckdb_logical_type type
);
Parameters
  • table_function: The table function.
  • name: The parameter name.
  • type: The parameter type. Cannot contain INVALID.

duckdb_table_function_set_extra_info

Assigns extra information to the table function that can be fetched during binding, etc.

Syntax
void duckdb_table_function_set_extra_info(
  duckdb_table_function table_function,
  void *extra_info,
  duckdb_delete_callback_t destroy
);
Parameters
  • table_function: The table function
  • extra_info: The extra information
  • destroy: The callback that will be called to destroy the bind data (if any)

duckdb_table_function_set_bind

Sets the bind function of the table function.

Syntax
void duckdb_table_function_set_bind(
  duckdb_table_function table_function,
  duckdb_table_function_bind_t bind
);
Parameters
  • table_function: The table function
  • bind: The bind function

duckdb_table_function_set_init

Sets the init function of the table function.

Syntax
void duckdb_table_function_set_init(
  duckdb_table_function table_function,
  duckdb_table_function_init_t init
);
Parameters
  • table_function: The table function
  • init: The init function

duckdb_table_function_set_local_init

Sets the thread-local init function of the table function.

Syntax
void duckdb_table_function_set_local_init(
  duckdb_table_function table_function,
  duckdb_table_function_init_t init
);
Parameters
  • table_function: The table function
  • init: The init function

duckdb_table_function_set_function

Sets the main function of the table function.

Syntax
void duckdb_table_function_set_function(
  duckdb_table_function table_function,
  duckdb_table_function_t function
);
Parameters
  • table_function: The table function
  • function: The function

duckdb_table_function_supports_projection_pushdown

Sets whether or not the given table function supports projection pushdown.

If this is set to true, the system will provide a list of all required columns in the init stage through the duckdb_init_get_column_count and duckdb_init_get_column_index functions. If this is set to false (the default), the system will expect all columns to be projected.

Syntax
void duckdb_table_function_supports_projection_pushdown(
  duckdb_table_function table_function,
  bool pushdown
);
Parameters
  • table_function: The table function
  • pushdown: True if the table function supports projection pushdown, false otherwise.

duckdb_register_table_function

Register the table function object within the given connection.

The function requires at least a name, a bind function, an init function and a main function.

If the function is incomplete or a function with this name already exists DuckDBError is returned.

Syntax
duckdb_state duckdb_register_table_function(
  duckdb_connection con,
  duckdb_table_function function
);
Parameters
  • con: The connection to register it in.
  • function: The function pointer
Return Value

Whether or not the registration was successful.


duckdb_bind_get_extra_info

Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.

Syntax
void *duckdb_bind_get_extra_info(
  duckdb_bind_info info
);
Parameters
  • info: The info object
Return Value

The extra info


duckdb_bind_add_result_column

Adds a result column to the output of the table function.

Syntax
void duckdb_bind_add_result_column(
  duckdb_bind_info info,
  const char *name,
  duckdb_logical_type type
);
Parameters
  • info: The table function's bind info.
  • name: The column name.
  • type: The logical column type.

duckdb_bind_get_parameter_count

Retrieves the number of regular (non-named) parameters to the function.

Syntax
idx_t duckdb_bind_get_parameter_count(
  duckdb_bind_info info
);
Parameters
  • info: The info object
Return Value

The number of parameters


duckdb_bind_get_parameter

Retrieves the parameter at the given index.

The result must be destroyed with duckdb_destroy_value.

Syntax
duckdb_value duckdb_bind_get_parameter(
  duckdb_bind_info info,
  idx_t index
);
Parameters
  • info: The info object
  • index: The index of the parameter to get
Return Value

The value of the parameter. Must be destroyed with duckdb_destroy_value.


duckdb_bind_get_named_parameter

Retrieves a named parameter with the given name.

The result must be destroyed with duckdb_destroy_value.

Syntax
duckdb_value duckdb_bind_get_named_parameter(
  duckdb_bind_info info,
  const char *name
);
Parameters
  • info: The info object
  • name: The name of the parameter
Return Value

The value of the parameter. Must be destroyed with duckdb_destroy_value.


duckdb_bind_set_bind_data

Sets the user-provided bind data in the bind object. This object can be retrieved again during execution.

Syntax
void duckdb_bind_set_bind_data(
  duckdb_bind_info info,
  void *bind_data,
  duckdb_delete_callback_t destroy
);
Parameters
  • info: The info object
  • bind_data: The bind data object.
  • destroy: The callback that will be called to destroy the bind data (if any)

duckdb_bind_set_cardinality

Sets the cardinality estimate for the table function, used for optimization.

Syntax
void duckdb_bind_set_cardinality(
  duckdb_bind_info info,
  idx_t cardinality,
  bool is_exact
);
Parameters
  • info: The bind data object.
  • is_exact: Whether or not the cardinality estimate is exact, or an approximation

duckdb_bind_set_error

Report that an error has occurred while calling bind.

Syntax
void duckdb_bind_set_error(
  duckdb_bind_info info,
  const char *error
);
Parameters
  • info: The info object
  • error: The error message

duckdb_init_get_extra_info

Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.

Syntax
void *duckdb_init_get_extra_info(
  duckdb_init_info info
);
Parameters
  • info: The info object
Return Value

The extra info


duckdb_init_get_bind_data

Gets the bind data set by duckdb_bind_set_bind_data during the bind.

Note that the bind data should be considered as read-only. For tracking state, use the init data instead.

Syntax
void *duckdb_init_get_bind_data(
  duckdb_init_info info
);
Parameters
  • info: The info object
Return Value

The bind data object


duckdb_init_set_init_data

Sets the user-provided init data in the init object. This object can be retrieved again during execution.

Syntax
void duckdb_init_set_init_data(
  duckdb_init_info info,
  void *init_data,
  duckdb_delete_callback_t destroy
);
Parameters
  • info: The info object
  • init_data: The init data object.
  • destroy: The callback that will be called to destroy the init data (if any)

duckdb_init_get_column_count

Returns the number of projected columns.

This function must be used if projection pushdown is enabled to figure out which columns to emit.

Syntax
idx_t duckdb_init_get_column_count(
  duckdb_init_info info
);
Parameters
  • info: The info object
Return Value

The number of projected columns.


duckdb_init_get_column_index

Returns the column index of the projected column at the specified position.

This function must be used if projection pushdown is enabled to figure out which columns to emit.

Syntax
idx_t duckdb_init_get_column_index(
  duckdb_init_info info,
  idx_t column_index
);
Parameters
  • info: The info object
  • column_index: The index at which to get the projected column index, from 0..duckdb_init_get_column_count(info)
Return Value

The column index of the projected column.


duckdb_init_set_max_threads

Sets how many threads can process this table function in parallel (default: 1)

Syntax
void duckdb_init_set_max_threads(
  duckdb_init_info info,
  idx_t max_threads
);
Parameters
  • info: The info object
  • max_threads: The maximum amount of threads that can process this table function

duckdb_init_set_error

Report that an error has occurred while calling init.

Syntax
void duckdb_init_set_error(
  duckdb_init_info info,
  const char *error
);
Parameters
  • info: The info object
  • error: The error message

duckdb_function_get_extra_info

Retrieves the extra info of the function as set in duckdb_table_function_set_extra_info.

Syntax
void *duckdb_function_get_extra_info(
  duckdb_function_info info
);
Parameters
  • info: The info object
Return Value

The extra info


duckdb_function_get_bind_data

Gets the bind data set by duckdb_bind_set_bind_data during the bind.

Note that the bind data should be considered as read-only. For tracking state, use the init data instead.

Syntax
void *duckdb_function_get_bind_data(
  duckdb_function_info info
);
Parameters
  • info: The info object
Return Value

The bind data object


duckdb_function_get_init_data

Gets the init data set by duckdb_init_set_init_data during the init.

Syntax
void *duckdb_function_get_init_data(
  duckdb_function_info info
);
Parameters
  • info: The info object
Return Value

The init data object


duckdb_function_get_local_init_data

Gets the thread-local init data set by duckdb_init_set_init_data during the local_init.

Syntax
void *duckdb_function_get_local_init_data(
  duckdb_function_info info
);
Parameters
  • info: The info object
Return Value

The init data object


duckdb_function_set_error

Report that an error has occurred while executing the function.

Syntax
void duckdb_function_set_error(
  duckdb_function_info info,
  const char *error
);
Parameters
  • info: The info object
  • error: The error message

--- layout: docu title: Startup & Shutdown redirect_from: - /docs/api/c/connect - /docs/api/c/connect/ ---

To use DuckDB, you must first initialize a duckdb_database handle using duckdb_open(). duckdb_open() takes as parameter the database file to read and write from. The special value NULL (nullptr) can be used to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process).

With the duckdb_database handle, you can create one or many duckdb_connection using duckdb_connect(). While individual connections are thread-safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection to allow for the best parallel performance.

All duckdb_connections have to explicitly be disconnected with duckdb_disconnect() and the duckdb_database has to be explicitly closed with duckdb_close() to avoid memory and file handle leaking.

Example

duckdb_database db;
duckdb_connection con;

if (duckdb_open(NULL, &db) == DuckDBError) {
    // handle error
}
if (duckdb_connect(db, &con) == DuckDBError) {
    // handle error
}

// run queries...

// cleanup
duckdb_disconnect(&con);
duckdb_close(&db);

API Reference Overview

duckdb_instance_cache duckdb_create_instance_cache();
duckdb_state duckdb_get_or_create_from_cache(duckdb_instance_cache instance_cache, const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_destroy_instance_cache(duckdb_instance_cache *instance_cache);
duckdb_state duckdb_open(const char *path, duckdb_database *out_database);
duckdb_state duckdb_open_ext(const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_close(duckdb_database *database);
duckdb_state duckdb_connect(duckdb_database database, duckdb_connection *out_connection);
void duckdb_interrupt(duckdb_connection connection);
duckdb_query_progress_type duckdb_query_progress(duckdb_connection connection);
void duckdb_disconnect(duckdb_connection *connection);
const char *duckdb_library_version();

duckdb_create_instance_cache

Creates a new database instance cache. The instance cache is necessary if a client/program (re)opens multiple databases to the same file within the same process. Must be destroyed with 'duckdb_destroy_instance_cache'.

Return Value

The database instance cache.

Syntax
duckdb_instance_cache duckdb_create_instance_cache(
  
);

duckdb_get_or_create_from_cache

Creates a new database instance in the instance cache, or retrieves an existing database instance. Must be closed with 'duckdb_close'.

Syntax
duckdb_state duckdb_get_or_create_from_cache(
  duckdb_instance_cache instance_cache,
  const char *path,
  duckdb_database *out_database,
  duckdb_config config,
  char **out_error
);
Parameters
  • instance_cache: The instance cache in which to create the database, or from which to take the database.
  • path: Path to the database file on disk. Both nullptr and :memory: open or retrieve an in-memory database.
  • out_database: The resulting cached database.
  • config: (Optional) configuration used to create the database.
  • out_error: If set and the function returns DuckDBError, this contains the error message. Note that the error message must be freed using duckdb_free.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_destroy_instance_cache

Destroys an existing database instance cache and de-allocates its memory.

Syntax
void duckdb_destroy_instance_cache(
  duckdb_instance_cache *instance_cache
);
Parameters
  • instance_cache: The instance cache to destroy.
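
A typical usage pattern for the instance cache looks roughly like this (a minimal sketch; test.db is an arbitrary path and error handling is abbreviated):

duckdb_instance_cache cache = duckdb_create_instance_cache();
duckdb_database db;
char *error = NULL;

// repeated calls with the same path reuse the same underlying database instance
if (duckdb_get_or_create_from_cache(cache, "test.db", &db, NULL, &error) == DuckDBError) {
    // handle error; the message must be freed with duckdb_free
    duckdb_free(error);
}

// run queries...

// cleanup
duckdb_close(&db);
duckdb_destroy_instance_cache(&cache);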

duckdb_open

Creates a new database or opens an existing database file stored at the given path. If no path is given a new in-memory database is created instead. The database must be closed with 'duckdb_close'.

Syntax
duckdb_state duckdb_open(
  const char *path,
  duckdb_database *out_database
);
Parameters
  • path: Path to the database file on disk. Both nullptr and :memory: open an in-memory database.
  • out_database: The result database object.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_open_ext

Extended version of duckdb_open. Creates a new database or opens an existing database file stored at the given path. The database must be closed with 'duckdb_close'.

Syntax
duckdb_state duckdb_open_ext(
  const char *path,
  duckdb_database *out_database,
  duckdb_config config,
  char **out_error
);
Parameters
  • path: Path to the database file on disk. Both nullptr and :memory: open an in-memory database.
  • out_database: The result database object.
  • config: (Optional) configuration used to start up the database.
  • out_error: If set and the function returns DuckDBError, this contains the error message. Note that the error message must be freed using duckdb_free.
Return Value

DuckDBSuccess on success or DuckDBError on failure.
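
As a hedged sketch, duckdb_open_ext is typically paired with a duckdb_config created via duckdb_create_config / duckdb_set_config; the options used below (access_mode, threads) are only illustrative:

duckdb_config config;
duckdb_database db;
char *error = NULL;

if (duckdb_create_config(&config) == DuckDBError) {
    // handle error
}
duckdb_set_config(config, "access_mode", "READ_WRITE");
duckdb_set_config(config, "threads", "4");

if (duckdb_open_ext("test.db", &db, config, &error) == DuckDBError) {
    // 'error' contains the message and must be freed with duckdb_free
    duckdb_free(error);
}
duckdb_destroy_config(&config);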


duckdb_close

Closes the specified database and de-allocates all memory allocated for that database. This should be called after you are done with any database allocated through duckdb_open or duckdb_open_ext. Note that failing to call duckdb_close (in case of e.g., a program crash) will not cause data corruption. Still, it is recommended to always correctly close a database object after you are done with it.

Syntax
void duckdb_close(
  duckdb_database *database
);
Parameters
  • database: The database object to shut down.

duckdb_connect

Opens a connection to a database. Connections are required to query the database, and store transactional state associated with the connection. The instantiated connection should be closed using 'duckdb_disconnect'.

Syntax
duckdb_state duckdb_connect(
  duckdb_database database,
  duckdb_connection *out_connection
);
Parameters
  • database: The database file to connect to.
  • out_connection: The result connection object.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_interrupt

Interrupts the running query.

Syntax
void duckdb_interrupt(
  duckdb_connection connection
);
Parameters
  • connection: The connection to interrupt

duckdb_query_progress

Gets the progress of the running query.

Syntax
duckdb_query_progress_type duckdb_query_progress(
  duckdb_connection connection
);
Parameters
  • connection: The working connection
Return Value

-1 if no progress or a percentage of the progress


duckdb_disconnect

Closes the specified connection and de-allocates all memory allocated for that connection.

Syntax
void duckdb_disconnect(
  duckdb_connection *connection
);
Parameters
  • connection: The connection to close.

duckdb_library_version

Returns the version of the linked DuckDB, with a version postfix for dev versions

Usually used for developing C extensions that must return this for a compatibility check.

Syntax
const char *duckdb_library_version(
  
);

--- layout: docu title: Vectors redirect_from: - /docs/api/c/vector - /docs/api/c/vector/ ---

Vectors represent a horizontal slice of a column. They hold a number of values of a specific type, similar to an array. Vectors are the core data representation used in DuckDB. Vectors are typically stored within [data chunks]({% link docs/clients/c/data_chunk.md %}).

The vector and data chunk interfaces are the most efficient way of interacting with DuckDB, allowing for the highest performance. However, the interfaces are also difficult to use and care must be taken when using them.

Vector Format

Vectors are arrays of a specific data type. The logical type of a vector can be obtained using duckdb_vector_get_column_type. The type id of the logical type can then be obtained using duckdb_get_type_id.

Vectors themselves do not have sizes. Instead, the parent data chunk has a size (that can be obtained through duckdb_data_chunk_get_size). All vectors that belong to a data chunk have the same size.

Primitive Types

For primitive types, the underlying array can be obtained using the duckdb_vector_get_data method. The array can then be accessed using the correct native type. Below is a table that contains a mapping of the duckdb_type to the native type of the array.

duckdb_type NativeType
DUCKDB_TYPE_BOOLEAN bool
DUCKDB_TYPE_TINYINT int8_t
DUCKDB_TYPE_SMALLINT int16_t
DUCKDB_TYPE_INTEGER int32_t
DUCKDB_TYPE_BIGINT int64_t
DUCKDB_TYPE_UTINYINT uint8_t
DUCKDB_TYPE_USMALLINT uint16_t
DUCKDB_TYPE_UINTEGER uint32_t
DUCKDB_TYPE_UBIGINT uint64_t
DUCKDB_TYPE_FLOAT float
DUCKDB_TYPE_DOUBLE double
DUCKDB_TYPE_TIMESTAMP duckdb_timestamp
DUCKDB_TYPE_DATE duckdb_date
DUCKDB_TYPE_TIME duckdb_time
DUCKDB_TYPE_INTERVAL duckdb_interval
DUCKDB_TYPE_HUGEINT duckdb_hugeint
DUCKDB_TYPE_UHUGEINT duckdb_uhugeint
DUCKDB_TYPE_VARCHAR duckdb_string_t
DUCKDB_TYPE_BLOB duckdb_string_t
DUCKDB_TYPE_TIMESTAMP_S duckdb_timestamp
DUCKDB_TYPE_TIMESTAMP_MS duckdb_timestamp
DUCKDB_TYPE_TIMESTAMP_NS duckdb_timestamp
DUCKDB_TYPE_UUID duckdb_hugeint
DUCKDB_TYPE_TIME_TZ duckdb_time_tz
DUCKDB_TYPE_TIMESTAMP_TZ duckdb_timestamp

NULL Values

Any value in a vector can be NULL. When a value is NULL, the value contained within the primary array at that index is undefined (and can be uninitialized). The validity mask is a bitmask consisting of uint64_t elements. For every 64 values in the vector, one uint64_t element exists (rounded up). The validity mask has its bit set to 1 if the value is valid, or set to 0 if the value is invalid (i.e., NULL).

The bits of the bitmask can be read directly, or the slower helper method duckdb_validity_row_is_valid can be used to check whether or not a value is NULL.

The duckdb_vector_get_validity function returns a pointer to the validity mask. Note that if all values in a vector are valid, this function might return nullptr, in which case the validity mask does not need to be checked.

Strings

String values are stored as a duckdb_string_t. This is a special struct that stores the string inline (if it is short, i.e., <= 12 bytes) or a pointer to the string data if it is longer than 12 bytes.

typedef struct {
	union {
		struct {
			uint32_t length;
			char prefix[4];
			char *ptr;
		} pointer;
		struct {
			uint32_t length;
			char inlined[12];
		} inlined;
	} value;
} duckdb_string_t;

The length can either be accessed directly, or the duckdb_string_is_inlined function can be used to check whether a string is inlined.
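
For illustration, here are two small hypothetical helpers (not part of the C API) that resolve a duckdb_string_t to a data pointer and a length, using only the struct layout above and duckdb_string_is_inlined. Note that the string data is not necessarily null-terminated:

// Hypothetical helper: pointer to the string data, regardless of representation.
static const char *string_t_data(duckdb_string_t *str) {
	return duckdb_string_is_inlined(*str) ? str->value.inlined.inlined : str->value.pointer.ptr;
}

// Hypothetical helper: the length field is shared between both representations.
static uint32_t string_t_length(duckdb_string_t str) {
	return str.value.inlined.length;
}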

Decimals

Decimals are stored as integer values internally. The exact native type depends on the width of the decimal type, as shown in the following table:

Width NativeType
<= 4 int16_t
<= 9 int32_t
<= 18 int64_t
<= 38 duckdb_hugeint

The duckdb_decimal_internal_type can be used to obtain the internal type of the decimal.

Decimals are stored as integer values multiplied by 10^scale. The scale of a decimal can be obtained using duckdb_decimal_scale. For example, a decimal value of 10.5 with type DECIMAL(8, 3) is stored internally as an int32_t value of 10500. In order to obtain the correct decimal value, the value should be divided by the appropriate power-of-ten.
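
As a sketch, converting the raw storage value of a DECIMAL(8, 3) column to a double could look as follows; decimal_vector and row are placeholder names for a decimal-typed duckdb_vector and a row index, and pow comes from <math.h>:

duckdb_logical_type decimal_type = duckdb_vector_get_column_type(decimal_vector);
uint8_t scale = duckdb_decimal_scale(decimal_type);

// width 8 <= 9, so the storage type is int32_t
int32_t *decimal_data = (int32_t *) duckdb_vector_get_data(decimal_vector);
double value = decimal_data[row] / pow(10, scale); // e.g., 10500 -> 10.5 for scale 3

duckdb_destroy_logical_type(&decimal_type);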

Enums

Enums are stored as unsigned integer values internally. The exact native type depends on the size of the enum dictionary, as shown in the following table:

Dictionary size NativeType
<= 255 uint8_t
<= 65535 uint16_t
<= 4294967295 uint32_t

The duckdb_enum_internal_type can be used to obtain the internal type of the enum.

In order to obtain the actual string value of the enum, the duckdb_enum_dictionary_value function must be used to obtain the enum value that corresponds to the given dictionary entry. Note that the enum dictionary is the same for the entire column – and so only needs to be constructed once.
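
A hedged sketch of reading an ENUM column, assuming the dictionary is small enough that the internal type is uint8_t; enum_vector and row are placeholder names:

duckdb_logical_type enum_type = duckdb_vector_get_column_type(enum_vector);

// assuming duckdb_enum_internal_type(enum_type) == DUCKDB_TYPE_UTINYINT
uint8_t *enum_data = (uint8_t *) duckdb_vector_get_data(enum_vector);
char *enum_string = duckdb_enum_dictionary_value(enum_type, enum_data[row]);
printf("%s\n", enum_string);

// the dictionary string and the logical type must be freed explicitly
duckdb_free(enum_string);
duckdb_destroy_logical_type(&enum_type);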

Structs

Structs are nested types that contain any number of child types. Think of them like a struct in C. The way to access struct data using vectors is to access the child vectors recursively using the duckdb_struct_vector_get_child method.

The struct vector itself does not have any data (i.e., you should not use the duckdb_vector_get_data method on the struct). However, the struct vector itself does have a validity mask. The reason for this is that the child elements of a struct can be NULL, but the struct itself can also be NULL.

Lists

Lists are nested types that contain a single child type, repeated x times per row. Think of them like a variable-length array in C. The way to access list data using vectors is to access the child vector using the duckdb_list_vector_get_child method.

The duckdb_vector_get_data function must be used to get the offsets and lengths of the lists, stored as duckdb_list_entry values, which can then be applied to the child vector.

typedef struct {
	uint64_t offset;
	uint64_t length;
} duckdb_list_entry;

Note that both the list entries themselves and any children stored in the lists can be NULL. This must again be checked using the validity mask.

Arrays

Arrays are nested types that contain a single child type, repeated exactly array_size times per row. Think of them like a fixed-size array in C. Arrays work exactly the same as lists, except the length and offset of each entry is fixed. The fixed array size can be obtained by using duckdb_array_type_array_size. The data for entry n then resides at offset = n * array_size, and always has length = array_size.

Note that much like lists, arrays can still be NULL, which must be checked using the validity mask.
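
The examples below do not cover arrays, so here is a hedged sketch of reading a fixed-size array column, mirroring the list example; the query, column type, and the connection con are illustrative and assumed to be set up as in the other examples:

duckdb_result res;
duckdb_query(con, "SELECT [i, i + 1, i + 2]::BIGINT[3] FROM range(3) t(i)", &res);

while (true) {
	duckdb_data_chunk result = duckdb_fetch_chunk(res);
	if (!result) {
		break;
	}
	idx_t row_count = duckdb_data_chunk_get_size(result);
	// get the array column and its fixed size
	duckdb_vector array_col = duckdb_data_chunk_get_vector(result, 0);
	duckdb_logical_type array_type = duckdb_vector_get_column_type(array_col);
	idx_t array_size = duckdb_array_type_array_size(array_type);
	duckdb_destroy_logical_type(&array_type);
	// the child vector holds row_count * array_size values
	duckdb_vector child = duckdb_array_vector_get_child(array_col);
	int64_t *child_data = (int64_t *) duckdb_vector_get_data(child);

	// NULL handling is omitted here for brevity; see the list example for the pattern
	for (idx_t row = 0; row < row_count; row++) {
		printf("[");
		for (idx_t i = 0; i < array_size; i++) {
			if (i > 0) {
				printf(", ");
			}
			printf("%lld", child_data[row * array_size + i]);
		}
		printf("]\n");
	}
	duckdb_destroy_data_chunk(&result);
}
duckdb_destroy_result(&res);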

Examples

Below are several full end-to-end examples of how to interact with vectors.

Example: Reading an int64 Vector with NULL Values

duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);

duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%2=0 THEN NULL ELSE i END res_col FROM range(10) t(i)", &res);

// iterate until result is exhausted
while (true) {
	duckdb_data_chunk result = duckdb_fetch_chunk(res);
	if (!result) {
		// result is exhausted
		break;
	}
	// get the number of rows from the data chunk
	idx_t row_count = duckdb_data_chunk_get_size(result);
	// get the first column
	duckdb_vector res_col = duckdb_data_chunk_get_vector(result, 0);
	// get the native array and the validity mask of the vector
	int64_t *vector_data = (int64_t *) duckdb_vector_get_data(res_col);
	uint64_t *vector_validity = duckdb_vector_get_validity(res_col);
	// iterate over the rows
	for (idx_t row = 0; row < row_count; row++) {
		if (duckdb_validity_row_is_valid(vector_validity, row)) {
			printf("%lld\n", vector_data[row]);
		} else {
			printf("NULL\n");
		}
	}
	duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);

Example: Reading a String Vector

duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);

duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%2=0 THEN CONCAT('short_', i) ELSE CONCAT('longstringprefix', i) END FROM range(10) t(i)", &res);

// iterate until result is exhausted
while (true) {
	duckdb_data_chunk result = duckdb_fetch_chunk(res);
	if (!result) {
		// result is exhausted
		break;
	}
	// get the number of rows from the data chunk
	idx_t row_count = duckdb_data_chunk_get_size(result);
	// get the first column
	duckdb_vector res_col = duckdb_data_chunk_get_vector(result, 0);
	// get the native array and the validity mask of the vector
	duckdb_string_t *vector_data = (duckdb_string_t *) duckdb_vector_get_data(res_col);
	uint64_t *vector_validity = duckdb_vector_get_validity(res_col);
	// iterate over the rows
	for (idx_t row = 0; row < row_count; row++) {
		if (duckdb_validity_row_is_valid(vector_validity, row)) {
			duckdb_string_t str = vector_data[row];
			if (duckdb_string_is_inlined(str)) {
				// use inlined string
				printf("%.*s\n", str.value.inlined.length, str.value.inlined.inlined);
			} else {
				// follow string pointer
				printf("%.*s\n", str.value.pointer.length, str.value.pointer.ptr);
			}
		} else {
			printf("NULL\n");
		}
	}
	duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);

Example: Reading a Struct Vector

duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);

duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%5=0 THEN NULL ELSE {'col1': i, 'col2': CASE WHEN i%2=0 THEN NULL ELSE 100 + i * 42 END} END FROM range(10) t(i)", &res);

// iterate until result is exhausted
while (true) {
	duckdb_data_chunk result = duckdb_fetch_chunk(res);
	if (!result) {
		// result is exhausted
		break;
	}
	// get the number of rows from the data chunk
	idx_t row_count = duckdb_data_chunk_get_size(result);
	// get the struct column
	duckdb_vector struct_col = duckdb_data_chunk_get_vector(result, 0);
	uint64_t *struct_validity = duckdb_vector_get_validity(struct_col);
	// get the child columns of the struct
	duckdb_vector col1_vector = duckdb_struct_vector_get_child(struct_col, 0);
	int64_t *col1_data = (int64_t *) duckdb_vector_get_data(col1_vector);
	uint64_t *col1_validity = duckdb_vector_get_validity(col1_vector);

	duckdb_vector col2_vector = duckdb_struct_vector_get_child(struct_col, 1);
	int64_t *col2_data = (int64_t *) duckdb_vector_get_data(col2_vector);
	uint64_t *col2_validity = duckdb_vector_get_validity(col2_vector);

	// iterate over the rows
	for (idx_t row = 0; row < row_count; row++) {
		if (!duckdb_validity_row_is_valid(struct_validity, row)) {
			// entire struct is NULL
			printf("NULL\n");
			continue;
		}
		// read col1
		printf("{'col1': ");
		if (!duckdb_validity_row_is_valid(col1_validity, row)) {
			// col1 is NULL
			printf("NULL");
		} else {
			printf("%lld", col1_data[row]);
		}
		printf(", 'col2': ");
		if (!duckdb_validity_row_is_valid(col2_validity, row)) {
			// col2 is NULL
			printf("NULL");
		} else {
			printf("%lld", col2_data[row]);
		}
		printf("}\n");
	}
	duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);

Example: Reading a List Vector

duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);

duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i % 5 = 0 THEN NULL WHEN i % 2 = 0 THEN [i, i + 1] ELSE [i * 42, NULL, i * 84] END FROM range(10) t(i)", &res);

// iterate until result is exhausted
while (true) {
	duckdb_data_chunk result = duckdb_fetch_chunk(res);
	if (!result) {
		// result is exhausted
		break;
	}
	// get the number of rows from the data chunk
	idx_t row_count = duckdb_data_chunk_get_size(result);
	// get the list column
	duckdb_vector list_col = duckdb_data_chunk_get_vector(result, 0);
	duckdb_list_entry *list_data = (duckdb_list_entry *) duckdb_vector_get_data(list_col);
	uint64_t *list_validity = duckdb_vector_get_validity(list_col);
	// get the child column of the list
	duckdb_vector list_child = duckdb_list_vector_get_child(list_col);
	int64_t *child_data = (int64_t *) duckdb_vector_get_data(list_child);
	uint64_t *child_validity = duckdb_vector_get_validity(list_child);

	// iterate over the rows
	for (idx_t row = 0; row < row_count; row++) {
		if (!duckdb_validity_row_is_valid(list_validity, row)) {
			// entire list is NULL
			printf("NULL\n");
			continue;
		}
		// read the list offsets for this row
		duckdb_list_entry list = list_data[row];
		printf("[");
		for (idx_t child_idx = list.offset; child_idx < list.offset + list.length; child_idx++) {
			if (child_idx > list.offset) {
				printf(", ");
			}
			if (!duckdb_validity_row_is_valid(child_validity, child_idx)) {
				// col1 is NULL
				printf("NULL");
			} else {
				printf("%lld", child_data[child_idx]);
			}
		}
		printf("]\n");
	}
	duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);

API Reference Overview

duckdb_logical_type duckdb_vector_get_column_type(duckdb_vector vector);
void *duckdb_vector_get_data(duckdb_vector vector);
uint64_t *duckdb_vector_get_validity(duckdb_vector vector);
void duckdb_vector_ensure_validity_writable(duckdb_vector vector);
void duckdb_vector_assign_string_element(duckdb_vector vector, idx_t index, const char *str);
void duckdb_vector_assign_string_element_len(duckdb_vector vector, idx_t index, const char *str, idx_t str_len);
duckdb_vector duckdb_list_vector_get_child(duckdb_vector vector);
idx_t duckdb_list_vector_get_size(duckdb_vector vector);
duckdb_state duckdb_list_vector_set_size(duckdb_vector vector, idx_t size);
duckdb_state duckdb_list_vector_reserve(duckdb_vector vector, idx_t required_capacity);
duckdb_vector duckdb_struct_vector_get_child(duckdb_vector vector, idx_t index);
duckdb_vector duckdb_array_vector_get_child(duckdb_vector vector);

Validity Mask Functions

bool duckdb_validity_row_is_valid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_validity(uint64_t *validity, idx_t row, bool valid);
void duckdb_validity_set_row_invalid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_valid(uint64_t *validity, idx_t row);

duckdb_vector_get_column_type

Retrieves the column type of the specified vector.

The result must be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_vector_get_column_type(
  duckdb_vector vector
);
Parameters
  • vector: The vector to get the column type from
Return Value

The type of the vector


duckdb_vector_get_data

Retrieves the data pointer of the vector.

The data pointer can be used to read or write values from the vector. How to read or write values depends on the type of the vector.

Syntax
void *duckdb_vector_get_data(
  duckdb_vector vector
);
Parameters
  • vector: The vector to get the data from
Return Value

The data pointer


duckdb_vector_get_validity

Retrieves the validity mask pointer of the specified vector.

If all values are valid, this function MIGHT return NULL!

The validity mask is a bitset that signifies null-ness within the data chunk. It is a series of uint64_t values, where each uint64_t value contains validity for 64 tuples. The bit is set to 1 if the value is valid (i.e., not NULL) or 0 if the value is invalid (i.e., NULL).

Validity of a specific value can be obtained like this:

idx_t entry_idx = row_idx / 64;
idx_t idx_in_entry = row_idx % 64;
bool is_valid = validity_mask[entry_idx] & (1ULL << idx_in_entry);

Alternatively, the (slower) duckdb_validity_row_is_valid function can be used.

Syntax
uint64_t *duckdb_vector_get_validity(
  duckdb_vector vector
);
Parameters
  • vector: The vector to get the data from
Return Value

The pointer to the validity mask, or NULL if no validity mask is present


duckdb_vector_ensure_validity_writable

Ensures the validity mask is writable by allocating it.

After this function is called, duckdb_vector_get_validity will ALWAYS return non-NULL. This allows NULL values to be written to the vector, regardless of whether a validity mask was present before.

Syntax
void duckdb_vector_ensure_validity_writable(
  duckdb_vector vector
);
Parameters
  • vector: The vector to alter

duckdb_vector_assign_string_element

Assigns a string element in the vector at the specified location.

Syntax
void duckdb_vector_assign_string_element(
  duckdb_vector vector,
  idx_t index,
  const char *str
);
Parameters
  • vector: The vector to alter
  • index: The row position in the vector to assign the string to
  • str: The null-terminated string

duckdb_vector_assign_string_element_len

Assigns a string element in the vector at the specified location. You may also use this function to assign BLOBs.

Syntax
void duckdb_vector_assign_string_element_len(
  duckdb_vector vector,
  idx_t index,
  const char *str,
  idx_t str_len
);
Parameters
  • vector: The vector to alter
  • index: The row position in the vector to assign the string to
  • str: The string
  • str_len: The length of the string (in bytes)

duckdb_list_vector_get_child

Retrieves the child vector of a list vector.

The resulting vector is valid as long as the parent vector is valid.

Syntax
duckdb_vector duckdb_list_vector_get_child(
  duckdb_vector vector
);
Parameters
  • vector: The vector
Return Value

The child vector


duckdb_list_vector_get_size

Returns the size of the child vector of the list.

Syntax
idx_t duckdb_list_vector_get_size(
  duckdb_vector vector
);
Parameters
  • vector: The vector
Return Value

The size of the child list


duckdb_list_vector_set_size

Sets the total size of the underlying child-vector of a list vector.

Syntax
duckdb_state duckdb_list_vector_set_size(
  duckdb_vector vector,
  idx_t size
);
Parameters
  • vector: The list vector.
  • size: The size of the child list.
Return Value

The duckdb state. Returns DuckDBError if the vector is nullptr.


duckdb_list_vector_reserve

Sets the total capacity of the underlying child-vector of a list.

After calling this method, you must call duckdb_vector_get_validity and duckdb_vector_get_data again to obtain the current data and validity pointers, as they may have been reallocated.

Syntax
duckdb_state duckdb_list_vector_reserve(
  duckdb_vector vector,
  idx_t required_capacity
);
Parameters
  • vector: The list vector.
  • required_capacity: the total capacity to reserve.
Return Value

The duckdb state. Returns DuckDBError if the vector is nullptr.
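
As a hedged sketch of how reserve, get_child, and set_size fit together when writing list data, here is a hypothetical helper; list_vec is assumed to be a writable LIST(BIGINT) vector, e.g., a column of an output data chunk:

// Hypothetical helper: writes row_count lists of three BIGINT values each.
static void fill_list_vector(duckdb_vector list_vec, idx_t row_count) {
	idx_t total_children = row_count * 3;
	// make room in the child vector, then fetch the data pointers
	duckdb_list_vector_reserve(list_vec, total_children);
	duckdb_vector child = duckdb_list_vector_get_child(list_vec);
	int64_t *child_data = (int64_t *) duckdb_vector_get_data(child);
	duckdb_list_entry *entries = (duckdb_list_entry *) duckdb_vector_get_data(list_vec);

	idx_t offset = 0;
	for (idx_t row = 0; row < row_count; row++) {
		entries[row].offset = offset;
		entries[row].length = 3;
		for (idx_t i = 0; i < 3; i++) {
			child_data[offset + i] = (int64_t) (row * 10 + i);
		}
		offset += 3;
	}
	duckdb_list_vector_set_size(list_vec, offset);
}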


duckdb_struct_vector_get_child

Retrieves the child vector of a struct vector.

The resulting vector is valid as long as the parent vector is valid.

Syntax
duckdb_vector duckdb_struct_vector_get_child(
  duckdb_vector vector,
  idx_t index
);
Parameters
  • vector: The vector
  • index: The child index
Return Value

The child vector


duckdb_array_vector_get_child

Retrieves the child vector of an array vector.

The resulting vector is valid as long as the parent vector is valid. The resulting vector has the size of the parent vector multiplied by the array size.

Syntax
duckdb_vector duckdb_array_vector_get_child(
  duckdb_vector vector
);
Parameters
  • vector: The vector
Return Value

The child vector


duckdb_validity_row_is_valid

Returns whether or not a row is valid (i.e., not NULL) in the given validity mask.

Syntax
bool duckdb_validity_row_is_valid(
  uint64_t *validity,
  idx_t row
);
Parameters
  • validity: The validity mask, as obtained through duckdb_vector_get_validity
  • row: The row index
Return Value

true if the row is valid, false otherwise


duckdb_validity_set_row_validity

In a validity mask, sets a specific row to either valid or invalid.

Note that duckdb_vector_ensure_validity_writable should be called before calling duckdb_vector_get_validity, to ensure that there is a validity mask to write to.

Syntax
void duckdb_validity_set_row_validity(
  uint64_t *validity,
  idx_t row,
  bool valid
);
Parameters
  • validity: The validity mask, as obtained through duckdb_vector_get_validity.
  • row: The row index
  • valid: Whether or not to set the row to valid, or invalid

duckdb_validity_set_row_invalid

In a validity mask, sets a specific row to invalid.

Equivalent to duckdb_validity_set_row_validity with valid set to false.

Syntax
void duckdb_validity_set_row_invalid(
  uint64_t *validity,
  idx_t row
);
Parameters
  • validity: The validity mask
  • row: The row index

duckdb_validity_set_row_valid

In a validity mask, sets a specific row to valid.

Equivalent to duckdb_validity_set_row_validity with valid set to true.

Syntax
void duckdb_validity_set_row_valid(
  uint64_t *validity,
  idx_t row
);
Parameters
  • validity: The validity mask
  • row: The row index

--- layout: docu title: Appender redirect_from: - /docs/api/c/appender - /docs/api/c/appender/ ---

Appenders are the most efficient way of loading data into DuckDB from within the C interface, and are recommended for fast data loading. The appender is much faster than using prepared statements or individual INSERT INTO statements.

Appends are made in row-wise format. For every column, a duckdb_append_[type] call should be made, after which the row should be finished by calling duckdb_appender_end_row. After all rows have been appended, duckdb_appender_destroy should be used to finalize the appender and clean up the resulting memory.

Note that duckdb_appender_destroy should always be called on the resulting appender, even if the function returns DuckDBError.

Example

duckdb_query(con, "CREATE TABLE people (id INTEGER, name VARCHAR)", NULL);

duckdb_appender appender;
if (duckdb_appender_create(con, NULL, "people", &appender) == DuckDBError) {
  // handle error
}
// append the first row (1, Mark)
duckdb_append_int32(appender, 1);
duckdb_append_varchar(appender, "Mark");
duckdb_appender_end_row(appender);

// append the second row (2, Hannes)
duckdb_append_int32(appender, 2);
duckdb_append_varchar(appender, "Hannes");
duckdb_appender_end_row(appender);

// finish appending and flush all the rows to the table
duckdb_appender_destroy(&appender);

API Reference Overview

duckdb_state duckdb_appender_create(duckdb_connection connection, const char *schema, const char *table, duckdb_appender *out_appender);
duckdb_state duckdb_appender_create_ext(duckdb_connection connection, const char *catalog, const char *schema, const char *table, duckdb_appender *out_appender);
idx_t duckdb_appender_column_count(duckdb_appender appender);
duckdb_logical_type duckdb_appender_column_type(duckdb_appender appender, idx_t col_idx);
const char *duckdb_appender_error(duckdb_appender appender);
duckdb_state duckdb_appender_flush(duckdb_appender appender);
duckdb_state duckdb_appender_close(duckdb_appender appender);
duckdb_state duckdb_appender_destroy(duckdb_appender *appender);
duckdb_state duckdb_appender_add_column(duckdb_appender appender, const char *name);
duckdb_state duckdb_appender_clear_columns(duckdb_appender appender);
duckdb_state duckdb_appender_begin_row(duckdb_appender appender);
duckdb_state duckdb_appender_end_row(duckdb_appender appender);
duckdb_state duckdb_append_default(duckdb_appender appender);
duckdb_state duckdb_append_default_to_chunk(duckdb_appender appender, duckdb_data_chunk chunk, idx_t col, idx_t row);
duckdb_state duckdb_append_bool(duckdb_appender appender, bool value);
duckdb_state duckdb_append_int8(duckdb_appender appender, int8_t value);
duckdb_state duckdb_append_int16(duckdb_appender appender, int16_t value);
duckdb_state duckdb_append_int32(duckdb_appender appender, int32_t value);
duckdb_state duckdb_append_int64(duckdb_appender appender, int64_t value);
duckdb_state duckdb_append_hugeint(duckdb_appender appender, duckdb_hugeint value);
duckdb_state duckdb_append_uint8(duckdb_appender appender, uint8_t value);
duckdb_state duckdb_append_uint16(duckdb_appender appender, uint16_t value);
duckdb_state duckdb_append_uint32(duckdb_appender appender, uint32_t value);
duckdb_state duckdb_append_uint64(duckdb_appender appender, uint64_t value);
duckdb_state duckdb_append_uhugeint(duckdb_appender appender, duckdb_uhugeint value);
duckdb_state duckdb_append_float(duckdb_appender appender, float value);
duckdb_state duckdb_append_double(duckdb_appender appender, double value);
duckdb_state duckdb_append_date(duckdb_appender appender, duckdb_date value);
duckdb_state duckdb_append_time(duckdb_appender appender, duckdb_time value);
duckdb_state duckdb_append_timestamp(duckdb_appender appender, duckdb_timestamp value);
duckdb_state duckdb_append_interval(duckdb_appender appender, duckdb_interval value);
duckdb_state duckdb_append_varchar(duckdb_appender appender, const char *val);
duckdb_state duckdb_append_varchar_length(duckdb_appender appender, const char *val, idx_t length);
duckdb_state duckdb_append_blob(duckdb_appender appender, const void *data, idx_t length);
duckdb_state duckdb_append_null(duckdb_appender appender);
duckdb_state duckdb_append_value(duckdb_appender appender, duckdb_value value);
duckdb_state duckdb_append_data_chunk(duckdb_appender appender, duckdb_data_chunk chunk);

duckdb_appender_create

Creates an appender object.

Note that the object must be destroyed with duckdb_appender_destroy.

Syntax
duckdb_state duckdb_appender_create(
  duckdb_connection connection,
  const char *schema,
  const char *table,
  duckdb_appender *out_appender
);
Parameters
  • connection: The connection context to create the appender in.
  • schema: The schema of the table to append to, or nullptr for the default schema.
  • table: The table name to append to.
  • out_appender: The resulting appender object.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_appender_create_ext

Creates an appender object.

Note that the object must be destroyed with duckdb_appender_destroy.

Syntax
duckdb_state duckdb_appender_create_ext(
  duckdb_connection connection,
  const char *catalog,
  const char *schema,
  const char *table,
  duckdb_appender *out_appender
);
Parameters
  • connection: The connection context to create the appender in.
  • catalog: The catalog of the table to append to, or nullptr for the default catalog.
  • schema: The schema of the table to append to, or nullptr for the default schema.
  • table: The table name to append to.
  • out_appender: The resulting appender object.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_appender_column_count

Returns the number of columns that belong to the appender. If there is no active column list, then this equals the table's physical columns.

Syntax
idx_t duckdb_appender_column_count(
  duckdb_appender appender
);
Parameters
  • appender: The appender to get the column count from.
Return Value

The number of columns in the data chunks.


duckdb_appender_column_type

Returns the type of the column at the specified index. This is either a type in the active column list, or the same type as a column in the receiving table.

Note: The resulting type must be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_appender_column_type(
  duckdb_appender appender,
  idx_t col_idx
);
Parameters
  • appender: The appender to get the column type from.
  • col_idx: The index of the column to get the type of.
Return Value

The duckdb_logical_type of the column.


duckdb_appender_error

Returns the error message associated with the given appender. If the appender has no error message, this returns nullptr instead.

The error message should not be freed. It will be de-allocated when duckdb_appender_destroy is called.

Syntax
const char *duckdb_appender_error(
  duckdb_appender appender
);
Parameters
  • appender: The appender to get the error from.
Return Value

The error message, or nullptr if there is none.


duckdb_appender_flush

Flush the appender to the table, forcing the cache of the appender to be cleared. If flushing the data triggers a constraint violation or any other error, then all data is invalidated, and this function returns DuckDBError. It is not possible to append more values. Call duckdb_appender_error to obtain the error message followed by duckdb_appender_destroy to destroy the invalidated appender.

Syntax
duckdb_state duckdb_appender_flush(
  duckdb_appender appender
);
Parameters
  • appender: The appender to flush.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_appender_close

Closes the appender by flushing all intermediate states and closing it for further appends. If flushing the data triggers a constraint violation or any other error, then all data is invalidated, and this function returns DuckDBError. Call duckdb_appender_error to obtain the error message followed by duckdb_appender_destroy to destroy the invalidated appender.

Syntax
duckdb_state duckdb_appender_close(
  duckdb_appender appender
);
Parameters
  • appender: The appender to flush and close.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_appender_destroy

Closes the appender by flushing all intermediate states to the table and destroying it. By destroying it, this function de-allocates all memory associated with the appender. If flushing the data triggers a constraint violation, then all data is invalidated, and this function returns DuckDBError. Due to the destruction of the appender, it is no longer possible to obtain the specific error message with duckdb_appender_error. Therefore, call duckdb_appender_close before destroying the appender, if you need insights into the specific error.

Syntax
duckdb_state duckdb_appender_destroy(
  duckdb_appender *appender
);
Parameters
  • appender: The appender to flush, close and destroy.
Return Value

DuckDBSuccess on success or DuckDBError on failure.
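
A short sketch of the recommended shutdown sequence when the specific error message matters:

// close first, so the error message is still available if flushing fails
if (duckdb_appender_close(appender) == DuckDBError) {
    // handle or log duckdb_appender_error(appender) before destroying
    printf("append failed: %s\n", duckdb_appender_error(appender));
}
duckdb_appender_destroy(&appender);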


duckdb_appender_add_column

Appends a column to the active column list of the appender. Immediately flushes all previous data.

The active column list specifies all columns that are expected when flushing the data. Any non-active columns are filled with their default values, or NULL.

Syntax
duckdb_state duckdb_appender_add_column(
  duckdb_appender appender,
  const char *name
);
Parameters
  • appender: The appender to add the column to.
  • name: The name of the column to add to the active column list.
Return Value

DuckDBSuccess on success or DuckDBError on failure.
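
A hedged example of using an active column list with the people table from the example above, assuming values are appended in the order of the active column list; the id column is filled with its default value or NULL on flush:

// only append values for the 'name' column
duckdb_appender_add_column(appender, "name");

duckdb_append_varchar(appender, "Wilbur");
duckdb_appender_end_row(appender);

// reset the appender so that all columns are active again
duckdb_appender_clear_columns(appender);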


duckdb_appender_clear_columns

Removes all columns from the active column list of the appender, resetting the appender to treat all columns as active. Immediately flushes all previous data.

Syntax
duckdb_state duckdb_appender_clear_columns(
  duckdb_appender appender
);
Parameters
  • appender: The appender to clear the columns from.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_appender_begin_row

A nop function, provided for backwards compatibility reasons. Does nothing. Only duckdb_appender_end_row is required.

Syntax
duckdb_state duckdb_appender_begin_row(
  duckdb_appender appender
);

duckdb_appender_end_row

Finish the current row of appends. After end_row is called, the next row can be appended.

Syntax
duckdb_state duckdb_appender_end_row(
  duckdb_appender appender
);
Parameters
  • appender: The appender.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_append_default

Append a DEFAULT value (NULL if DEFAULT not available for column) to the appender.

Syntax
duckdb_state duckdb_append_default(
  duckdb_appender appender
);

duckdb_append_default_to_chunk

Append a DEFAULT value (NULL if no DEFAULT is available for the column) at the specified row and column of the chunk created from the specified appender. The default value of the column must be a constant value. Non-deterministic expressions like nextval('seq') or random() are not supported.

Syntax
duckdb_state duckdb_append_default_to_chunk(
  duckdb_appender appender,
  duckdb_data_chunk chunk,
  idx_t col,
  idx_t row
);
Parameters
  • appender: The appender to get the default value from.
  • chunk: The data chunk to append the default value to.
  • col: The chunk column index to append the default value to.
  • row: The chunk row index to append the default value to.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_append_bool

Append a bool value to the appender.

Syntax
duckdb_state duckdb_append_bool(
  duckdb_appender appender,
  bool value
);

duckdb_append_int8

Append an int8_t value to the appender.

Syntax
duckdb_state duckdb_append_int8(
  duckdb_appender appender,
  int8_t value
);

duckdb_append_int16

Append an int16_t value to the appender.

Syntax
duckdb_state duckdb_append_int16(
  duckdb_appender appender,
  int16_t value
);

duckdb_append_int32

Append an int32_t value to the appender.

Syntax
duckdb_state duckdb_append_int32(
  duckdb_appender appender,
  int32_t value
);

duckdb_append_int64

Append an int64_t value to the appender.

Syntax
duckdb_state duckdb_append_int64(
  duckdb_appender appender,
  int64_t value
);

duckdb_append_hugeint

Append a duckdb_hugeint value to the appender.

Syntax
duckdb_state duckdb_append_hugeint(
  duckdb_appender appender,
  duckdb_hugeint value
);

duckdb_append_uint8

Append a uint8_t value to the appender.

Syntax
duckdb_state duckdb_append_uint8(
  duckdb_appender appender,
  uint8_t value
);

duckdb_append_uint16

Append a uint16_t value to the appender.

Syntax
duckdb_state duckdb_append_uint16(
  duckdb_appender appender,
  uint16_t value
);

duckdb_append_uint32

Append a uint32_t value to the appender.

Syntax
duckdb_state duckdb_append_uint32(
  duckdb_appender appender,
  uint32_t value
);

duckdb_append_uint64

Append a uint64_t value to the appender.

Syntax
duckdb_state duckdb_append_uint64(
  duckdb_appender appender,
  uint64_t value
);

duckdb_append_uhugeint

Append a duckdb_uhugeint value to the appender.

Syntax
duckdb_state duckdb_append_uhugeint(
  duckdb_appender appender,
  duckdb_uhugeint value
);

duckdb_append_float

Append a float value to the appender.

Syntax
duckdb_state duckdb_append_float(
  duckdb_appender appender,
  float value
);

duckdb_append_double

Append a double value to the appender.

Syntax
duckdb_state duckdb_append_double(
  duckdb_appender appender,
  double value
);

duckdb_append_date

Append a duckdb_date value to the appender.

Syntax
duckdb_state duckdb_append_date(
  duckdb_appender appender,
  duckdb_date value
);

duckdb_append_time

Append a duckdb_time value to the appender.

Syntax
duckdb_state duckdb_append_time(
  duckdb_appender appender,
  duckdb_time value
);

duckdb_append_timestamp

Append a duckdb_timestamp value to the appender.

Syntax
duckdb_state duckdb_append_timestamp(
  duckdb_appender appender,
  duckdb_timestamp value
);

duckdb_append_interval

Append a duckdb_interval value to the appender.

Syntax
duckdb_state duckdb_append_interval(
  duckdb_appender appender,
  duckdb_interval value
);

duckdb_append_varchar

Append a varchar value to the appender.

Syntax
duckdb_state duckdb_append_varchar(
  duckdb_appender appender,
  const char *val
);

duckdb_append_varchar_length

Append a varchar value to the appender.

Syntax
duckdb_state duckdb_append_varchar_length(
  duckdb_appender appender,
  const char *val,
  idx_t length
);

duckdb_append_blob

Append a blob value to the appender.

Syntax
duckdb_state duckdb_append_blob(
  duckdb_appender appender,
  const void *data,
  idx_t length
);

duckdb_append_null

Append a NULL value to the appender (of any type).

Syntax
duckdb_state duckdb_append_null(
  duckdb_appender appender
);

duckdb_append_value

Append a duckdb_value to the appender.

Syntax
duckdb_state duckdb_append_value(
  duckdb_appender appender,
  duckdb_value value
);

duckdb_append_data_chunk

Appends a pre-filled data chunk to the specified appender. Attempts casting, if the data chunk types do not match the active appender types.

Syntax
duckdb_state duckdb_append_data_chunk(
  duckdb_appender appender,
  duckdb_data_chunk chunk
);
Parameters
  • appender: The appender to append to.
  • chunk: The data chunk to append.
Return Value

DuckDBSuccess on success or DuckDBError on failure.
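
A hedged sketch of building a one-row data chunk that matches the people table from the example above and appending it in a single call:

// the chunk layout must match the appender: (INTEGER, VARCHAR)
duckdb_logical_type types[2];
types[0] = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
types[1] = duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR);
duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 2);

// fill the first row of the chunk
duckdb_vector id_vector = duckdb_data_chunk_get_vector(chunk, 0);
((int32_t *) duckdb_vector_get_data(id_vector))[0] = 3;
duckdb_vector name_vector = duckdb_data_chunk_get_vector(chunk, 1);
duckdb_vector_assign_string_element(name_vector, 0, "Wilbur");
duckdb_data_chunk_set_size(chunk, 1);

if (duckdb_append_data_chunk(appender, chunk) == DuckDBError) {
    // handle error
}

duckdb_destroy_data_chunk(&chunk);
duckdb_destroy_logical_type(&types[0]);
duckdb_destroy_logical_type(&types[1]);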


--- layout: docu title: Replacement Scans redirect_from: - /docs/api/c/replacement_scans - /docs/api/c/replacement_scans/ ---

The replacement scan API can be used to register a callback that is called when a table is read that does not exist in the catalog. For example, when a query such as SELECT * FROM my_table is executed and my_table does not exist, the replacement scan callback will be called with my_table as parameter. The replacement scan can then insert a table function with a specific parameter to replace the read of the table.
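
For illustration, a hedged sketch of a callback that replaces reads of a (hypothetical) table called numbers with the built-in range table function; the callback signature is assumed to match duckdb_replacement_callback_t from duckdb.h, and strcmp comes from <string.h>:

static void my_replacement(duckdb_replacement_scan_info info, const char *table_name, void *data) {
	if (strcmp(table_name, "numbers") != 0) {
		// not our table: do not call set_function_name, so no replacement happens
		return;
	}
	duckdb_replacement_scan_set_function_name(info, "range");
	duckdb_value param = duckdb_create_int64(100);
	duckdb_replacement_scan_add_parameter(info, param);
	// the parameter is copied when added, so the local value can be destroyed
	duckdb_destroy_value(&param);
}

// registration, e.g., right after opening the database:
// duckdb_add_replacement_scan(db, my_replacement, NULL, NULL);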

API Reference Overview

void duckdb_add_replacement_scan(duckdb_database db, duckdb_replacement_callback_t replacement, void *extra_data, duckdb_delete_callback_t delete_callback);
void duckdb_replacement_scan_set_function_name(duckdb_replacement_scan_info info, const char *function_name);
void duckdb_replacement_scan_add_parameter(duckdb_replacement_scan_info info, duckdb_value parameter);
void duckdb_replacement_scan_set_error(duckdb_replacement_scan_info info, const char *error);

duckdb_add_replacement_scan

Add a replacement scan definition to the specified database.

Syntax
void duckdb_add_replacement_scan(
  duckdb_database db,
  duckdb_replacement_callback_t replacement,
  void *extra_data,
  duckdb_delete_callback_t delete_callback
);
Parameters
  • db: The database object to add the replacement scan to
  • replacement: The replacement scan callback
  • extra_data: Extra data that is passed back into the specified callback
  • delete_callback: The delete callback to call on the extra data, if any

duckdb_replacement_scan_set_function_name

Sets the replacement function name. If this function is called in the replacement callback, the replacement scan is performed. If it is not called, the replacement is not performed.

Syntax
void duckdb_replacement_scan_set_function_name(
  duckdb_replacement_scan_info info,
  const char *function_name
);
Parameters
  • info: The info object
  • function_name: The function name to substitute.

duckdb_replacement_scan_add_parameter

Adds a parameter to the replacement scan function.

Syntax
void duckdb_replacement_scan_add_parameter(
  duckdb_replacement_scan_info info,
  duckdb_value parameter
);
Parameters
  • info: The info object
  • parameter: The parameter to add.

duckdb_replacement_scan_set_error

Report that an error has occurred while executing the replacement scan.

Syntax
void duckdb_replacement_scan_set_error(
  duckdb_replacement_scan_info info,
  const char *error
);
Parameters
  • info: The info object
  • error: The error message

--- layout: docu title: Query redirect_from: - /docs/api/c/query - /docs/api/c/query/ ---

The duckdb_query method allows SQL queries to be run in DuckDB from C. This method takes two parameters, a (null-terminated) SQL query string and a duckdb_result result pointer. The result pointer may be NULL if the application is not interested in the result set or if the query produces no result. After the result is consumed, the duckdb_destroy_result method should be used to clean up the result.

Elements can be extracted from the duckdb_result object using a variety of methods. The duckdb_column_count can be used to extract the number of columns. duckdb_column_name and duckdb_column_type can be used to extract the names and types of individual columns.

Example

duckdb_state state;
duckdb_result result;

// create a table
state = duckdb_query(con, "CREATE TABLE integers (i INTEGER, j INTEGER);", NULL);
if (state == DuckDBError) {
    // handle error
}
// insert three rows into the table
state = duckdb_query(con, "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);", NULL);
if (state == DuckDBError) {
    // handle error
}
// query rows again
state = duckdb_query(con, "SELECT * FROM integers", &result);
if (state == DuckDBError) {
    // handle error
}
// handle the result
// ...

// destroy the result after we are done with it
duckdb_destroy_result(&result);

Value Extraction

Values can be extracted using either the duckdb_fetch_chunk function, or using the duckdb_value convenience functions. The duckdb_fetch_chunk function directly hands you data chunks in DuckDB's native array format and can therefore be very fast. The duckdb_value functions perform bounds- and type-checking, and will automatically cast values to the desired type. This makes them more convenient and easier to use, at the expense of being slower.

See the [Types]({% link docs/clients/c/types.md %}) page for more information.

For optimal performance, use duckdb_fetch_chunk to extract data from the query result. The duckdb_value functions perform internal type-checking, bounds-checking and casting which makes them slower.

duckdb_fetch_chunk

Below is an end-to-end example that prints the above result to CSV format using the duckdb_fetch_chunk function. Note that the function is NOT generic: we do need to know exactly what the types of the result columns are.

duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);

duckdb_result res;
duckdb_query(con, "CREATE TABLE integers (i INTEGER, j INTEGER);", NULL);
duckdb_query(con, "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);", NULL);
duckdb_query(con, "SELECT * FROM integers;", &res);

// iterate until result is exhausted
while (true) {
    duckdb_data_chunk result = duckdb_fetch_chunk(res);
    if (!result) {
        // result is exhausted
        break;
    }
    // get the number of rows from the data chunk
    idx_t row_count = duckdb_data_chunk_get_size(result);
    // get the first column
    duckdb_vector col1 = duckdb_data_chunk_get_vector(result, 0);
    int32_t *col1_data = (int32_t *) duckdb_vector_get_data(col1);
    uint64_t *col1_validity = duckdb_vector_get_validity(col1);

    // get the second column
    duckdb_vector col2 = duckdb_data_chunk_get_vector(result, 1);
    int32_t *col2_data = (int32_t *) duckdb_vector_get_data(col2);
    uint64_t *col2_validity = duckdb_vector_get_validity(col2);

    // iterate over the rows
    for (idx_t row = 0; row < row_count; row++) {
        if (duckdb_validity_row_is_valid(col1_validity, row)) {
            printf("%d", col1_data[row]);
        } else {
            printf("NULL");
        }
        printf(",");
        if (duckdb_validity_row_is_valid(col2_validity, row)) {
            printf("%d", col2_data[row]);
        } else {
            printf("NULL");
        }
        printf("\n");
    }
    duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);

This prints the following result:

3,4
5,6
7,NULL

duckdb_value

Deprecated The duckdb_value functions are deprecated and are scheduled for removal in a future release.

Below is an example that prints the above result to CSV format using the duckdb_value_varchar function. Note that the function is generic: we do not need to know about the types of the individual result columns.

// print the above result to CSV format using `duckdb_value_varchar`
idx_t row_count = duckdb_row_count(&result);
idx_t column_count = duckdb_column_count(&result);
for (idx_t row = 0; row < row_count; row++) {
    for (idx_t col = 0; col < column_count; col++) {
        if (col > 0) printf(",");
        char *str_val = duckdb_value_varchar(&result, col, row);
        printf("%s", str_val);
        duckdb_free(str_val);
    }
    printf("\n");
}

API Reference Overview

duckdb_state duckdb_query(duckdb_connection connection, const char *query, duckdb_result *out_result);
void duckdb_destroy_result(duckdb_result *result);
const char *duckdb_column_name(duckdb_result *result, idx_t col);
duckdb_type duckdb_column_type(duckdb_result *result, idx_t col);
duckdb_statement_type duckdb_result_statement_type(duckdb_result result);
duckdb_logical_type duckdb_column_logical_type(duckdb_result *result, idx_t col);
idx_t duckdb_column_count(duckdb_result *result);
idx_t duckdb_row_count(duckdb_result *result);
idx_t duckdb_rows_changed(duckdb_result *result);
void *duckdb_column_data(duckdb_result *result, idx_t col);
bool *duckdb_nullmask_data(duckdb_result *result, idx_t col);
const char *duckdb_result_error(duckdb_result *result);
duckdb_error_type duckdb_result_error_type(duckdb_result *result);

duckdb_query

Executes a SQL query within a connection and stores the full (materialized) result in the out_result pointer. If the query fails to execute, DuckDBError is returned and the error message can be retrieved by calling duckdb_result_error.

Note that after running duckdb_query, duckdb_destroy_result must be called on the result object even if the query fails, otherwise the error stored within the result will not be freed correctly.

Syntax
duckdb_state duckdb_query(
  duckdb_connection connection,
  const char *query,
  duckdb_result *out_result
);
Parameters
  • connection: The connection to perform the query in.
  • query: The SQL query to run.
  • out_result: The query result.
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_destroy_result

Closes the result and de-allocates all memory allocated for that result.

Syntax
void duckdb_destroy_result(
  duckdb_result *result
);
Parameters
  • result: The result to destroy.

duckdb_column_name

Returns the column name of the specified column. The result should not need to be freed; the column names will automatically be destroyed when the result is destroyed.

Returns NULL if the column is out of range.

Syntax
const char *duckdb_column_name(
  duckdb_result *result,
  idx_t col
);
Parameters
  • result: The result object to fetch the column name from.
  • col: The column index.
Return Value

The column name of the specified column.


duckdb_column_type

Returns the column type of the specified column.

Returns DUCKDB_TYPE_INVALID if the column is out of range.

Syntax
duckdb_type duckdb_column_type(
  duckdb_result *result,
  idx_t col
);
Parameters
  • result: The result object to fetch the column type from.
  • col: The column index.
Return Value

The column type of the specified column.


duckdb_result_statement_type

Returns the statement type of the statement that was executed.

Syntax
duckdb_statement_type duckdb_result_statement_type(
  duckdb_result result
);
Parameters
  • result: The result object to fetch the statement type from.
Return Value

duckdb_statement_type value or DUCKDB_STATEMENT_TYPE_INVALID


duckdb_column_logical_type

Returns the logical column type of the specified column.

The return type of this call should be destroyed with duckdb_destroy_logical_type.

Returns NULL if the column is out of range.

Syntax
duckdb_logical_type duckdb_column_logical_type(
  duckdb_result *result,
  idx_t col
);
Parameters
  • result: The result object to fetch the column type from.
  • col: The column index.
Return Value

The logical column type of the specified column.


duckdb_column_count

Returns the number of columns present in the result object.

Syntax
idx_t duckdb_column_count(
  duckdb_result *result
);
Parameters
  • result: The result object.
Return Value

The number of columns present in the result object.


duckdb_row_count

Warning Deprecation notice. This method is scheduled for removal in a future release.

Returns the number of rows present in the result object.

Syntax
idx_t duckdb_row_count(
  duckdb_result *result
);
Parameters
  • result: The result object.
Return Value

The number of rows present in the result object.


duckdb_rows_changed

Returns the number of rows changed by the query stored in the result. This is relevant only for INSERT/UPDATE/DELETE queries. For other queries the rows_changed will be 0.

Syntax
idx_t duckdb_rows_changed(
  duckdb_result *result
);
Parameters
  • result: The result object.
Return Value

The number of rows changed.


duckdb_column_data

Deprecated This method has been deprecated. Prefer using duckdb_result_get_chunk instead.

Returns the data of a specific column of a result in columnar format.

The function returns a dense array which contains the result data. The exact type stored in the array depends on the corresponding duckdb_type (as provided by duckdb_column_type). For the exact type by which the data should be accessed, see the comments in the types section or the DUCKDB_TYPE enum.

For example, for a column of type DUCKDB_TYPE_INTEGER, rows can be accessed in the following manner:

int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
printf("Data for row %d: %d\n", row, data[row]);
Syntax
void *duckdb_column_data(
  duckdb_result *result,
  idx_t col
);
Parameters
  • result: The result object to fetch the column data from.
  • col: The column index.
Return Value

The column data of the specified column.


duckdb_nullmask_data

Deprecated This method has been deprecated. Prefer using duckdb_result_get_chunk instead.

Returns the nullmask of a specific column of a result in columnar format. The nullmask indicates for every row whether or not the corresponding row is NULL. If a row is NULL, the values present in the array provided by duckdb_column_data are undefined.

int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
bool *nullmask = duckdb_nullmask_data(&result, 0);
if (nullmask[row]) {
    printf("Data for row %d: NULL\n", row);
} else {
    printf("Data for row %d: %d\n", row, data[row]);
}
Syntax
bool *duckdb_nullmask_data(
  duckdb_result *result,
  idx_t col
);
Parameters
  • result: The result object to fetch the nullmask from.
  • col: The column index.
Return Value

The nullmask of the specified column.


duckdb_result_error

Returns the error message contained within the result. The error is only set if duckdb_query returns DuckDBError.

The result of this function must not be freed. It will be cleaned up when duckdb_destroy_result is called.

Syntax
const char *duckdb_result_error(
  duckdb_result *result
);
Parameters
  • result: The result object to fetch the error from.
Return Value

The error of the result.


duckdb_result_error_type

Returns the result error type contained within the result. The error is only set if duckdb_query returns DuckDBError.

Syntax
duckdb_error_type duckdb_result_error_type(
  duckdb_result *result
);
Parameters
  • result: The result object to fetch the error from.
Return Value

The error type of the result.


--- layout: docu title: Values redirect_from: - /docs/api/c/value - /docs/api/c/value/ ---

The value class represents a single value of any type.
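
A minimal example of the round trip through a duckdb_value, creating a string value and reading it back:

duckdb_value val = duckdb_create_varchar("hello world");

// duckdb_get_varchar allocates a new string that must be freed with duckdb_free
char *text = duckdb_get_varchar(val);
printf("%s\n", text);

duckdb_free(text);
duckdb_destroy_value(&val);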

API Reference Overview

void duckdb_destroy_value(duckdb_value *value);
duckdb_value duckdb_create_varchar(const char *text);
duckdb_value duckdb_create_varchar_length(const char *text, idx_t length);
duckdb_value duckdb_create_bool(bool input);
duckdb_value duckdb_create_int8(int8_t input);
duckdb_value duckdb_create_uint8(uint8_t input);
duckdb_value duckdb_create_int16(int16_t input);
duckdb_value duckdb_create_uint16(uint16_t input);
duckdb_value duckdb_create_int32(int32_t input);
duckdb_value duckdb_create_uint32(uint32_t input);
duckdb_value duckdb_create_uint64(uint64_t input);
duckdb_value duckdb_create_int64(int64_t val);
duckdb_value duckdb_create_hugeint(duckdb_hugeint input);
duckdb_value duckdb_create_uhugeint(duckdb_uhugeint input);
duckdb_value duckdb_create_varint(duckdb_varint input);
duckdb_value duckdb_create_decimal(duckdb_decimal input);
duckdb_value duckdb_create_float(float input);
duckdb_value duckdb_create_double(double input);
duckdb_value duckdb_create_date(duckdb_date input);
duckdb_value duckdb_create_time(duckdb_time input);
duckdb_value duckdb_create_time_tz_value(duckdb_time_tz value);
duckdb_value duckdb_create_timestamp(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_tz(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_s(duckdb_timestamp_s input);
duckdb_value duckdb_create_timestamp_ms(duckdb_timestamp_ms input);
duckdb_value duckdb_create_timestamp_ns(duckdb_timestamp_ns input);
duckdb_value duckdb_create_interval(duckdb_interval input);
duckdb_value duckdb_create_blob(const uint8_t *data, idx_t length);
duckdb_value duckdb_create_bit(duckdb_bit input);
duckdb_value duckdb_create_uuid(duckdb_uhugeint input);
bool duckdb_get_bool(duckdb_value val);
int8_t duckdb_get_int8(duckdb_value val);
uint8_t duckdb_get_uint8(duckdb_value val);
int16_t duckdb_get_int16(duckdb_value val);
uint16_t duckdb_get_uint16(duckdb_value val);
int32_t duckdb_get_int32(duckdb_value val);
uint32_t duckdb_get_uint32(duckdb_value val);
int64_t duckdb_get_int64(duckdb_value val);
uint64_t duckdb_get_uint64(duckdb_value val);
duckdb_hugeint duckdb_get_hugeint(duckdb_value val);
duckdb_uhugeint duckdb_get_uhugeint(duckdb_value val);
duckdb_varint duckdb_get_varint(duckdb_value val);
duckdb_decimal duckdb_get_decimal(duckdb_value val);
float duckdb_get_float(duckdb_value val);
double duckdb_get_double(duckdb_value val);
duckdb_date duckdb_get_date(duckdb_value val);
duckdb_time duckdb_get_time(duckdb_value val);
duckdb_time_tz duckdb_get_time_tz(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp_tz(duckdb_value val);
duckdb_timestamp_s duckdb_get_timestamp_s(duckdb_value val);
duckdb_timestamp_ms duckdb_get_timestamp_ms(duckdb_value val);
duckdb_timestamp_ns duckdb_get_timestamp_ns(duckdb_value val);
duckdb_interval duckdb_get_interval(duckdb_value val);
duckdb_logical_type duckdb_get_value_type(duckdb_value val);
duckdb_blob duckdb_get_blob(duckdb_value val);
duckdb_bit duckdb_get_bit(duckdb_value val);
duckdb_uhugeint duckdb_get_uuid(duckdb_value val);
char *duckdb_get_varchar(duckdb_value value);
duckdb_value duckdb_create_struct_value(duckdb_logical_type type, duckdb_value *values);
duckdb_value duckdb_create_list_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_array_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
idx_t duckdb_get_map_size(duckdb_value value);
duckdb_value duckdb_get_map_key(duckdb_value value, idx_t index);
duckdb_value duckdb_get_map_value(duckdb_value value, idx_t index);
bool duckdb_is_null_value(duckdb_value value);
duckdb_value duckdb_create_null_value();
idx_t duckdb_get_list_size(duckdb_value value);
duckdb_value duckdb_get_list_child(duckdb_value value, idx_t index);
duckdb_value duckdb_create_enum_value(duckdb_logical_type type, uint64_t value);
uint64_t duckdb_get_enum_value(duckdb_value value);
duckdb_value duckdb_get_struct_child(duckdb_value value, idx_t index);

duckdb_destroy_value

Destroys the value and de-allocates all memory allocated for that type.

Syntax
void duckdb_destroy_value(
  duckdb_value *value
);
Parameters
  • value: The value to destroy.

duckdb_create_varchar

Creates a value from a null-terminated string

Syntax
duckdb_value duckdb_create_varchar(
  const char *text
);
Parameters
  • text: The null-terminated string
Return Value

The value. This must be destroyed with duckdb_destroy_value.

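For example, a minimal create-and-destroy round trip (the string literal below is arbitrary):

duckdb_value greeting = duckdb_create_varchar("hello world");
// ... use the value, e.g., bind it or nest it in a list or struct ...
duckdb_destroy_value(&greeting);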

duckdb_create_varchar_length

Creates a value from a string

Syntax
duckdb_value duckdb_create_varchar_length(
  const char *text,
  idx_t length
);
Parameters
  • text: The text
  • length: The length of the text
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_bool

Creates a value from a boolean

Syntax
duckdb_value duckdb_create_bool(
  bool input
);
Parameters
  • input: The boolean value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_int8

Creates a value from an int8_t (a tinyint)

Syntax
duckdb_value duckdb_create_int8(
  int8_t input
);
Parameters
  • input: The tinyint value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_uint8

Creates a value from a uint8_t (a utinyint)

Syntax
duckdb_value duckdb_create_uint8(
  uint8_t input
);
Parameters
  • input: The utinyint value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_int16

Creates a value from an int16_t (a smallint)

Syntax
duckdb_value duckdb_create_int16(
  int16_t input
);
Parameters
  • input: The smallint value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_uint16

Creates a value from a uint16_t (a usmallint)

Syntax
duckdb_value duckdb_create_uint16(
  uint16_t input
);
Parameters
  • input: The usmallint value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_int32

Creates a value from an int32_t (an integer)

Syntax
duckdb_value duckdb_create_int32(
  int32_t input
);
Parameters
  • input: The integer value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_uint32

Creates a value from a uint32_t (a uinteger)

Syntax
duckdb_value duckdb_create_uint32(
  uint32_t input
);
Parameters
  • input: The uinteger value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_uint64

Creates a value from a uint64_t (a ubigint)

Syntax
duckdb_value duckdb_create_uint64(
  uint64_t input
);
Parameters
  • input: The ubigint value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_int64

Creates a value from an int64

Syntax
duckdb_value duckdb_create_int64(
  int64_t val
);
Parameters
  • val: The bigint value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_hugeint

Creates a value from a hugeint

Syntax
duckdb_value duckdb_create_hugeint(
  duckdb_hugeint input
);
Parameters
  • input: The hugeint value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_uhugeint

Creates a value from a uhugeint

Syntax
duckdb_value duckdb_create_uhugeint(
  duckdb_uhugeint input
);
Parameters
  • input: The uhugeint value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_varint

Creates a VARINT value from a duckdb_varint

Syntax
duckdb_value duckdb_create_varint(
  duckdb_varint input
);
Parameters
  • input: The duckdb_varint value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_decimal

Creates a DECIMAL value from a duckdb_decimal

Syntax
duckdb_value duckdb_create_decimal(
  duckdb_decimal input
);
Parameters
  • input: The duckdb_decimal value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_float

Creates a value from a float

Syntax
duckdb_value duckdb_create_float(
  float input
);
Parameters
  • input: The float value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_double

Creates a value from a double

Syntax
duckdb_value duckdb_create_double(
  double input
);
Parameters
  • input: The double value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_date

Creates a value from a date

Syntax
duckdb_value duckdb_create_date(
  duckdb_date input
);
Parameters
  • input: The date value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_time

Creates a value from a time

Syntax
duckdb_value duckdb_create_time(
  duckdb_time input
);
Parameters
  • input: The time value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_time_tz_value

Creates a value from a time_tz. Not to be confused with duckdb_create_time_tz, which creates a duckdb_time_tz_t.

Syntax
duckdb_value duckdb_create_time_tz_value(
  duckdb_time_tz value
);
Parameters
  • value: The time_tz value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_timestamp

Creates a TIMESTAMP value from a duckdb_timestamp

Syntax
duckdb_value duckdb_create_timestamp(
  duckdb_timestamp input
);
Parameters
  • input: The duckdb_timestamp value
Return Value

The value. This must be destroyed with duckdb_destroy_value.

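As a sketch, a duckdb_timestamp stores microseconds since the epoch in its micros field, so a TIMESTAMP value can be created like this (the literal is an arbitrary example):

duckdb_timestamp ts;
ts.micros = 1704067200000000; // 2024-01-01 00:00:00 UTC, in microseconds since the epoch
duckdb_value ts_value = duckdb_create_timestamp(ts);
// ... use the value ...
duckdb_destroy_value(&ts_value);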

duckdb_create_timestamp_tz

Creates a TIMESTAMP_TZ value from a duckdb_timestamp

Syntax
duckdb_value duckdb_create_timestamp_tz(
  duckdb_timestamp input
);
Parameters
  • input: The duckdb_timestamp value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_timestamp_s

Creates a TIMESTAMP_S value from a duckdb_timestamp_s

Syntax
duckdb_value duckdb_create_timestamp_s(
  duckdb_timestamp_s input
);
Parameters
  • input: The duckdb_timestamp_s value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_timestamp_ms

Creates a TIMESTAMP_MS value from a duckdb_timestamp_ms

Syntax
duckdb_value duckdb_create_timestamp_ms(
  duckdb_timestamp_ms input
);
Parameters
  • input: The duckdb_timestamp_ms value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_timestamp_ns

Creates a TIMESTAMP_NS value from a duckdb_timestamp_ns

Syntax
duckdb_value duckdb_create_timestamp_ns(
  duckdb_timestamp_ns input
);
Parameters
  • input: The duckdb_timestamp_ns value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_interval

Creates a value from an interval

Syntax
duckdb_value duckdb_create_interval(
  duckdb_interval input
);
Parameters
  • input: The interval value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_blob

Creates a value from a blob

Syntax
duckdb_value duckdb_create_blob(
  const uint8_t *data,
  idx_t length
);
Parameters
  • data: The blob data
  • length: The length of the blob data
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_bit

Creates a BIT value from a duckdb_bit

Syntax
duckdb_value duckdb_create_bit(
  duckdb_bit input
);
Parameters
  • input: The duckdb_bit value
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_create_uuid

Creates a UUID value from a uhugeint

Syntax
duckdb_value duckdb_create_uuid(
  duckdb_uhugeint input
);
Parameters
  • input: The duckdb_uhugeint containing the UUID
Return Value

The value. This must be destroyed with duckdb_destroy_value.


duckdb_get_bool

Returns the boolean value of the given value.

Syntax
bool duckdb_get_bool(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a boolean
Return Value

A boolean, or false if the value cannot be converted


duckdb_get_int8

Returns the int8_t value of the given value.

Syntax
int8_t duckdb_get_int8(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a tinyint
Return Value

An int8_t, or MinValue if the value cannot be converted


duckdb_get_uint8

Returns the uint8_t value of the given value.

Syntax
uint8_t duckdb_get_uint8(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a utinyint
Return Value

A uint8_t, or MinValue if the value cannot be converted


duckdb_get_int16

Returns the int16_t value of the given value.

Syntax
int16_t duckdb_get_int16(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a smallint
Return Value

An int16_t, or MinValue if the value cannot be converted


duckdb_get_uint16

Returns the uint16_t value of the given value.

Syntax
uint16_t duckdb_get_uint16(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a usmallint
Return Value

A uint16_t, or MinValue if the value cannot be converted


duckdb_get_int32

Returns the int32_t value of the given value.

Syntax
int32_t duckdb_get_int32(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing an integer
Return Value

An int32_t, or MinValue if the value cannot be converted


duckdb_get_uint32

Returns the uint32_t value of the given value.

Syntax
uint32_t duckdb_get_uint32(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a uinteger
Return Value

A uint32_t, or MinValue if the value cannot be converted


duckdb_get_int64

Returns the int64_t value of the given value.

Syntax
int64_t duckdb_get_int64(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a bigint
Return Value

An int64_t, or MinValue if the value cannot be converted

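As a quick sketch, the create and get functions round-trip a value:

duckdb_value v = duckdb_create_int64(42);
int64_t out = duckdb_get_int64(v); // out == 42
duckdb_destroy_value(&v);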

duckdb_get_uint64

Returns the uint64_t value of the given value.

Syntax
uint64_t duckdb_get_uint64(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a ubigint
Return Value

A uint64_t, or MinValue if the value cannot be converted


duckdb_get_hugeint

Returns the hugeint value of the given value.

Syntax
duckdb_hugeint duckdb_get_hugeint(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a hugeint
Return Value

A duckdb_hugeint, or MinValue if the value cannot be converted


duckdb_get_uhugeint

Returns the uhugeint value of the given value.

Syntax
duckdb_uhugeint duckdb_get_uhugeint(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a uhugeint
Return Value

A duckdb_uhugeint, or MinValue if the value cannot be converted


duckdb_get_varint

Returns the duckdb_varint value of the given value. The data field must be destroyed with duckdb_free.

Syntax
duckdb_varint duckdb_get_varint(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a VARINT
Return Value

A duckdb_varint. The data field must be destroyed with duckdb_free.


duckdb_get_decimal

Returns the duckdb_decimal value of the given value.

Syntax
duckdb_decimal duckdb_get_decimal(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a DECIMAL
Return Value

A duckdb_decimal, or MinValue if the value cannot be converted


duckdb_get_float

Returns the float value of the given value.

Syntax
float duckdb_get_float(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a float
Return Value

A float, or NAN if the value cannot be converted


duckdb_get_double

Returns the double value of the given value.

Syntax
double duckdb_get_double(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a double
Return Value

A double, or NAN if the value cannot be converted


duckdb_get_date

Returns the date value of the given value.

Syntax
duckdb_date duckdb_get_date(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a date
Return Value

A duckdb_date, or MinValue if the value cannot be converted


duckdb_get_time

Returns the time value of the given value.

Syntax
duckdb_time duckdb_get_time(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a time
Return Value

A duckdb_time, or MinValue if the value cannot be converted


duckdb_get_time_tz

Returns the time_tz value of the given value.

Syntax
duckdb_time_tz duckdb_get_time_tz(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a time_tz
Return Value

A duckdb_time_tz, or MinValue<time_tz> if the value cannot be converted


duckdb_get_timestamp

Returns the TIMESTAMP value of the given value.

Syntax
duckdb_timestamp duckdb_get_timestamp(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a TIMESTAMP
Return Value

A duckdb_timestamp, or MinValue if the value cannot be converted


duckdb_get_timestamp_tz

Returns the TIMESTAMP_TZ value of the given value.

Syntax
duckdb_timestamp duckdb_get_timestamp_tz(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a TIMESTAMP_TZ
Return Value

A duckdb_timestamp, or MinValue<timestamp_tz> if the value cannot be converted


duckdb_get_timestamp_s

Returns the duckdb_timestamp_s value of the given value.

Syntax
duckdb_timestamp_s duckdb_get_timestamp_s(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a TIMESTAMP_S
Return Value

A duckdb_timestamp_s, or MinValue<timestamp_s> if the value cannot be converted


duckdb_get_timestamp_ms

Returns the duckdb_timestamp_ms value of the given value.

Syntax
duckdb_timestamp_ms duckdb_get_timestamp_ms(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a TIMESTAMP_MS
Return Value

A duckdb_timestamp_ms, or MinValue<timestamp_ms> if the value cannot be converted


duckdb_get_timestamp_ns

Returns the duckdb_timestamp_ns value of the given value.

Syntax
duckdb_timestamp_ns duckdb_get_timestamp_ns(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a TIMESTAMP_NS
Return Value

A duckdb_timestamp_ns, or MinValue<timestamp_ns> if the value cannot be converted


duckdb_get_interval

Returns the interval value of the given value.

Syntax
duckdb_interval duckdb_get_interval(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing an interval
Return Value

A duckdb_interval, or MinValue if the value cannot be converted


duckdb_get_value_type

Returns the type of the given value. The type is valid as long as the value is not destroyed. The type itself must not be destroyed.

Syntax
duckdb_logical_type duckdb_get_value_type(
  duckdb_value val
);
Parameters
  • val: A duckdb_value
Return Value

A duckdb_logical_type.


duckdb_get_blob

Returns the blob value of the given value.

Syntax
duckdb_blob duckdb_get_blob(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a blob
Return Value

A duckdb_blob


duckdb_get_bit

Returns the duckdb_bit value of the given value. The data field must be destroyed with duckdb_free.

Syntax
duckdb_bit duckdb_get_bit(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a BIT
Return Value

A duckdb_bit


duckdb_get_uuid

Returns a duckdb_uhugeint representing the UUID value of the given value.

Syntax
duckdb_uhugeint duckdb_get_uuid(
  duckdb_value val
);
Parameters
  • val: A duckdb_value containing a UUID
Return Value

A duckdb_uhugeint representing the UUID value


duckdb_get_varchar

Obtains a string representation of the given value. The result must be destroyed with duckdb_free.

Syntax
char *duckdb_get_varchar(
  duckdb_value value
);
Parameters
  • value: The value
Return Value

The string value. This must be destroyed with duckdb_free.

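Because it returns a string representation of any value, duckdb_get_varchar is convenient for debugging. A small sketch:

duckdb_value d = duckdb_create_int32(7);
char *str = duckdb_get_varchar(d); // "7"
// ... use the string ...
duckdb_free(str);
duckdb_destroy_value(&d);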

duckdb_create_struct_value

Creates a struct value from a type and an array of values. Must be destroyed with duckdb_destroy_value.

Syntax
duckdb_value duckdb_create_struct_value(
  duckdb_logical_type type,
  duckdb_value *values
);
Parameters
  • type: The type of the struct
  • values: The values for the struct fields
Return Value

The struct value, or nullptr, if any child type is DUCKDB_TYPE_ANY or DUCKDB_TYPE_INVALID.

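A sketch that builds a STRUCT value with two fields, i (INTEGER) and name (VARCHAR); the field names and literals below are only illustrative:

// build the struct type {i: INTEGER, name: VARCHAR}
duckdb_logical_type member_types[2];
member_types[0] = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
member_types[1] = duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR);
const char *member_names[2] = {"i", "name"};
duckdb_logical_type struct_type = duckdb_create_struct_type(member_types, member_names, 2);

// build the member values and the struct value itself
duckdb_value members[2];
members[0] = duckdb_create_int32(42);
members[1] = duckdb_create_varchar("duck");
duckdb_value struct_value = duckdb_create_struct_value(struct_type, members);

// clean up everything we created
duckdb_destroy_value(&struct_value);
duckdb_destroy_value(&members[0]);
duckdb_destroy_value(&members[1]);
duckdb_destroy_logical_type(&struct_type);
duckdb_destroy_logical_type(&member_types[0]);
duckdb_destroy_logical_type(&member_types[1]);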

duckdb_create_list_value

Creates a list value from a child (element) type and an array of values of length value_count. Must be destroyed with duckdb_destroy_value.

Syntax
duckdb_value duckdb_create_list_value(
  duckdb_logical_type type,
  duckdb_value *values,
  idx_t value_count
);
Parameters
  • type: The type of the list
  • values: The values for the list
  • value_count: The number of values in the list
Return Value

The list value, or nullptr, if the child type is DUCKDB_TYPE_ANY or DUCKDB_TYPE_INVALID.


duckdb_create_array_value

Creates an array value from a child (element) type and an array of values of length value_count. Must be destroyed with duckdb_destroy_value.

Syntax
duckdb_value duckdb_create_array_value(
  duckdb_logical_type type,
  duckdb_value *values,
  idx_t value_count
);
Parameters
  • type: The type of the array
  • values: The values for the array
  • value_count: The number of values in the array
Return Value

The array value, or nullptr, if the child type is DUCKDB_TYPE_ANY or DUCKDB_TYPE_INVALID.


duckdb_get_map_size

Returns the number of elements in a MAP value.

Syntax
idx_t duckdb_get_map_size(
  duckdb_value value
);
Parameters
  • value: The MAP value.
Return Value

The number of elements in the map.


duckdb_get_map_key

Returns the MAP key at index as a duckdb_value.

Syntax
duckdb_value duckdb_get_map_key(
  duckdb_value value,
  idx_t index
);
Parameters
  • value: The MAP value.
  • index: The index of the key.
Return Value

The key as a duckdb_value.


duckdb_get_map_value

Returns the MAP value at index as a duckdb_value.

Syntax
duckdb_value duckdb_get_map_value(
  duckdb_value value,
  idx_t index
);
Parameters
  • value: The MAP value.
  • index: The index of the value.
Return Value

The value as a duckdb_value.


duckdb_is_null_value

Returns whether the value's type is SQLNULL or not.

Syntax
bool duckdb_is_null_value(
  duckdb_value value
);
Parameters
  • value: The value to check.
Return Value

True, if the value's type is SQLNULL, otherwise false.


duckdb_create_null_value

Creates a value of type SQLNULL.

Syntax
duckdb_value duckdb_create_null_value();
Return Value

The duckdb_value representing SQLNULL. This must be destroyed with duckdb_destroy_value.


duckdb_get_list_size

Returns the number of elements in a LIST value.

Syntax
idx_t duckdb_get_list_size(
  duckdb_value value
);
Parameters
  • value: The LIST value.
Return Value

The number of elements in the list.


duckdb_get_list_child

Returns the LIST child at index as a duckdb_value.

Syntax
duckdb_value duckdb_get_list_child(
  duckdb_value value,
  idx_t index
);
Parameters
  • value: The LIST value.
  • index: The index of the child.
Return Value

The child as a duckdb_value.

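Putting the list functions together, here is a sketch that builds a LIST of INTEGERs and reads its elements back (the values are arbitrary examples):

// element type and element values
duckdb_logical_type int_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_value elements[3];
elements[0] = duckdb_create_int32(1);
elements[1] = duckdb_create_int32(2);
elements[2] = duckdb_create_int32(3);
duckdb_value list_value = duckdb_create_list_value(int_type, elements, 3);

// read the elements back
idx_t count = duckdb_get_list_size(list_value);
for (idx_t i = 0; i < count; i++) {
    duckdb_value child = duckdb_get_list_child(list_value, i);
    int32_t element = duckdb_get_int32(child);
    // ... use element ...
    duckdb_destroy_value(&child);
}

// clean up
duckdb_destroy_value(&list_value);
for (idx_t i = 0; i < 3; i++) {
    duckdb_destroy_value(&elements[i]);
}
duckdb_destroy_logical_type(&int_type);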

duckdb_create_enum_value

Creates an enum value from a type and a value. Must be destroyed with duckdb_destroy_value.

Syntax
duckdb_value duckdb_create_enum_value(
  duckdb_logical_type type,
  uint64_t value
);
Parameters
  • type: The type of the enum
  • value: The value for the enum
Return Value

The enum value, or nullptr.


duckdb_get_enum_value

Returns the enum value of the given value.

Syntax
uint64_t duckdb_get_enum_value(
  duckdb_value value
);
Parameters
  • value: A duckdb_value containing an enum
Return Value

A uint64_t, or MinValue if the value cannot be converted


duckdb_get_struct_child

Returns the STRUCT child at index as a duckdb_value.

Syntax
duckdb_value duckdb_get_struct_child(
  duckdb_value value,
  idx_t index
);
Parameters
  • value: The STRUCT value.
  • index: The index of the child.
Return Value

The child as a duckdb_value.


--- layout: docu title: Data Chunks redirect_from: - /docs/api/c/data_chunk - /docs/api/c/data_chunk/ ---

Data chunks represent a horizontal slice of a table. They hold a number of [vectors]({% link docs/clients/c/vector.md %}), each of which can hold up to VECTOR_SIZE rows. The vector size can be obtained through the duckdb_vector_size function and is configurable, but is usually set to 2048.

Data chunks and vectors are what DuckDB uses natively to store and represent data. For this reason, the data chunk interface is the most efficient way of interfacing with DuckDB. Be aware, however, that correctly interfacing with DuckDB using the data chunk API does require knowledge of DuckDB's internal vector format.

Data chunks can be used in two manners:

  • Reading Data: Data chunks can be obtained from query results using the duckdb_fetch_chunk method, or as input to a user-defined function. In this case, the [vector methods]({% link docs/clients/c/vector.md %}) can be used to read individual values.
  • Writing Data: Data chunks can be created using duckdb_create_data_chunk. The data chunk can then be filled with values and used in duckdb_append_data_chunk to write data to the database.

The primary manner of interfacing with data chunks is by obtaining the internal vectors of the data chunk using the duckdb_data_chunk_get_vector method. Afterwards, the [vector methods]({% link docs/clients/c/vector.md %}) can be used to read from or write to the individual vectors.
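
As a brief sketch of the writing path (the single INTEGER column below is illustrative), the following creates a data chunk, writes three values into its vector, and sets the cardinality:

// a data chunk with one INTEGER column
duckdb_logical_type types[1];
types[0] = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 1);

// write three values directly into the column's vector
duckdb_vector col = duckdb_data_chunk_get_vector(chunk, 0);
int32_t *data = (int32_t *) duckdb_vector_get_data(col);
data[0] = 1;
data[1] = 2;
data[2] = 3;
duckdb_data_chunk_set_size(chunk, 3);

// the chunk can now be passed to, e.g., duckdb_append_data_chunk

// clean up
duckdb_destroy_data_chunk(&chunk);
duckdb_destroy_logical_type(&types[0]);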

API Reference Overview

duckdb_data_chunk duckdb_create_data_chunk(duckdb_logical_type *types, idx_t column_count);
void duckdb_destroy_data_chunk(duckdb_data_chunk *chunk);
void duckdb_data_chunk_reset(duckdb_data_chunk chunk);
idx_t duckdb_data_chunk_get_column_count(duckdb_data_chunk chunk);
duckdb_vector duckdb_data_chunk_get_vector(duckdb_data_chunk chunk, idx_t col_idx);
idx_t duckdb_data_chunk_get_size(duckdb_data_chunk chunk);
void duckdb_data_chunk_set_size(duckdb_data_chunk chunk, idx_t size);

duckdb_create_data_chunk

Creates an empty data chunk with the specified column types. The result must be destroyed with duckdb_destroy_data_chunk.

Syntax
duckdb_data_chunk duckdb_create_data_chunk(
  duckdb_logical_type *types,
  idx_t column_count
);
Parameters
  • types: An array of column types. Column types may not contain ANY or INVALID types.
  • column_count: The number of columns.
Return Value

The data chunk.


duckdb_destroy_data_chunk

Destroys the data chunk and de-allocates all memory allocated for that chunk.

Syntax
void duckdb_destroy_data_chunk(
  duckdb_data_chunk *chunk
);
Parameters
  • chunk: The data chunk to destroy.

duckdb_data_chunk_reset

Resets a data chunk, clearing the validity masks and setting the cardinality of the data chunk to 0. After calling this method, you must call duckdb_vector_get_validity and duckdb_vector_get_data to obtain current data and validity pointers.

Syntax
void duckdb_data_chunk_reset(
  duckdb_data_chunk chunk
);
Parameters
  • chunk: The data chunk to reset.

duckdb_data_chunk_get_column_count

Retrieves the number of columns in a data chunk.

Syntax
idx_t duckdb_data_chunk_get_column_count(
  duckdb_data_chunk chunk
);
Parameters
  • chunk: The data chunk to get the data from
Return Value

The number of columns in the data chunk


duckdb_data_chunk_get_vector

Retrieves the vector at the specified column index in the data chunk.

The pointer to the vector is valid for as long as the chunk is alive. It does NOT need to be destroyed.

Syntax
duckdb_vector duckdb_data_chunk_get_vector(
  duckdb_data_chunk chunk,
  idx_t col_idx
);
Parameters
  • chunk: The data chunk to get the data from
  • col_idx: The column index of the vector to retrieve
Return Value

The vector


duckdb_data_chunk_get_size

Retrieves the current number of tuples in a data chunk.

Syntax
idx_t duckdb_data_chunk_get_size(
  duckdb_data_chunk chunk
);
Parameters
  • chunk: The data chunk to get the data from
Return Value

The number of tuples in the data chunk


duckdb_data_chunk_set_size

Sets the current number of tuples in a data chunk.

Syntax
void duckdb_data_chunk_set_size(
  duckdb_data_chunk chunk,
  idx_t size
);
Parameters
  • chunk: The data chunk to set the size in
  • size: The number of tuples in the data chunk

--- layout: docu title: Prepared Statements redirect_from: - /docs/api/c/prepared - /docs/api/c/prepared/ ---

A prepared statement is a parameterized query. The query is prepared with question marks (?) or dollar symbols ($1) indicating the parameters of the query. Values can then be bound to these parameters, after which the prepared statement can be executed using those parameters. A single query can be prepared once and executed many times.

Prepared statements are useful to:

  • Easily supply parameters to functions while avoiding string concatenation/SQL injection attacks.
  • Speed up queries that will be executed many times with different parameters.

DuckDB supports prepared statements in the C API with the duckdb_prepare method. The duckdb_bind family of functions is used to supply values for subsequent execution of the prepared statement using duckdb_execute_prepared. After we are done with the prepared statement it can be cleaned up using the duckdb_destroy_prepare method.

Example

duckdb_prepared_statement stmt;
duckdb_result result;
if (duckdb_prepare(con, "INSERT INTO integers VALUES ($1, $2)", &stmt) == DuckDBError) {
    // handle error
}

duckdb_bind_int32(stmt, 1, 42); // the parameter index starts counting at 1!
duckdb_bind_int32(stmt, 2, 43);
// NULL as second parameter means no result set is requested
duckdb_execute_prepared(stmt, NULL);
duckdb_destroy_prepare(&stmt);

// we can also query result sets using prepared statements
if (duckdb_prepare(con, "SELECT * FROM integers WHERE i = ?", &stmt) == DuckDBError) {
    // handle error
}
duckdb_bind_int32(stmt, 1, 42);
duckdb_execute_prepared(stmt, &result);

// do something with result

// clean up
duckdb_destroy_result(&result);
duckdb_destroy_prepare(&stmt);

After calling duckdb_prepare, the prepared statement parameters can be inspected using duckdb_nparams and duckdb_param_type. In case the prepare fails, the error can be obtained through duckdb_prepare_error.

It is not required that the duckdb_bind family of functions matches the prepared statement parameter type exactly. The values will be auto-cast to the required type as needed. For example, calling duckdb_bind_int8 on a parameter of type DUCKDB_TYPE_INTEGER will work as expected.

Warning Do not use prepared statements to insert large amounts of data into DuckDB. Instead it is recommended to use the [Appender]({% link docs/clients/c/appender.md %}).

API Reference Overview

duckdb_state duckdb_prepare(duckdb_connection connection, const char *query, duckdb_prepared_statement *out_prepared_statement);
void duckdb_destroy_prepare(duckdb_prepared_statement *prepared_statement);
const char *duckdb_prepare_error(duckdb_prepared_statement prepared_statement);
idx_t duckdb_nparams(duckdb_prepared_statement prepared_statement);
const char *duckdb_parameter_name(duckdb_prepared_statement prepared_statement, idx_t index);
duckdb_type duckdb_param_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_logical_type duckdb_param_logical_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_state duckdb_clear_bindings(duckdb_prepared_statement prepared_statement);
duckdb_statement_type duckdb_prepared_statement_type(duckdb_prepared_statement statement);

duckdb_prepare

Create a prepared statement object from a query.

Note that after calling duckdb_prepare, the prepared statement should always be destroyed using duckdb_destroy_prepare, even if the prepare fails.

If the prepare fails, duckdb_prepare_error can be called to obtain the reason why the prepare failed.

Syntax
duckdb_state duckdb_prepare(
  duckdb_connection connection,
  const char *query,
  duckdb_prepared_statement *out_prepared_statement
);
Parameters
  • connection: The connection object
  • query: The SQL query to prepare
  • out_prepared_statement: The resulting prepared statement object
Return Value

DuckDBSuccess on success or DuckDBError on failure.


duckdb_destroy_prepare

Closes the prepared statement and de-allocates all memory allocated for the statement.

Syntax
void duckdb_destroy_prepare(
  duckdb_prepared_statement *prepared_statement
);
Parameters
  • prepared_statement: The prepared statement to destroy.

duckdb_prepare_error

Returns the error message associated with the given prepared statement. If the prepared statement has no error message, this returns nullptr instead.

The error message should not be freed. It will be de-allocated when duckdb_destroy_prepare is called.

Syntax
const char *duckdb_prepare_error(
  duckdb_prepared_statement prepared_statement
);
Parameters
  • prepared_statement: The prepared statement to obtain the error from.
Return Value

The error message, or nullptr if there is none.


duckdb_nparams

Returns the number of parameters that can be provided to the given prepared statement.

Returns 0 if the query was not successfully prepared.

Syntax
idx_t duckdb_nparams(
  duckdb_prepared_statement prepared_statement
);
Parameters
  • prepared_statement: The prepared statement to obtain the number of parameters for.

duckdb_parameter_name

Returns the name used to identify the parameter. The returned string should be freed using duckdb_free.

Returns NULL if the index is out of range for the provided prepared statement.

Syntax
const char *duckdb_parameter_name(
  duckdb_prepared_statement prepared_statement,
  idx_t index
);
Parameters
  • prepared_statement: The prepared statement to get the parameter name from.
  • index: The index of the parameter.

duckdb_param_type

Returns the parameter type for the parameter at the given index.

Returns DUCKDB_TYPE_INVALID if the parameter index is out of range or the statement was not successfully prepared.

Syntax
duckdb_type duckdb_param_type(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx
);
Parameters
  • prepared_statement: The prepared statement.
  • param_idx: The parameter index.
Return Value

The parameter type

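For illustration, here is a sketch that inspects the parameters of an already-prepared statement stmt (stmt and the use of a named parameter are assumptions, not part of the API):

idx_t n = duckdb_nparams(stmt);
for (idx_t i = 1; i <= n; i++) { // parameter indices start at 1
    const char *name = duckdb_parameter_name(stmt, i);
    duckdb_type type = duckdb_param_type(stmt, i);
    // ... use name and type ...
    duckdb_free((void *) name);
}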

duckdb_param_logical_type

Returns the logical type for the parameter at the given index.

Returns nullptr if the parameter index is out of range or the statement was not successfully prepared.

The return type of this call should be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_param_logical_type(
  duckdb_prepared_statement prepared_statement,
  idx_t param_idx
);
Parameters
  • prepared_statement: The prepared statement.
  • param_idx: The parameter index.
Return Value

The logical type of the parameter


duckdb_clear_bindings

Clears the parameters bound to the prepared statement.

Syntax
duckdb_state duckdb_clear_bindings(
  duckdb_prepared_statement prepared_statement
);

duckdb_prepared_statement_type

Returns the statement type of the statement to be executed

Syntax
duckdb_statement_type duckdb_prepared_statement_type(
  duckdb_prepared_statement statement
);
Parameters
  • statement: The prepared statement.
Return Value

duckdb_statement_type value or DUCKDB_STATEMENT_TYPE_INVALID



layout: docu title: Embedding DuckDB

CLI Client

The [Command Line Interface (CLI) client]({% link docs/clients/cli/overview.md %}) is intended for interactive use cases and not for embedding. As a result, it has more features that could be abused by a malicious actor. For example, the CLI client has the .sh feature that allows executing arbitrary shell commands. This feature is only present in the CLI client and not in any other DuckDB clients.

.sh ls

Tip Calling DuckDB's CLI client via shell commands is not recommended for embedding DuckDB. It is recommended to use one of the client libraries, e.g., [Python]({% link docs/clients/python/overview.md %}), [R]({% link docs/clients/r.md %}), [Java]({% link docs/clients/java.md %}), etc.


layout: docu title: Secrets Manager

The Secrets Manager provides a unified user interface for secrets across all backends that use them. Secrets can be scoped, so different storage prefixes can have different secrets, which allows, for example, joining data across organizations in a single query. Secrets can also be persisted, so that they do not need to be specified every time DuckDB is launched.

Warning Persistent secrets are stored in unencrypted binary format on the disk.

Types of Secrets

Secrets are typed; their type identifies which service they are for. Most secret types are not included in DuckDB by default; instead, they are registered by extensions. Currently, the following secret types are available:

| Secret type | Service / protocol   | Extension                                                    |
|-------------|----------------------|--------------------------------------------------------------|
| AZURE       | Azure Blob Storage   | [azure]({% link docs/extensions/azure.md %})                 |
| GCS         | Google Cloud Storage | [httpfs]({% link docs/extensions/httpfs/s3api.md %})         |
| HTTP        | HTTP and HTTPS       | [httpfs]({% link docs/extensions/httpfs/https.md %})         |
| HUGGINGFACE | Hugging Face         | [httpfs]({% link docs/extensions/httpfs/hugging_face.md %})  |
| MYSQL       | MySQL                | [mysql]({% link docs/extensions/mysql.md %})                 |
| POSTGRES    | PostgreSQL           | [postgres]({% link docs/extensions/postgres.md %})           |
| R2          | Cloudflare R2        | [httpfs]({% link docs/extensions/httpfs/s3api.md %})         |
| S3          | AWS S3               | [httpfs]({% link docs/extensions/httpfs/s3api.md %})         |

For each type, there are one or more “secret providers” that specify how the secret is created. Secrets can also have an optional scope, which is a file path prefix that the secret applies to. When fetching a secret for a path, the secret scopes are compared to the path, returning the matching secret for the path. In the case of multiple matching secrets, the longest prefix is chosen.

Creating a Secret

Secrets can be created using the [CREATE SECRET SQL statement]({% link docs/sql/statements/create_secret.md %}). Secrets can be temporary or persistent. Temporary secrets are used by default and are stored in memory for the lifespan of the DuckDB instance, similar to how settings worked previously. Persistent secrets are stored in unencrypted binary format in the ~/.duckdb/stored_secrets directory. On startup of DuckDB, persistent secrets are read from this directory and automatically loaded.

Secret Providers

To create a secret, a Secret Provider needs to be used. A Secret Provider is a mechanism through which a secret is generated. To illustrate this, for the S3, GCS, R2, and AZURE secret types, DuckDB currently supports two providers: CONFIG and CREDENTIAL_CHAIN. The CONFIG provider requires the user to pass all configuration information into the CREATE SECRET, whereas the CREDENTIAL_CHAIN provider will automatically try to fetch credentials. When no Secret Provider is specified, the CONFIG provider is used. For more details on how to create secrets using different providers check out the respective pages on [httpfs]({% link docs/extensions/httpfs/overview.md %}#configuration-and-authentication-using-secrets) and [azure]({% link docs/extensions/azure.md %}#authentication-with-secret).
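
For example, the following creates an S3 secret using the CREDENTIAL_CHAIN provider, so no keys need to be passed explicitly (the secret name is arbitrary):

CREATE SECRET aws_secret (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);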

Temporary Secrets

To create a temporary unscoped secret to access S3, we can now use the following:

CREATE SECRET my_secret (
    TYPE S3,
    KEY_ID 'my_secret_key',
    SECRET 'my_secret_value',
    REGION 'my_region'
);

Note that we implicitly use the default CONFIG secret provider here.

Persistent Secrets

In order to persist secrets between DuckDB database instances, we can now use the CREATE PERSISTENT SECRET command, e.g.:

CREATE PERSISTENT SECRET my_persistent_secret (
    TYPE S3,
    KEY_ID 'my_secret_key',
    SECRET 'my_secret_value'
);

By default, this will write the secret (unencrypted) to the ~/.duckdb/stored_secrets directory. To change the secrets directory, issue:

SET secret_directory = 'path/to/my_secrets_dir';

Note that setting the value of the home_directory configuration option has no effect on the location of the secrets.

Deleting Secrets

Secrets can be deleted using the [DROP SECRET statement]({% link docs/sql/statements/create_secret.md %}#syntax-for-drop-secret), e.g.:

DROP PERSISTENT SECRET my_persistent_secret;

Creating Multiple Secrets for the Same Service Type

If two secrets exist for a service type, the scope can be used to decide which one should be used. For example:

CREATE SECRET secret1 (
    TYPE S3,
    KEY_ID 'my_secret_key1',
    SECRET 'my_secret_value1',
    SCOPE 's3://my-bucket'
);
CREATE SECRET secret2 (
    TYPE S3,
    KEY_ID 'my_secret_key2',
    SECRET 'my_secret_value2',
    SCOPE 's3://my-other-bucket'
);

Now, if the user queries something from s3://my-other-bucket/something, secret secret2 will be chosen automatically for that request. To see which secret is being used, the which_secret scalar function can be used, which takes a path and a secret type as parameters:

FROM which_secret('s3://my-other-bucket/file.parquet', 's3');

Listing Secrets

Secrets can be listed using the built-in [duckdb_secrets() table function]({% link docs/sql/meta/duckdb_table_functions.md %}#duckdb_secrets), e.g.:

FROM duckdb_secrets();

Sensitive information will be redacted.

layout: docu title: Pragmas redirect_from:

  • /docs/sql/pragmas
  • /docs/sql/pragmas/

The PRAGMA statement is a SQL extension adopted by DuckDB from SQLite. PRAGMA statements can be issued in a similar manner to regular SQL statements. PRAGMA commands may alter the internal state of the database engine, and can influence the subsequent execution or behavior of the engine.

PRAGMA statements that assign a value to an option can also be issued using the [SET statement]({% link docs/sql/statements/set.md %}) and the value of an option can be retrieved using SELECT current_setting(option_name).
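
For example, the following two statements are equivalent, and the resulting value can be read back with current_setting:

PRAGMA memory_limit = '1GB';
SET memory_limit = '1GB';

SELECT current_setting('memory_limit');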

For DuckDB's built in configuration options, see the [Configuration Reference]({% link docs/configuration/overview.md %}#configuration-reference). DuckDB [extensions]({% link docs/extensions/overview.md %}) may register additional configuration options. These are documented in the respective extensions' documentation pages.

This page contains the supported PRAGMA settings.

Metadata

Schema Information

List all databases:

PRAGMA database_list;

List all tables:

PRAGMA show_tables;

List all tables, with extra information, similarly to [DESCRIBE]({% link docs/guides/meta/describe.md %}):

PRAGMA show_tables_expanded;

To list all functions:

PRAGMA functions;

Table Information

Get info for a specific table:

PRAGMA table_info('table_name');
CALL pragma_table_info('table_name');

table_info returns information about the columns of the table with name table_name. The exact format of the table returned is given below:

cid INTEGER,        -- cid of the column
name VARCHAR,       -- name of the column
type VARCHAR,       -- type of the column
notnull BOOLEAN,    -- if the column is marked as NOT NULL
dflt_value VARCHAR, -- default value of the column, or NULL if not specified
pk BOOLEAN          -- part of the primary key or not

Database Size

Get the file and memory size of each database:

PRAGMA database_size;
CALL pragma_database_size();

database_size returns information about the file and memory size of each database. The column types of the returned results are given below:

database_name VARCHAR, -- database name
database_size VARCHAR, -- total block count times the block size
block_size BIGINT,     -- database block size
total_blocks BIGINT,   -- total blocks in the database
used_blocks BIGINT,    -- used blocks in the database
free_blocks BIGINT,    -- free blocks in the database
wal_size VARCHAR,      -- write ahead log size
memory_usage VARCHAR,  -- memory used by the database buffer manager
memory_limit VARCHAR   -- maximum memory allowed for the database

Storage Information

To get storage information:

PRAGMA storage_info('table_name');
CALL pragma_storage_info('table_name');

This call returns the following information for the given table:

| Name         | Type    | Description                                  |
|--------------|---------|----------------------------------------------|
| row_group_id | BIGINT  |                                              |
| column_name  | VARCHAR |                                              |
| column_id    | BIGINT  |                                              |
| column_path  | VARCHAR |                                              |
| segment_id   | BIGINT  |                                              |
| segment_type | VARCHAR |                                              |
| start        | BIGINT  | The start row id of this chunk               |
| count        | BIGINT  | The amount of entries in this storage chunk  |
| compression  | VARCHAR | Compression type used for this column – see the [“Lightweight Compression in DuckDB” blog post]({% post_url 2022-10-28-lightweight-compression %}) |
| stats        | VARCHAR |                                              |
| has_updates  | BOOLEAN |                                              |
| persistent   | BOOLEAN | false if temporary table                     |
| block_id     | BIGINT  | Empty unless persistent                      |
| block_offset | BIGINT  | Empty unless persistent                      |

See [Storage]({% link docs/internals/storage.md %}) for more information.

Show Databases

The following statement is equivalent to the [SHOW DATABASES statement]({% link docs/sql/statements/attach.md %}):

PRAGMA show_databases;

Resource Management

Memory Limit

Set the memory limit for the buffer manager:

SET memory_limit = '1GB';

Warning The specified memory limit is only applied to the buffer manager. For most queries, the buffer manager handles the majority of the data processed. However, certain in-memory data structures such as [vectors]({% link docs/internals/vector.md %}) and query results are allocated outside of the buffer manager. Additionally, [aggregate functions]({% link docs/sql/functions/aggregates.md %}) with complex state (e.g., list, mode, quantile, string_agg, and approx functions) use memory outside of the buffer manager. Therefore, the actual memory consumption can be higher than the specified memory limit.

Threads

Set the amount of threads for parallel query execution:

SET threads = 4;

Collations

List all available collations:

PRAGMA collations;

Set the default collation to one of the available ones:

SET default_collation = 'nocase';

Default Ordering for NULLs

Set the default ordering for NULLs to be either NULLS_FIRST, NULLS_LAST, NULLS_FIRST_ON_ASC_LAST_ON_DESC or NULLS_LAST_ON_ASC_FIRST_ON_DESC:

SET default_null_order = 'NULLS_FIRST';
SET default_null_order = 'NULLS_LAST_ON_ASC_FIRST_ON_DESC';

Set the default result set ordering direction to ASCENDING or DESCENDING:

SET default_order = 'ASCENDING';
SET default_order = 'DESCENDING';

Ordering by Non-Integer Literals

By default, ordering by non-integer literals is not allowed:

SELECT 42 ORDER BY 'hello world';
-- Binder Error: ORDER BY non-integer literal has no effect.

To allow this behavior, use the order_by_non_integer_literal option:

SET order_by_non_integer_literal = true;

Implicit Casting to VARCHAR

Prior to version 0.10.0, DuckDB would automatically allow any type to be implicitly cast to VARCHAR during function binding. As a result, it was possible to, e.g., compute the substring of an integer without using an explicit cast. For version 0.10.0 and later, an explicit cast is needed instead. To revert to the old behavior that performs implicit casting, set the old_implicit_casting variable to true:

SET old_implicit_casting = true;

Python: Scan All Dataframes

Prior to version 1.1.0, DuckDB's [replacement scan mechanism]({% link docs/clients/c/replacement_scans.md %}) in Python scanned the global Python namespace. To revert to this old behavior, use the following setting:

SET python_scan_all_frames = true;

Information on DuckDB

Version

Show DuckDB version:

PRAGMA version;
CALL pragma_version();

Platform

platform returns an identifier for the platform the current DuckDB executable has been compiled for, e.g., osx_arm64. The format of this identifier matches the platform name as described in the [extension loading explainer]({% link docs/extensions/working_with_extensions.md %}#platforms):

PRAGMA platform;
CALL pragma_platform();

User Agent

The following statement returns the user agent information, e.g., duckdb/v0.10.0(osx_arm64):

PRAGMA user_agent;

Metadata Information

The following statement returns information on the metadata store (block_id, total_blocks, free_blocks, and free_list):

PRAGMA metadata_info;

Progress Bar

Show progress bar when running queries:

PRAGMA enable_progress_bar;

Or:

PRAGMA enable_print_progress_bar;

Don't show a progress bar for running queries:

PRAGMA disable_progress_bar;

Or:

PRAGMA disable_print_progress_bar;

EXPLAIN Output

The output of [EXPLAIN]({% link docs/sql/statements/profiling.md %}) can be configured to show only the physical plan.

The default configuration of EXPLAIN:

SET explain_output = 'physical_only';

To only show the optimized query plan:

SET explain_output = 'optimized_only';

To show all query plans:

SET explain_output = 'all';

Profiling

Enable Profiling

The following query enables profiling with the default format, query_tree. Independent of the format, enable_profiling is mandatory to enable profiling.

PRAGMA enable_profiling;
PRAGMA enable_profile;

Profiling Format

The format of enable_profiling can be specified as query_tree, json, query_tree_optimizer, or no_output. Each format prints its output to the configured output, except no_output.

The default format is query_tree. It prints the physical query plan and the metrics of each operator in the tree.

SET enable_profiling = 'query_tree';

Alternatively, json returns the physical query plan as JSON:

SET enable_profiling = 'json';

To return the physical query plan, including optimizer and planner metrics:

SET enable_profiling = 'query_tree_optimizer';

Database drivers and other applications can also access profiling information through API calls, in which case users can disable any other output. Even though the parameter reads no_output, it only affects printing to the configured output; profiling itself must still be enabled when accessing profiling information through API calls:

SET enable_profiling = 'no_output';

Profiling Output

By default, DuckDB prints profiling information to the standard output. However, if you prefer to write the profiling information to a file, you can use PRAGMA profiling_output to specify a filepath.

Warning The file contents will be overwritten for every newly issued query. Hence, the file will only contain the profiling information of the last run query:

SET profiling_output = '/path/to/file.json';
SET profile_output = '/path/to/file.json';

Profiling Mode

By default, a limited amount of profiling information is provided (standard).

SET profiling_mode = 'standard';

For more details, use the detailed profiling mode by setting profiling_mode to detailed. The output of this mode includes profiling of the planner and optimizer stages.

SET profiling_mode = 'detailed';

Custom Metrics

By default, profiling enables all metrics except those activated by detailed profiling.

Using the custom_profiling_settings PRAGMA, each metric, including those from detailed profiling, can be individually enabled or disabled. This PRAGMA accepts a JSON object with metric names as keys and Boolean values to toggle them on or off. Settings specified by this PRAGMA override the default behavior.

Note This only affects the metrics when enable_profiling is set to json or no_output. The query_tree and query_tree_optimizer formats always use a default set of metrics.

In the following example, the CPU_TIME metric is disabled. The EXTRA_INFO, OPERATOR_CARDINALITY, and OPERATOR_TIMING metrics are enabled.

SET custom_profiling_settings = '{"CPU_TIME": "false", "EXTRA_INFO": "true", "OPERATOR_CARDINALITY": "true", "OPERATOR_TIMING": "true"}';

The profiling documentation contains an overview of the available [metrics]({% link docs/dev/profiling.md %}#metrics).

Disable Profiling

To disable profiling:

PRAGMA disable_profiling;
PRAGMA disable_profile;

Query Optimization

Optimizer

To disable the query optimizer:

PRAGMA disable_optimizer;

To enable the query optimizer:

PRAGMA enable_optimizer;

Selectively Disabling Optimizers

The disabled_optimizers option allows selectively disabling optimization steps. For example, to disable filter_pushdown and statistics_propagation, run:

SET disabled_optimizers = 'filter_pushdown,statistics_propagation';

The available optimizations can be queried using the [duckdb_optimizers() table function]({% link docs/sql/meta/duckdb_table_functions.md %}#duckdb_optimizers).

To re-enable the optimizers, run:

SET disabled_optimizers = '';

Warning The disabled_optimizers option should only be used for debugging performance issues and should be avoided in production.

Logging

Set a path for query logging:

SET log_query_path = '/tmp/duckdb_log/';

Disable query logging:

SET log_query_path = '';

Full-Text Search Indexes

The create_fts_index and drop_fts_index options are only available when the [fts extension]({% link docs/extensions/full_text_search.md %}) is loaded. Their usage is documented on the [Full-Text Search extension page]({% link docs/extensions/full_text_search.md %}).

Verification

Verification of External Operators

Enable verification of external operators:

PRAGMA verify_external;

Disable verification of external operators:

PRAGMA disable_verify_external;

Verification of Round-Trip Capabilities

Enable verification of round-trip capabilities for supported logical plans:

PRAGMA verify_serializer;

Disable verification of round-trip capabilities:

PRAGMA disable_verify_serializer;

Object Cache

Enable caching of objects for e.g., Parquet metadata:

PRAGMA enable_object_cache;

Disable caching of objects:

PRAGMA disable_object_cache;

Checkpointing

Compression

During checkpointing, the existing column data and any new changes are compressed. A couple of pragmas influence which compression functions are considered.

Force Compression

Prefer using this compression method over any other method if possible:

PRAGMA force_compression = 'bitpacking';

Disabled Compression Methods

Avoid using any of the compression methods in the given comma-separated list:

PRAGMA disabled_compression_methods = 'fsst,rle';

Force Checkpoint

Force a checkpoint when [CHECKPOINT]({% link docs/sql/statements/checkpoint.md %}) is called, even if no changes have been made:

PRAGMA force_checkpoint;

Checkpoint on Shutdown

Run a CHECKPOINT on successful shutdown and delete the WAL, to leave only a single database file behind:

PRAGMA enable_checkpoint_on_shutdown;

Don't run a CHECKPOINT on shutdown:

PRAGMA disable_checkpoint_on_shutdown;

Temp Directory for Spilling Data to Disk

By default, DuckDB uses a temporary directory named ⟨database_file_name⟩.tmp to spill to disk, located in the same directory as the database file. To change this, use:

SET temp_directory = '/path/to/temp_dir.tmp/';

Returning Errors as JSON

The errors_as_json option can be set to obtain error information in raw JSON format. For certain errors, extra information or decomposed information is provided for easier machine processing. For example:

SET errors_as_json = true;

Then, running a query that results in an error produces a JSON output:

SELECT * FROM nonexistent_tbl;
{
   "exception_type":"Catalog",
   "exception_message":"Table with name nonexistent_tbl does not exist!\nDid you mean \"temp.information_schema.tables\"?",
   "name":"nonexistent_tbl",
   "candidates":"temp.information_schema.tables",
   "position":"14",
   "type":"Table",
   "error_subtype":"MISSING_ENTRY"
}

IEEE Floating-Point Operation Semantics

DuckDB follows IEEE floating-point operation semantics. If you would like to turn this off, run:

SET ieee_floating_point_ops = false;

In this case, floating point division by zero (e.g., 1.0 / 0.0, 0.0 / 0.0 and -1.0 / 0.0) will all return NULL.

Query Verification (for Development)

The following PRAGMAs are mostly used for development and internal testing.

Enable query verification:

PRAGMA enable_verification;

Disable query verification:

PRAGMA disable_verification;

Enable force parallel query processing:

PRAGMA verify_parallelism;

Disable force parallel query processing:

PRAGMA disable_verify_parallelism;

Block Sizes

When persisting a database to disk, DuckDB writes to a dedicated file containing a list of blocks holding the data. In the case of a file that only holds very little data, e.g., a small table, the default block size of 256KB might not be ideal. Therefore, DuckDB's storage format supports different block sizes.

There are a few constraints on possible block size values.

  • Must be a power of two.
  • Must be greater than or equal to 16384 (16 KB).
  • Must be less than or equal to 262144 (256 KB).

You can set the default block size for all new DuckDB files created by an instance like so:

SET default_block_size = '16384';

It is also possible to set the block size on a per-file basis, see [ATTACH]({% link docs/sql/statements/attach.md %}) for details.

layout: docu title: Overview

We designed DuckDB to be easy to deploy and operate. We believe that most users do not need to consult the pages of the operations manual. However, there are certain setups – e.g., when DuckDB is running in mission-critical infrastructure – where we would like to offer advice on how to configure DuckDB. The operations manual contains advice for these cases and also offers convenient configuration snippets such as Gitignore files.

For advice on getting the best performance from DuckDB, see also the [Performance Guide]({% link docs/guides/performance/overview.md %}).

layout: docu title: Non-Deterministic Behavior

Several operators in DuckDB exhibit non-deterministic behavior. Most notably, SQL uses set semantics, which allows results to be returned in a different order. DuckDB exploits this to improve performance, particularly when performing multi-threaded query execution. Other factors, such as using different compilers, operating systems, and hardware architectures, can also cause changes in ordering. This page documents the cases where non-determinism is an expected behavior. If you would like to make your queries deterministic, see the “Working Around Non-Determinism” section.

Set Semantics

One of the most common sources of non-determinism is the set semantics used by SQL. E.g., if you run the following query repeatedly, you may get two different results:

SELECT *
FROM (
    SELECT 'A' AS x
    UNION
    SELECT 'B' AS x
);

Both results A, B and B, A are correct.

Different Results on Different Platforms: array_distinct

The array_distinct function may return results in a different order on different platforms:

SELECT array_distinct(['A', 'A', 'B', NULL, NULL]) AS arr;

For this query, both [A, B] and [B, A] are valid results.

Floating-Point Aggregate Operations with Multi-Threading

Floating-point inaccuracies may produce different results when queries are run in a multi-threaded configuration. For example, stddev and corr may produce non-deterministic results:

CREATE TABLE tbl AS
    SELECT 'ABCDEFG'[floor(random() * 7 + 1)::INT] AS s, 3.7 AS x, i AS y
    FROM range(1, 1_000_000) r(i);

SELECT s, stddev(x) AS standard_deviation, corr(x, y) AS correlation
FROM tbl
GROUP BY s
ORDER BY s;

The expected standard deviations and correlations from this query are 0 for all values of s. However, when executed on multiple threads, the query may return small numbers (0 <= z < 10e-16) due to floating-point inaccuracies.

Working Around Non-Determinism

For the majority of use cases, non-determinism is not causing any issues. However, there are some cases where deterministic results are desirable. In these cases, try the following workarounds:

  1. Limit the number of threads to prevent non-determinism introduced by multi-threading.

    SET threads = 1;
  2. Enforce ordering. For example, you can use the [ORDER BY ALL clause]({% link docs/sql/query_syntax/orderby.md %}#order-by-all):

    SELECT *
    FROM (
        SELECT 'A' AS x
        UNION
        SELECT 'B' AS x
    )
    ORDER BY ALL;

    You can also sort lists using [list_sort]({% link docs/sql/functions/list.md %}#list_sortlist):

    SELECT list_sort(array_distinct(['A', 'A', 'B', NULL, NULL])) AS i
    ORDER BY i;

    It's also possible to introduce a [deterministic shuffling]({% post_url 2024-08-19-duckdb-tricks-part-1 %}#shuffling-data).


layout: docu title: Types redirect_from:

  • /docs/api/c/types
  • /docs/api/c/types/

DuckDB is a strongly typed database system. As such, every column has a single type specified. This type is constant over the entire column. That is to say, a column that is labeled as an INTEGER column will only contain INTEGER values.

DuckDB also supports columns of composite types. For example, it is possible to define an array of integers (INTEGER[]). It is also possible to define types as arbitrary structs (ROW(i INTEGER, j VARCHAR)). For that reason, native DuckDB type objects are not mere enums, but a class that can potentially be nested.

Types in the C API are modeled using an enum (duckdb_type) and a complex class (duckdb_logical_type). For most primitive types, e.g., integers or varchars, the enum is sufficient. For more complex types, such as lists, structs or decimals, the logical type must be used.

typedef enum DUCKDB_TYPE {
  DUCKDB_TYPE_INVALID = 0,
  DUCKDB_TYPE_BOOLEAN = 1,
  DUCKDB_TYPE_TINYINT = 2,
  DUCKDB_TYPE_SMALLINT = 3,
  DUCKDB_TYPE_INTEGER = 4,
  DUCKDB_TYPE_BIGINT = 5,
  DUCKDB_TYPE_UTINYINT = 6,
  DUCKDB_TYPE_USMALLINT = 7,
  DUCKDB_TYPE_UINTEGER = 8,
  DUCKDB_TYPE_UBIGINT = 9,
  DUCKDB_TYPE_FLOAT = 10,
  DUCKDB_TYPE_DOUBLE = 11,
  DUCKDB_TYPE_TIMESTAMP = 12,
  DUCKDB_TYPE_DATE = 13,
  DUCKDB_TYPE_TIME = 14,
  DUCKDB_TYPE_INTERVAL = 15,
  DUCKDB_TYPE_HUGEINT = 16,
  DUCKDB_TYPE_UHUGEINT = 32,
  DUCKDB_TYPE_VARCHAR = 17,
  DUCKDB_TYPE_BLOB = 18,
  DUCKDB_TYPE_DECIMAL = 19,
  DUCKDB_TYPE_TIMESTAMP_S = 20,
  DUCKDB_TYPE_TIMESTAMP_MS = 21,
  DUCKDB_TYPE_TIMESTAMP_NS = 22,
  DUCKDB_TYPE_ENUM = 23,
  DUCKDB_TYPE_LIST = 24,
  DUCKDB_TYPE_STRUCT = 25,
  DUCKDB_TYPE_MAP = 26,
  DUCKDB_TYPE_ARRAY = 33,
  DUCKDB_TYPE_UUID = 27,
  DUCKDB_TYPE_UNION = 28,
  DUCKDB_TYPE_BIT = 29,
  DUCKDB_TYPE_TIME_TZ = 30,
  DUCKDB_TYPE_TIMESTAMP_TZ = 31,
} duckdb_type;

Functions

The enum type of a column in the result can be obtained using the duckdb_column_type function. The logical type of a column can be obtained using the duckdb_column_logical_type function.
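
For illustration, a minimal sketch of how these two functions can be combined could look as follows (the helper name and the printing are arbitrary; error handling is omitted):

#include "duckdb.h"
#include <stdio.h>

// Print the type id of every column in a materialized result.
void print_column_types(duckdb_result *result) {
  idx_t column_count = duckdb_column_count(result);
  for (idx_t col = 0; col < column_count; col++) {
    // The enum type suffices for primitive columns ...
    duckdb_type type_id = duckdb_column_type(result, col);
    // ... while the logical type carries the full, possibly nested, type information.
    duckdb_logical_type logical_type = duckdb_column_logical_type(result, col);
    printf("column %llu (%s) has type id %d\n",
           (unsigned long long) col, duckdb_column_name(result, col), (int) type_id);
    // Logical types must be destroyed after use.
    duckdb_destroy_logical_type(&logical_type);
  }
}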

duckdb_value

The duckdb_value functions will auto-cast values as required. For example, it is no problem to use duckdb_value_double on a column of type DUCKDB_TYPE_INTEGER. The value will be auto-cast and returned as a double. Note that in certain cases the cast may fail. For example, this can happen if we request a duckdb_value_int8 and the value does not fit within an int8 value. In this case, a default value will be returned (usually 0 or nullptr). The same default value will also be returned if the corresponding value is NULL.

The duckdb_value_is_null function can be used to check if a specific value is NULL or not.

The exception to the auto-cast rule is the duckdb_value_varchar_internal function. This function does not auto-cast and only works for VARCHAR columns. The reason this function exists is that the result does not need to be freed.

duckdb_value_varchar and duckdb_value_blob require the result to be de-allocated using duckdb_free.
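
As a hypothetical sketch (assuming a materialized result with at least one row), the value functions could be used like this; note the duckdb_free call for the string returned by duckdb_value_varchar:

#include "duckdb.h"
#include <stdio.h>

// Print the first row of a result using the duckdb_value functions.
void print_first_row(duckdb_result *result) {
  for (idx_t col = 0; col < duckdb_column_count(result); col++) {
    if (duckdb_value_is_null(result, col, 0)) {
      printf("NULL\n");
      continue;
    }
    // duckdb_value_varchar auto-casts the value to a string ...
    char *text = duckdb_value_varchar(result, col, 0);
    printf("%s\n", text);
    // ... which must be de-allocated with duckdb_free.
    duckdb_free(text);
  }
}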

duckdb_fetch_chunk

The duckdb_fetch_chunk function can be used to read data chunks from a DuckDB result set, and is the most efficient way of reading data from a DuckDB result using the C API. It is also the only way of reading data of certain types from a DuckDB result. For example, the duckdb_value functions do not support structural reading of composite types (lists or structs) or more complex types like enums and decimals.

For more information about data chunks, see the [documentation on data chunks]({% link docs/clients/c/data_chunk.md %}).
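
A minimal, illustrative chunk-reading loop (assuming the query returns a single non-nested BIGINT column) could look like this:

#include "duckdb.h"

// Sum a single BIGINT result column by iterating over its data chunks.
int64_t sum_bigint_column(duckdb_result *result) {
  int64_t sum = 0;
  while (1) {
    duckdb_data_chunk chunk = duckdb_fetch_chunk(*result);
    if (!chunk) {
      break; // the result set is exhausted
    }
    idx_t row_count = duckdb_data_chunk_get_size(chunk);
    duckdb_vector vector = duckdb_data_chunk_get_vector(chunk, 0);
    int64_t *data = (int64_t *) duckdb_vector_get_data(vector);
    uint64_t *validity = duckdb_vector_get_validity(vector);
    for (idx_t row = 0; row < row_count; row++) {
      // Skip NULL values; a NULL validity mask means all rows are valid.
      if (!validity || duckdb_validity_row_is_valid(validity, row)) {
        sum += data[row];
      }
    }
    duckdb_destroy_data_chunk(&chunk);
  }
  return sum;
}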

API Reference Overview

duckdb_data_chunk duckdb_result_get_chunk(duckdb_result result, idx_t chunk_index);
bool duckdb_result_is_streaming(duckdb_result result);
idx_t duckdb_result_chunk_count(duckdb_result result);
duckdb_result_type duckdb_result_return_type(duckdb_result result);

Date Time Timestamp Helpers

duckdb_date_struct duckdb_from_date(duckdb_date date);
duckdb_date duckdb_to_date(duckdb_date_struct date);
bool duckdb_is_finite_date(duckdb_date date);
duckdb_time_struct duckdb_from_time(duckdb_time time);
duckdb_time_tz duckdb_create_time_tz(int64_t micros, int32_t offset);
duckdb_time_tz_struct duckdb_from_time_tz(duckdb_time_tz micros);
duckdb_time duckdb_to_time(duckdb_time_struct time);
duckdb_timestamp_struct duckdb_from_timestamp(duckdb_timestamp ts);
duckdb_timestamp duckdb_to_timestamp(duckdb_timestamp_struct ts);
bool duckdb_is_finite_timestamp(duckdb_timestamp ts);
bool duckdb_is_finite_timestamp_s(duckdb_timestamp_s ts);
bool duckdb_is_finite_timestamp_ms(duckdb_timestamp_ms ts);
bool duckdb_is_finite_timestamp_ns(duckdb_timestamp_ns ts);

Hugeint Helpers

double duckdb_hugeint_to_double(duckdb_hugeint val);
duckdb_hugeint duckdb_double_to_hugeint(double val);

Decimal Helpers

duckdb_decimal duckdb_double_to_decimal(double val, uint8_t width, uint8_t scale);
double duckdb_decimal_to_double(duckdb_decimal val);

Logical Type Interface

duckdb_logical_type duckdb_create_logical_type(duckdb_type type);
char *duckdb_logical_type_get_alias(duckdb_logical_type type);
void duckdb_logical_type_set_alias(duckdb_logical_type type, const char *alias);
duckdb_logical_type duckdb_create_list_type(duckdb_logical_type type);
duckdb_logical_type duckdb_create_array_type(duckdb_logical_type type, idx_t array_size);
duckdb_logical_type duckdb_create_map_type(duckdb_logical_type key_type, duckdb_logical_type value_type);
duckdb_logical_type duckdb_create_union_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_struct_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_enum_type(const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_decimal_type(uint8_t width, uint8_t scale);
duckdb_type duckdb_get_type_id(duckdb_logical_type type);
uint8_t duckdb_decimal_width(duckdb_logical_type type);
uint8_t duckdb_decimal_scale(duckdb_logical_type type);
duckdb_type duckdb_decimal_internal_type(duckdb_logical_type type);
duckdb_type duckdb_enum_internal_type(duckdb_logical_type type);
uint32_t duckdb_enum_dictionary_size(duckdb_logical_type type);
char *duckdb_enum_dictionary_value(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_list_type_child_type(duckdb_logical_type type);
duckdb_logical_type duckdb_array_type_child_type(duckdb_logical_type type);
idx_t duckdb_array_type_array_size(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_key_type(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_value_type(duckdb_logical_type type);
idx_t duckdb_struct_type_child_count(duckdb_logical_type type);
char *duckdb_struct_type_child_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_struct_type_child_type(duckdb_logical_type type, idx_t index);
idx_t duckdb_union_type_member_count(duckdb_logical_type type);
char *duckdb_union_type_member_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_union_type_member_type(duckdb_logical_type type, idx_t index);
void duckdb_destroy_logical_type(duckdb_logical_type *type);
duckdb_state duckdb_register_logical_type(duckdb_connection con, duckdb_logical_type type, duckdb_create_type_info info);

duckdb_result_get_chunk

Warning Deprecation notice. This method is scheduled for removal in a future release.

Fetches a data chunk from the duckdb_result. This function should be called repeatedly until the result is exhausted.

The result must be destroyed with duckdb_destroy_data_chunk.

This function supersedes all duckdb_value functions, as well as the duckdb_column_data and duckdb_nullmask_data functions. It results in significantly better performance, and should be preferred in newer code-bases.

If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be mixed with the legacy result functions).

Use duckdb_result_chunk_count to figure out how many chunks there are in the result.

Syntax
duckdb_data_chunk duckdb_result_get_chunk(
  duckdb_result result,
  idx_t chunk_index
);
Parameters
  • result: The result object to fetch the data chunk from.
  • chunk_index: The chunk index to fetch from.
Return Value

The resulting data chunk. Returns NULL if the chunk index is out of bounds.


duckdb_result_is_streaming

Warning Deprecation notice. This method is scheduled for removal in a future release.

Checks if the type of the internal result is StreamQueryResult.

Syntax
bool duckdb_result_is_streaming(
  duckdb_result result
);
Parameters
  • result: The result object to check.
Return Value

Whether or not the result object is of the type StreamQueryResult


duckdb_result_chunk_count

Warning Deprecation notice. This method is scheduled for removal in a future release.

Returns the number of data chunks present in the result.

Syntax
idx_t duckdb_result_chunk_count(
  duckdb_result result
);
Parameters
  • result: The result object
Return Value

Number of data chunks present in the result.


duckdb_result_return_type

Returns the return_type of the given result, or DUCKDB_RETURN_TYPE_INVALID on error

Syntax
duckdb_result_type duckdb_result_return_type(
  duckdb_result result
);
Parameters
  • result: The result object
Return Value

The return_type


duckdb_from_date

Decompose a duckdb_date object into year, month and day (stored as duckdb_date_struct).

Syntax
duckdb_date_struct duckdb_from_date(
  duckdb_date date
);
Parameters
  • date: The date object, as obtained from a DUCKDB_TYPE_DATE column.
Return Value

The duckdb_date_struct with the decomposed elements.


duckdb_to_date

Re-compose a duckdb_date from year, month and day (duckdb_date_struct).

Syntax
duckdb_date duckdb_to_date(
  duckdb_date_struct date
);
Parameters
  • date: The year, month and day stored in a duckdb_date_struct.
Return Value

The duckdb_date element.
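
As an illustrative sketch, decomposing and re-composing a date could be combined as follows (the helper is hypothetical; the input date would typically come from a DUCKDB_TYPE_DATE column):

#include "duckdb.h"

// Move a DATE value to the first day of its month.
duckdb_date first_day_of_month(duckdb_date date) {
  duckdb_date_struct parts = duckdb_from_date(date); // year, month, day
  parts.day = 1;
  return duckdb_to_date(parts);
}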


duckdb_is_finite_date

Test a duckdb_date to see if it is a finite value.

Syntax
bool duckdb_is_finite_date(
  duckdb_date date
);
Parameters
  • date: The date object, as obtained from a DUCKDB_TYPE_DATE column.
Return Value

True if the date is finite, false if it is ±infinity.


duckdb_from_time

Decompose a duckdb_time object into hour, minute, second and microsecond (stored as duckdb_time_struct).

Syntax
duckdb_time_struct duckdb_from_time(
  duckdb_time time
);
Parameters
  • time: The time object, as obtained from a DUCKDB_TYPE_TIME column.
Return Value

The duckdb_time_struct with the decomposed elements.


duckdb_create_time_tz

Create a duckdb_time_tz object from micros and a timezone offset.

Syntax
duckdb_time_tz duckdb_create_time_tz(
  int64_t micros,
  int32_t offset
);
Parameters
  • micros: The microsecond component of the time.
  • offset: The timezone offset component of the time.
Return Value

The duckdb_time_tz element.


duckdb_from_time_tz

Decompose a TIME_TZ object into micros and a timezone offset.

Use duckdb_from_time to further decompose the micros into hour, minute, second and microsecond.

Syntax
duckdb_time_tz_struct duckdb_from_time_tz(
  duckdb_time_tz micros
);
Parameters
  • micros: The time object, as obtained from a DUCKDB_TYPE_TIME_TZ column.

duckdb_to_time

Re-compose a duckdb_time from hour, minute, second and microsecond (duckdb_time_struct).

Syntax
duckdb_time duckdb_to_time(
  duckdb_time_struct time
);
Parameters
  • time: The hour, minute, second and microsecond in a duckdb_time_struct.
Return Value

The duckdb_time element.


duckdb_from_timestamp

Decompose a duckdb_timestamp object into a duckdb_timestamp_struct.

Syntax
duckdb_timestamp_struct duckdb_from_timestamp(
  duckdb_timestamp ts
);
Parameters
  • ts: The ts object, as obtained from a DUCKDB_TYPE_TIMESTAMP column.
Return Value

The duckdb_timestamp_struct with the decomposed elements.


duckdb_to_timestamp

Re-compose a duckdb_timestamp from a duckdb_timestamp_struct.

Syntax
duckdb_timestamp duckdb_to_timestamp(
  duckdb_timestamp_struct ts
);
Parameters
  • ts: The de-composed elements in a duckdb_timestamp_struct.
Return Value

The duckdb_timestamp element.


duckdb_is_finite_timestamp

Test a duckdb_timestamp to see if it is a finite value.

Syntax
bool duckdb_is_finite_timestamp(
  duckdb_timestamp ts
);
Parameters
  • ts: The duckdb_timestamp object, as obtained from a DUCKDB_TYPE_TIMESTAMP column.
Return Value

True if the timestamp is finite, false if it is ±infinity.


duckdb_is_finite_timestamp_s

Test a duckdb_timestamp_s to see if it is a finite value.

Syntax
bool duckdb_is_finite_timestamp_s(
  duckdb_timestamp_s ts
);
Parameters
  • ts: The duckdb_timestamp_s object, as obtained from a DUCKDB_TYPE_TIMESTAMP_S column.
Return Value

True if the timestamp is finite, false if it is ±infinity.


duckdb_is_finite_timestamp_ms

Test a duckdb_timestamp_ms to see if it is a finite value.

Syntax
bool duckdb_is_finite_timestamp_ms(
  duckdb_timestamp_ms ts
);
Parameters
  • ts: The duckdb_timestamp_ms object, as obtained from a DUCKDB_TYPE_TIMESTAMP_MS column.
Return Value

True if the timestamp is finite, false if it is ±infinity.


duckdb_is_finite_timestamp_ns

Test a duckdb_timestamp_ns to see if it is a finite value.

Syntax
bool duckdb_is_finite_timestamp_ns(
  duckdb_timestamp_ns ts
);
Parameters
  • ts: The duckdb_timestamp_ns object, as obtained from a DUCKDB_TYPE_TIMESTAMP_NS column.
Return Value

True if the timestamp is finite, false if it is ±infinity.


duckdb_hugeint_to_double

Converts a duckdb_hugeint object (as obtained from a DUCKDB_TYPE_HUGEINT column) into a double.

Syntax
double duckdb_hugeint_to_double(
  duckdb_hugeint val
);
Parameters
  • val: The hugeint value.
Return Value

The converted double element.


duckdb_double_to_hugeint

Converts a double value to a duckdb_hugeint object.

If the conversion fails because the double value is too big, the result will be 0.

Syntax
duckdb_hugeint duckdb_double_to_hugeint(
  double val
);
Parameters
  • val: The double value.
Return Value

The converted duckdb_hugeint element.
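
For illustration, a round trip between double and duckdb_hugeint could look like this (hypothetical helper; very large doubles cannot be represented and yield 0, as noted above):

#include "duckdb.h"
#include <stdio.h>

// Convert a double to a hugeint and back, printing the intermediate representation.
void hugeint_round_trip(double value) {
  duckdb_hugeint huge = duckdb_double_to_hugeint(value);
  double back = duckdb_hugeint_to_double(huge);
  printf("%f -> (upper = %lld, lower = %llu) -> %f\n",
         value, (long long) huge.upper, (unsigned long long) huge.lower, back);
}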


duckdb_double_to_decimal

Converts a double value to a duckdb_decimal object.

If the conversion fails because the double value is too big, or the width/scale are invalid, the result will be 0.

Syntax
duckdb_decimal duckdb_double_to_decimal(
  double val,
  uint8_t width,
  uint8_t scale
);
Parameters
  • val: The double value.
  • width: The width of the decimal type.
  • scale: The scale of the decimal type.
Return Value

The converted duckdb_decimal element.


duckdb_decimal_to_double

Converts a duckdb_decimal object (as obtained from a DUCKDB_TYPE_DECIMAL column) into a double.

Syntax
double duckdb_decimal_to_double(
  duckdb_decimal val
);
Parameters
  • val: The decimal value.
Return Value

The converted double element.
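
A small, illustrative round trip through duckdb_decimal could look as follows (the width and scale are arbitrary example values):

#include "duckdb.h"
#include <stdio.h>

// Convert a double to a DECIMAL(18, 3) value and back.
void decimal_round_trip(void) {
  duckdb_decimal dec = duckdb_double_to_decimal(123.456, 18, 3);
  double back = duckdb_decimal_to_double(dec);
  printf("width = %u, scale = %u, value = %f\n",
         (unsigned) dec.width, (unsigned) dec.scale, back);
}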


duckdb_create_logical_type

Creates a duckdb_logical_type from a primitive type. The resulting logical type must be destroyed with duckdb_destroy_logical_type.

Returns an invalid logical type, if type is: DUCKDB_TYPE_INVALID, DUCKDB_TYPE_DECIMAL, DUCKDB_TYPE_ENUM, DUCKDB_TYPE_LIST, DUCKDB_TYPE_STRUCT, DUCKDB_TYPE_MAP, DUCKDB_TYPE_ARRAY, or DUCKDB_TYPE_UNION.

Syntax
duckdb_logical_type duckdb_create_logical_type(
  duckdb_type type
);
Parameters
  • type: The primitive type to create.
Return Value

The logical type.
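
For illustration, creating, inspecting and destroying a primitive logical type could look like this (hypothetical helper):

#include "duckdb.h"
#include <assert.h>

// Create an INTEGER logical type, check its type id, and clean it up.
void primitive_logical_type_example(void) {
  duckdb_logical_type type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
  assert(duckdb_get_type_id(type) == DUCKDB_TYPE_INTEGER);
  // Every logical type created through the C API must be destroyed again.
  duckdb_destroy_logical_type(&type);
}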


duckdb_logical_type_get_alias

Returns the alias of a duckdb_logical_type, if set, else nullptr. The result must be destroyed with duckdb_free.

Syntax
char *duckdb_logical_type_get_alias(
  duckdb_logical_type type
);
Parameters
  • type: The logical type
Return Value

The alias or nullptr


duckdb_logical_type_set_alias

Sets the alias of a duckdb_logical_type.

Syntax
void duckdb_logical_type_set_alias(
  duckdb_logical_type type,
  const char *alias
);
Parameters
  • type: The logical type
  • alias: The alias to set

duckdb_create_list_type

Creates a LIST type from its child type. The return type must be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_list_type(
  duckdb_logical_type type
);
Parameters
  • type: The child type of the list
Return Value

The logical type.


duckdb_create_array_type

Creates an ARRAY type from its child type. The return type must be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_array_type(
  duckdb_logical_type type,
  idx_t array_size
);
Parameters
  • type: The child type of the array.
  • array_size: The number of elements in the array.
Return Value

The logical type.


duckdb_create_map_type

Creates a MAP type from its key type and value type. The return type must be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_map_type(
  duckdb_logical_type key_type,
  duckdb_logical_type value_type
);
Parameters
  • key_type: The map's key type.
  • value_type: The map's value type.
Return Value

The logical type.


duckdb_create_union_type

Creates a UNION type from the passed arrays. The return type must be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_union_type(
  duckdb_logical_type *member_types,
  const char **member_names,
  idx_t member_count
);
Parameters
  • member_types: The array of union member types.
  • member_names: The union member names.
  • member_count: The number of union members.
Return Value

The logical type.


duckdb_create_struct_type

Creates a STRUCT type based on the member types and names. The resulting type must be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_struct_type(
  duckdb_logical_type *member_types,
  const char **member_names,
  idx_t member_count
);
Parameters
  • member_types: The array of types of the struct members.
  • member_names: The array of names of the struct members.
  • member_count: The number of members of the struct.
Return Value

The logical type.
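
As an illustrative sketch (assuming the usual C API convention that every handle you create is also destroyed by you), a nested STRUCT(i INTEGER, tags VARCHAR[]) type could be built as follows:

#include "duckdb.h"

// Build the nested type STRUCT(i INTEGER, tags VARCHAR[]).
duckdb_logical_type make_example_struct_type(void) {
  duckdb_logical_type int_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
  duckdb_logical_type varchar_type = duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR);
  duckdb_logical_type list_type = duckdb_create_list_type(varchar_type);

  duckdb_logical_type member_types[] = {int_type, list_type};
  const char *member_names[] = {"i", "tags"};
  duckdb_logical_type struct_type = duckdb_create_struct_type(member_types, member_names, 2);

  // The handles we created for the members are still ours to destroy.
  duckdb_destroy_logical_type(&int_type);
  duckdb_destroy_logical_type(&varchar_type);
  duckdb_destroy_logical_type(&list_type);

  return struct_type; // must be destroyed by the caller with duckdb_destroy_logical_type
}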


duckdb_create_enum_type

Creates an ENUM type from the passed member name array. The resulting type should be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_enum_type(
  const char **member_names,
  idx_t member_count
);
Parameters
  • member_names: The array of names that the enum should consist of.
  • member_count: The number of elements that were specified in the array.
Return Value

The logical type.


duckdb_create_decimal_type

Creates a DECIMAL type with the specified width and scale. The resulting type should be destroyed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_create_decimal_type(
  uint8_t width,
  uint8_t scale
);
Parameters
  • width: The width of the decimal type
  • scale: The scale of the decimal type
Return Value

The logical type.


duckdb_get_type_id

Retrieves the enum duckdb_type of a duckdb_logical_type.

Syntax
duckdb_type duckdb_get_type_id(
  duckdb_logical_type type
);
Parameters
  • type: The logical type.
Return Value

The duckdb_type id.


duckdb_decimal_width

Retrieves the width of a decimal type.

Syntax
uint8_t duckdb_decimal_width(
  duckdb_logical_type type
);
Parameters
  • type: The logical type object
Return Value

The width of the decimal type


duckdb_decimal_scale

Retrieves the scale of a decimal type.

Syntax
uint8_t duckdb_decimal_scale(
  duckdb_logical_type type
);
Parameters
  • type: The logical type object
Return Value

The scale of the decimal type


duckdb_decimal_internal_type

Retrieves the internal storage type of a decimal type.

Syntax
duckdb_type duckdb_decimal_internal_type(
  duckdb_logical_type type
);
Parameters
  • type: The logical type object
Return Value

The internal type of the decimal type


duckdb_enum_internal_type

Retrieves the internal storage type of an enum type.

Syntax
duckdb_type duckdb_enum_internal_type(
  duckdb_logical_type type
);
Parameters
  • type: The logical type object
Return Value

The internal type of the enum type


duckdb_enum_dictionary_size

Retrieves the dictionary size of the enum type.

Syntax
uint32_t duckdb_enum_dictionary_size(
  duckdb_logical_type type
);
Parameters
  • type: The logical type object
Return Value

The dictionary size of the enum type


duckdb_enum_dictionary_value

Retrieves the dictionary value at the specified position from the enum.

The result must be freed with duckdb_free.

Syntax
char *duckdb_enum_dictionary_value(
  duckdb_logical_type type,
  idx_t index
);
Parameters
  • type: The logical type object
  • index: The index in the dictionary
Return Value

The string value of the enum type. Must be freed with duckdb_free.
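
For illustration, creating an ENUM type and printing its dictionary could be combined like this (hypothetical helper):

#include "duckdb.h"
#include <stdio.h>

// Create an ENUM type with two members and print its dictionary.
void enum_type_example(void) {
  const char *members[] = {"happy", "sad"};
  duckdb_logical_type enum_type = duckdb_create_enum_type(members, 2);

  uint32_t size = duckdb_enum_dictionary_size(enum_type);
  for (uint32_t i = 0; i < size; i++) {
    char *value = duckdb_enum_dictionary_value(enum_type, i);
    printf("%u: %s\n", (unsigned) i, value);
    duckdb_free(value); // dictionary values must be freed
  }
  duckdb_destroy_logical_type(&enum_type);
}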


duckdb_list_type_child_type

Retrieves the child type of the given LIST type. Also accepts MAP types. The result must be freed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_list_type_child_type(
  duckdb_logical_type type
);
Parameters
  • type: The logical type, either LIST or MAP.
Return Value

The child type of the LIST or MAP type.


duckdb_array_type_child_type

Retrieves the child type of the given ARRAY type.

The result must be freed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_array_type_child_type(
  duckdb_logical_type type
);
Parameters
  • type: The logical type. Must be ARRAY.
Return Value

The child type of the ARRAY type.


duckdb_array_type_array_size

Retrieves the array size of the given array type.

Syntax
idx_t duckdb_array_type_array_size(
  duckdb_logical_type type
);
Parameters
  • type: The logical type object
Return Value

The fixed number of elements the values of this array type can store.


duckdb_map_type_key_type

Retrieves the key type of the given map type.

The result must be freed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_map_type_key_type(
  duckdb_logical_type type
);
Parameters
  • type: The logical type object
Return Value

The key type of the map type. Must be destroyed with duckdb_destroy_logical_type.


duckdb_map_type_value_type

Retrieves the value type of the given map type.

The result must be freed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_map_type_value_type(
  duckdb_logical_type type
);
Parameters
  • type: The logical type object
Return Value

The value type of the map type. Must be destroyed with duckdb_destroy_logical_type.
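
A brief, illustrative sketch of creating a MAP(VARCHAR, INTEGER) type and retrieving its key and value types again could look like this:

#include "duckdb.h"
#include <assert.h>

// Create a MAP(VARCHAR, INTEGER) type and inspect its key and value types.
void map_type_example(void) {
  duckdb_logical_type key = duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR);
  duckdb_logical_type value = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
  duckdb_logical_type map_type = duckdb_create_map_type(key, value);
  // The key and value handles we created are still ours to destroy.
  duckdb_destroy_logical_type(&key);
  duckdb_destroy_logical_type(&value);

  duckdb_logical_type key_type = duckdb_map_type_key_type(map_type);
  duckdb_logical_type value_type = duckdb_map_type_value_type(map_type);
  assert(duckdb_get_type_id(key_type) == DUCKDB_TYPE_VARCHAR);
  assert(duckdb_get_type_id(value_type) == DUCKDB_TYPE_INTEGER);

  duckdb_destroy_logical_type(&key_type);
  duckdb_destroy_logical_type(&value_type);
  duckdb_destroy_logical_type(&map_type);
}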


duckdb_struct_type_child_count

Returns the number of children of a struct type.

Syntax
idx_t duckdb_struct_type_child_count(
  duckdb_logical_type type
);
Parameters
  • type: The logical type object
Return Value

The number of children of a struct type.


duckdb_struct_type_child_name

Retrieves the name of the struct child.

The result must be freed with duckdb_free.

Syntax
char *duckdb_struct_type_child_name(
  duckdb_logical_type type,
  idx_t index
);
Parameters
  • type: The logical type object
  • index: The child index
Return Value

The name of the struct type. Must be freed with duckdb_free.


duckdb_struct_type_child_type

Retrieves the child type of the given struct type at the specified index.

The result must be freed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_struct_type_child_type(
  duckdb_logical_type type,
  idx_t index
);
Parameters
  • type: The logical type object
  • index: The child index
Return Value

The child type of the struct type. Must be destroyed with duckdb_destroy_logical_type.
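
For illustration, the struct inspection functions can be combined to walk over all children of a STRUCT type (hypothetical helper; names and child types are released again):

#include "duckdb.h"
#include <stdio.h>

// Print the name and type id of every child of a STRUCT logical type.
void print_struct_children(duckdb_logical_type struct_type) {
  idx_t count = duckdb_struct_type_child_count(struct_type);
  for (idx_t i = 0; i < count; i++) {
    char *name = duckdb_struct_type_child_name(struct_type, i);
    duckdb_logical_type child = duckdb_struct_type_child_type(struct_type, i);
    printf("%s has type id %d\n", name, (int) duckdb_get_type_id(child));
    duckdb_free(name);                   // names must be freed with duckdb_free
    duckdb_destroy_logical_type(&child); // child types must be destroyed
  }
}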


duckdb_union_type_member_count

Returns the number of members that the union type has.

Syntax
idx_t duckdb_union_type_member_count(
  duckdb_logical_type type
);
Parameters
  • type: The logical type (union) object
Return Value

The number of members of a union type.


duckdb_union_type_member_name

Retrieves the name of the union member.

The result must be freed with duckdb_free.

Syntax
char *duckdb_union_type_member_name(
  duckdb_logical_type type,
  idx_t index
);
Parameters
  • type: The logical type object
  • index: The child index
Return Value

The name of the union member. Must be freed with duckdb_free.


duckdb_union_type_member_type

Retrieves the child type of the given union member at the specified index.

The result must be freed with duckdb_destroy_logical_type.

Syntax
duckdb_logical_type duckdb_union_type_member_type(
  duckdb_logical_type type,
  idx_t index
);
Parameters
  • type: The logical type object
  • index: The child index
Return Value

The child type of the union member. Must be destroyed with duckdb_destroy_logical_type.


duckdb_destroy_logical_type

Destroys the logical type and de-allocates all memory allocated for that type.

Syntax
void duckdb_destroy_logical_type(
  duckdb_logical_type *type
);
Parameters
  • type: The logical type to destroy.

duckdb_register_logical_type

Registers a custom type within the given connection. The type must have an alias.

Syntax
duckdb_state duckdb_register_logical_type(
  duckdb_connection con,
  duckdb_logical_type type,
  duckdb_create_type_info info
);
Parameters
  • con: The connection to use
  • type: The custom type to register
Return Value

Whether or not the registration was successful.


--- layout: docu title: Gitignore for DuckDB ---

If you work in a Git repository, you may want to configure your Gitignore to disable tracking [files created by DuckDB]({% link docs/operations_manual/footprint_of_duckdb/files_created_by_duckdb.md %}). These potentially include the DuckDB database file, the write-ahead log, and temporary files.

Sample Gitignore Files

In the following, we present sample Gitignore configuration snippets for DuckDB.

Ignore Temporary Files but Keep Database

This configuration is useful if you would like to keep the database file in the version control system:

*.wal
*.tmp/

Ignore Database and Temporary Files

If you would like to ignore both the database and the temporary files, extend the Gitignore file to include the database file. The exact Gitignore configuration to achieve this depends on the extension you use for your DuckDB databases (.duckdb, .db, .ddb, etc.). For example, if your DuckDB files use the .duckdb extension, add the following lines to your .gitignore file:

*.duckdb*
*.wal
*.tmp/

layout: docu title: Limits

This page contains DuckDB's built-in limit values.

| Limit | Default value | Configuration option | Comment |
|-------|---------------|----------------------|---------|
| Array size | 100000 | - | |
| BLOB size | 4 GB | - | |
| Expression depth | 1000 | [max_expression_depth]({% link docs/configuration/overview.md %}) | |
| Memory allocation for a vector | 128 GB | - | |
| Memory use | 80% of RAM | [memory_limit]({% link docs/configuration/pragmas.md %}#memory-limit) | Note: This limit only applies to the buffer manager. |
| String size | 4 GB | - | |
| Temporary directory size | unlimited | [max_temp_directory_size]({% link docs/configuration/overview.md %}) | |

layout: docu title: Configuration redirect_from:

  • /docs/configuration
  • /docs/configuration/
  • /docs/sql/configuration
  • /docs/sql/configuration/

DuckDB has a number of configuration options that can be used to change the behavior of the system.

The configuration options can be set using either the [SET statement]({% link docs/sql/statements/set.md %}) or the [PRAGMA statement]({% link docs/configuration/pragmas.md %}). They can be reset to their original values using the [RESET statement]({% link docs/sql/statements/set.md %}#reset).

The values of configuration options can be queried via the [current_setting() scalar function]({% link docs/sql/functions/utility.md %}) or using the [duckdb_settings() table function]({% link docs/sql/meta/duckdb_table_functions.md %}#duckdb_settings). For example:

SELECT current_setting('memory_limit') AS memlimit;

Or:

SELECT value AS memlimit
FROM duckdb_settings()
WHERE name = 'memory_limit';

Examples

Set the memory limit of the system to 10 GB.

SET memory_limit = '10GB';

Configure the system to use 1 thread.

SET threads TO 1;

Enable printing of a progress bar during long-running queries.

SET enable_progress_bar = true;

Set the default null order to NULLS LAST.

SET default_null_order = 'nulls_last';

Return the current value of a specific setting.

SELECT current_setting('threads') AS threads;
| threads |
|---------|
| 10 |

Query a specific setting.

SELECT *
FROM duckdb_settings()
WHERE name = 'threads';
| name | value | description | input_type | scope |
|------|-------|-------------|------------|-------|
| threads | 1 | The number of total threads used by the system. | BIGINT | GLOBAL |

Show a list of all available settings.

SELECT *
FROM duckdb_settings();

Reset the memory limit of the system back to the default.

RESET memory_limit;

Secrets Manager

DuckDB has a [Secrets manager]({% link docs/sql/statements/create_secret.md %}), which provides a unified user interface for secrets across all backends (e.g., AWS S3) that use them.

Configuration Reference

Configuration options come with different default [scopes]({% link docs/sql/statements/set.md %}#scopes): GLOBAL and LOCAL. Below is a list of all available configuration options by scope.

Global Configuration Options

Name Description Type Default value
Calendar The current calendar VARCHAR System (locale) calendar
TimeZone The current time zone VARCHAR System (locale) timezone
access_mode Access mode of the database (AUTOMATIC, READ_ONLY or READ_WRITE) VARCHAR automatic
allocator_background_threads Whether to enable the allocator background thread. BOOLEAN false
allocator_bulk_deallocation_flush_threshold If a bulk deallocation larger than this occurs, flush outstanding allocations. VARCHAR 512.0 MiB
allocator_flush_threshold Peak allocation threshold at which to flush the allocator after completing a task. VARCHAR 128.0 MiB
allow_community_extensions Allow to load community built extensions BOOLEAN true
allow_extensions_metadata_mismatch Allow to load extensions with not compatible metadata BOOLEAN false
allow_persistent_secrets Allow the creation of persistent secrets, that are stored and loaded on restarts BOOLEAN true
allow_unredacted_secrets Allow printing unredacted secrets BOOLEAN false
allow_unsigned_extensions Allow to load extensions with invalid or missing signatures BOOLEAN false
allowed_directories List of directories/prefixes that are ALWAYS allowed to be queried - even when enable_external_access is false VARCHAR[] []
allowed_paths List of files that are ALWAYS allowed to be queried - even when enable_external_access is false VARCHAR[] []
arrow_large_buffer_size Whether Arrow buffers for strings, blobs, uuids and bits should be exported using large buffers BOOLEAN false
arrow_lossless_conversion Whenever a DuckDB type does not have a clear native or canonical extension match in Arrow, export the types with a duckdb.type_name extension name. BOOLEAN false
arrow_output_list_view Whether export to Arrow format should use ListView as the physical layout for LIST columns BOOLEAN false
autoinstall_extension_repository Overrides the custom endpoint for extension installation on autoloading VARCHAR
autoinstall_known_extensions Whether known extensions are allowed to be automatically installed when a query depends on them BOOLEAN true
autoload_known_extensions Whether known extensions are allowed to be automatically loaded when a query depends on them BOOLEAN true
binary_as_string In Parquet files, interpret binary data as a string. BOOLEAN
ca_cert_file Path to a custom certificate file for self-signed certificates. VARCHAR
catalog_error_max_schemas The maximum number of schemas the system will scan for "did you mean..." style errors in the catalog UBIGINT 100
checkpoint_threshold, wal_autocheckpoint The WAL size threshold at which to automatically trigger a checkpoint (e.g., 1GB) VARCHAR 16.0 MiB
custom_extension_repository Overrides the custom endpoint for remote extension installation VARCHAR
custom_user_agent Metadata from DuckDB callers VARCHAR
default_block_size The default block size for new duckdb database files (new as-in, they do not yet exist). UBIGINT 262144
default_collation The collation setting used when none is specified VARCHAR
default_null_order, null_order NULL ordering used when none is specified (NULLS_FIRST or NULLS_LAST) VARCHAR NULLS_LAST
default_order The order type used when none is specified (ASC or DESC) VARCHAR ASC
default_secret_storage Allows switching the default storage for secrets VARCHAR local_file
disable_parquet_prefetching Disable the prefetching mechanism in Parquet BOOLEAN false
disabled_compression_methods Disable a specific set of compression methods (comma separated) VARCHAR
disabled_filesystems Disable specific file systems preventing access (e.g., LocalFileSystem) VARCHAR
disabled_log_types Sets the list of disabled loggers VARCHAR
duckdb_api DuckDB API surface VARCHAR cli
enable_external_access Allow the database to access external state (through e.g., loading/installing modules, COPY TO/FROM, CSV readers, pandas replacement scans, etc) BOOLEAN true
enable_fsst_vectors Allow scans on FSST compressed segments to emit compressed vectors to utilize late decompression BOOLEAN false
enable_geoparquet_conversion Attempt to decode/encode geometry data in/as GeoParquet files if the spatial extension is present. BOOLEAN true
enable_http_metadata_cache Whether or not the global http metadata is used to cache HTTP metadata BOOLEAN false
enable_logging Enables the logger BOOLEAN 0
enable_macro_dependencies Enable created MACROs to create dependencies on the referenced objects (such as tables) BOOLEAN false
enable_object_cache [PLACEHOLDER] Legacy setting - does nothing BOOLEAN NULL
enable_server_cert_verification Enable server side certificate verification. BOOLEAN false
enable_view_dependencies Enable created VIEWs to create dependencies on the referenced objects (such as tables) BOOLEAN false
enabled_log_types Sets the list of enabled loggers VARCHAR
extension_directory Set the directory to store extensions in VARCHAR
external_threads The number of external threads that work on DuckDB tasks. UBIGINT 1
force_download Forces upfront download of file BOOLEAN false
http_keep_alive Keep alive connections. Setting this to false can help when running into connection failures BOOLEAN true
http_proxy_password Password for HTTP proxy VARCHAR
http_proxy_username Username for HTTP proxy VARCHAR
http_proxy HTTP proxy host VARCHAR
http_retries HTTP retries on I/O error UBIGINT 3
http_retry_backoff Backoff factor for exponentially increasing retry wait time FLOAT 4
http_retry_wait_ms Time between retries UBIGINT 100
http_timeout HTTP timeout read/write/connection/retry (in seconds) UBIGINT 30
immediate_transaction_mode Whether transactions should be started lazily when needed, or immediately when BEGIN TRANSACTION is called BOOLEAN false
index_scan_max_count The maximum index scan count sets a threshold for index scans. If fewer than MAX(index_scan_max_count, index_scan_percentage * total_row_count) rows match, we perform an index scan instead of a table scan. UBIGINT 2048
index_scan_percentage The index scan percentage sets a threshold for index scans. If fewer than MAX(index_scan_max_count, index_scan_percentage * total_row_count) rows match, we perform an index scan instead of a table scan. DOUBLE 0.001
lock_configuration Whether or not the configuration can be altered BOOLEAN false
logging_level The log level which will be recorded in the log VARCHAR INFO
logging_mode Enables the logger VARCHAR LEVEL_ONLY
logging_storage Set the logging storage (memory/stdout/file) VARCHAR memory
max_memory, memory_limit The maximum memory of the system (e.g., 1GB) VARCHAR 80% of RAM
max_temp_directory_size The maximum amount of data stored inside the 'temp_directory' (when set) (e.g., 1GB) VARCHAR 90% of available disk space
max_vacuum_tasks The maximum vacuum tasks to schedule during a checkpoint. UBIGINT 100
old_implicit_casting Allow implicit casting to/from VARCHAR BOOLEAN false
parquet_metadata_cache Cache Parquet metadata - useful when reading the same files multiple times BOOLEAN false
password The password to use. Ignored for legacy compatibility. VARCHAR NULL
prefetch_all_parquet_files Use the prefetching mechanism for all types of parquet files BOOLEAN false
preserve_insertion_order Whether or not to preserve insertion order. If set to false the system is allowed to re-order any results that do not contain ORDER BY clauses. BOOLEAN true
produce_arrow_string_view Whether strings should be produced by DuckDB in Utf8View format instead of Utf8 BOOLEAN false
s3_access_key_id S3 Access Key ID VARCHAR
s3_endpoint S3 Endpoint VARCHAR
s3_region S3 Region VARCHAR us-east-1
s3_secret_access_key S3 Access Key VARCHAR
s3_session_token S3 Session Token VARCHAR
s3_uploader_max_filesize S3 Uploader max filesize (between 50GB and 5TB) VARCHAR 800GB
s3_uploader_max_parts_per_file S3 Uploader max parts per file (between 1 and 10000) UBIGINT 10000
s3_uploader_thread_limit S3 Uploader global thread limit UBIGINT 50
s3_url_compatibility_mode Disable Globs and Query Parameters on S3 URLs BOOLEAN false
s3_url_style S3 URL style VARCHAR vhost
s3_use_ssl S3 use SSL BOOLEAN true
secret_directory Set the directory to which persistent secrets are stored VARCHAR ~/.duckdb/stored_secrets
storage_compatibility_version Serialize on checkpoint with compatibility for a given duckdb version VARCHAR v0.10.2
temp_directory Set the directory to which to write temp files VARCHAR ⟨database_name⟩.tmp or .tmp (in in-memory mode)
threads, worker_threads The number of total threads used by the system. BIGINT # CPU cores
username, user The username to use. Ignored for legacy compatibility. VARCHAR NULL
zstd_min_string_length The (average) length at which to enable ZSTD compression, defaults to 4096 UBIGINT 4096

Local Configuration Options

Name Description Type Default value
custom_profiling_settings Accepts a JSON enabling custom metrics VARCHAR {"ROWS_RETURNED": "true", "LATENCY": "true", "RESULT_SET_SIZE": "true", "OPERATOR_TIMING": "true", "OPERATOR_ROWS_SCANNED": "true", "CUMULATIVE_ROWS_SCANNED": "true", "OPERATOR_CARDINALITY": "true", "OPERATOR_TYPE": "true", "OPERATOR_NAME": "true", "CUMULATIVE_CARDINALITY": "true", "EXTRA_INFO": "true", "CPU_TIME": "true", "BLOCKE...
dynamic_or_filter_threshold The maximum amount of OR filters we generate dynamically from a hash join UBIGINT 50
enable_http_logging Enables HTTP logging BOOLEAN false
enable_profiling Enables profiling, and sets the output format (JSON, QUERY_TREE, QUERY_TREE_OPTIMIZER) VARCHAR NULL
enable_progress_bar_print Controls the printing of the progress bar, when 'enable_progress_bar' is true BOOLEAN true
enable_progress_bar Enables the progress bar, printing progress to the terminal for long queries BOOLEAN true
errors_as_json Output error messages as structured JSON instead of as a raw string BOOLEAN false
explain_output Output of EXPLAIN statements (ALL, OPTIMIZED_ONLY, PHYSICAL_ONLY) VARCHAR physical_only
file_search_path A comma separated list of directories to search for input files VARCHAR
home_directory Sets the home directory used by the system VARCHAR
http_logging_output The file to which HTTP logging output should be saved, or empty to print to the terminal VARCHAR
ieee_floating_point_ops Use IEE754-compliant floating point operations (returning NAN instead of errors/NULL). BOOLEAN true
integer_division Whether or not the / operator defaults to integer division, or to floating point division BOOLEAN false
late_materialization_max_rows The maximum amount of rows in the LIMIT/SAMPLE for which we trigger late materialization UBIGINT 50
log_query_path Specifies the path to which queries should be logged (default: NULL, queries are not logged) VARCHAR NULL
max_expression_depth The maximum expression depth limit in the parser. WARNING: increasing this setting and using very deep expressions might lead to stack overflow errors. UBIGINT 1000
merge_join_threshold The number of rows we need on either table to choose a merge join UBIGINT 1000
nested_loop_join_threshold The number of rows we need on either table to choose a nested loop join UBIGINT 5
order_by_non_integer_literal Allow ordering by non-integer literals - ordering by such literals has no effect. BOOLEAN false
ordered_aggregate_threshold The number of rows to accumulate before sorting, used for tuning UBIGINT 262144
partitioned_write_flush_threshold The threshold in number of rows after which we flush a thread state when writing using PARTITION_BY UBIGINT 524288
partitioned_write_max_open_files The maximum amount of files the system can keep open before flushing to disk when writing using PARTITION_BY UBIGINT 100
perfect_ht_threshold Threshold in bytes for when to use a perfect hash table UBIGINT 12
pivot_filter_threshold The threshold to switch from using filtered aggregates to LIST with a dedicated pivot operator UBIGINT 20
pivot_limit The maximum number of pivot columns in a pivot statement UBIGINT 100000
prefer_range_joins Force use of range joins with mixed predicates BOOLEAN false
preserve_identifier_case Whether or not to preserve the identifier case, instead of always lowercasing all non-quoted identifiers BOOLEAN true
profile_output, profiling_output The file to which profile output should be saved, or empty to print to the terminal VARCHAR
profiling_mode The profiling mode (STANDARD or DETAILED) VARCHAR NULL
progress_bar_time Sets the time (in milliseconds) how long a query needs to take before we start printing a progress bar BIGINT 2000
scalar_subquery_error_on_multiple_rows When a scalar subquery returns multiple rows - return a random row instead of returning an error. BOOLEAN true
schema Sets the default search schema. Equivalent to setting search_path to a single value. VARCHAR main
search_path Sets the default catalog search path as a comma-separated list of values VARCHAR
streaming_buffer_size The maximum memory to buffer between fetching from a streaming result (e.g., 1GB) VARCHAR 976.5 KiB

layout: docu title: Files Created by DuckDB

DuckDB creates several files and directories on disk. This page lists both the global and the local ones.

Global Files and Directories

DuckDB creates the following global files and directories in the user's home directory (denoted with ~):

| Location | Description | Shared between versions | Shared between clients |
|----------|-------------|-------------------------|------------------------|
| ~/.duckdbrc | The content of this file is executed when starting the [DuckDB CLI client]({% link docs/clients/cli/overview.md %}). The commands can be both [dot command]({% link docs/clients/cli/dot_commands.md %}) and SQL statements. The naming of this file follows the ~/.bashrc and ~/.zshrc “run commands” files. | Yes | Only used by CLI |
| ~/.duckdb_history | History file, similar to ~/.bash_history and ~/.zsh_history. Used by the [DuckDB CLI client]({% link docs/clients/cli/overview.md %}). | Yes | Only used by CLI |
| ~/.duckdb/extensions | Binaries of installed [extensions]({% link docs/extensions/overview.md %}). | No | Yes |
| ~/.duckdb/stored_secrets | [Persistent secrets]({% link docs/configuration/secrets_manager.md %}#persistent-secrets) created by the [Secrets manager]({% link docs/configuration/secrets_manager.md %}). | Yes | Yes |

Local Files and Directories

DuckDB creates the following files and directories in the working directory (for in-memory connections) or relative to the database file (for persistent connections):

| Name | Description | Example |
|------|-------------|---------|
| ⟨database_filename⟩ | Database file. Only created in on-disk mode. The file can have any extension with typical extensions being .duckdb, .db, and .ddb. | weather.duckdb |
| .tmp/ | Temporary directory. Only created in in-memory mode. | .tmp/ |
| ⟨database_filename⟩.tmp/ | Temporary directory. Only created in on-disk mode. | weather.tmp/ |
| ⟨database_filename⟩.wal | Write-ahead log file. If DuckDB exits normally, the WAL file is deleted upon exit. If DuckDB crashes, the WAL file is required to recover data. | weather.wal |

If you are working in a Git repository and would like to disable tracking these files by Git, see the instructions on using [.gitignore for DuckDB]({% link docs/operations_manual/footprint_of_duckdb/gitignore_for_duckdb.md %}).

layout: docu title: Reclaiming Space

DuckDB uses a single-file format, which has some inherent limitations with respect to reclaiming disk space.

CHECKPOINT

To reclaim space after deleting rows, use the [CHECKPOINT statement]({% link docs/sql/statements/checkpoint.md %}).

VACUUM

The [VACUUM statement]({% link docs/sql/statements/vacuum.md %}) does not trigger vacuuming deletes and hence does not reclaim space.

Compacting a Database by Copying

To compact the database, you can create a fresh copy of the database using the [COPY FROM DATABASE statement]({% link docs/sql/statements/copy.md %}#copy-from-database--to). In the following example, we first connect to the original database db1, then the new (empty) database db2. Then, we copy the content of db1 to db2.

ATTACH 'db1.db' AS db1;
ATTACH 'db2.db' AS db2;
COPY FROM DATABASE db1 TO db2;

layout: docu title: Securing Extensions

DuckDB has a powerful extension mechanism. Extensions have the same privileges as the user running DuckDB's (parent) process. This introduces security considerations. Therefore, we recommend reviewing the configuration options listed on this page and setting them according to your attack models.

DuckDB Signature Checks

DuckDB extensions are checked on every load using the signature of the binaries. There are currently three categories of extensions:

  • Signed with a core key. Only extensions vetted by the core DuckDB team are signed with these keys.
  • Signed with a community key. These are open-source extensions distributed via the [DuckDB Community Extensions repository]({% link community_extensions/index.md %}).
  • Unsigned.

Overview of Security Levels for Extensions

DuckDB offers the following security levels for extensions.

| Usable extensions | Description | Configuration |
|-------------------|-------------|---------------|
| core | Extensions can only be loaded if signed with a core key. | SET allow_community_extensions = false |
| core and community | Extensions can only be loaded if signed with a core or community key. This is the default security level. | |
| Any extension including unsigned | Any extension can be loaded. | SET allow_unsigned_extensions = true |

Security-related configuration settings [lock themselves]({% link docs/operations_manual/securing_duckdb/overview.md %}#locking-configurations), i.e., it is only possible to restrict capabilities in the current process.

For example, attempting the following configuration changes will result in an error:

SET allow_community_extensions = false;
SET allow_community_extensions = true;
Invalid Input Error: Cannot upgrade allow_community_extensions setting while database is running

Community Extensions

DuckDB has a [Community Extensions repository]({% link community_extensions/index.md %}), which allows convenient installation of third-party extensions. Community extension repositories like pip or npm are essentially enabling remote code execution by design. This is less dramatic than it sounds. For better or worse, we are quite used to piping random scripts from the web into our shells, and routinely install a staggering amount of transitive dependencies without thinking twice. Some repositories like CRAN enforce a human inspection at some point, but that’s no guarantee for anything either.

We’ve studied several different approaches to community extension repositories and have picked what we think is a sensible approach: we do not attempt to review the submissions, but require that the source code of extensions is available. We do take over the complete build, signing, and distribution process. Note that this is a step up from pip and npm, which allow uploading arbitrary binaries, but a step down from reviewing everything manually. We allow users to report malicious extensions and show adoption statistics like GitHub stars and download count. Because we manage the repository, we can remove problematic extensions from distribution quickly.

Despite this, installing and loading DuckDB extensions from the community extension repository will execute code written by third party developers, and therefore can be dangerous. A malicious developer could create and register a harmless-looking DuckDB extension that steals your crypto coins. If you’re running a web service that executes untrusted SQL from users with DuckDB, it is probably a good idea to disable community extension installation and loading entirely. This can be done like so:

SET allow_community_extensions = false;

Disabling Autoinstalling and Autoloading Known Extensions

By default, DuckDB automatically installs and loads known extensions.

To disable autoinstalling known extensions, run:

SET autoinstall_known_extensions = false;

To disable autoloading known extensions, run:

SET autoload_known_extensions = false;

To lock this configuration, use the [lock_configuration option]({% link docs/operations_manual/securing_duckdb/overview.md %}#locking-configurations):

SET lock_configuration = true;

Always Require Signed Extensions

By default, DuckDB requires extensions to be either signed as core extensions (created by the DuckDB developers) or community extensions (created by third-party developers but distributed by the DuckDB developers). The [allow_unsigned_extensions setting]({% link docs/extensions/overview.md %}#unsigned-extensions) can be enabled on start-up to allow loading unsigned extensions. While this setting is useful for extension development, enabling it will allow DuckDB to load any extensions, which means more care must be taken to ensure malicious extensions are not loaded.

layout: docu title: Securing DuckDB

DuckDB is quite powerful, which can be problematic, especially if untrusted SQL queries are run, e.g., from public-facing user inputs. This page lists some options to restrict the potential fallout from malicious SQL queries.

The approach to securing DuckDB varies depending on your use case, environment, and potential attack models. Therefore, consider the security-related configuration options carefully, especially when working with confidential data sets.

If you plan to embed DuckDB in your application, please consult the [“Embedding DuckDB”]({% link docs/operations_manual/embedding_duckdb.md %}) page.

Reporting Vulnerabilities

If you discover a potential vulnerability, please report it confidentially via GitHub.

Safe Mode (CLI)

DuckDB's CLI client supports [“safe mode”]({% link docs/clients/cli/safe_mode.md %}), which prevents DuckDB from accessing external files other than the database file. This can be activated via a command line argument or a [dot command]({% link docs/clients/cli/dot_commands.md %}):

duckdb -safe ...
.safe_mode

Disabling File Access

DuckDB can list directories and read arbitrary files via its CSV parser’s [read_csv function]({% link docs/data/csv/overview.md %}) or read text via the [read_text function]({% link docs/sql/functions/char.md %}#read_textsource). For example:

SELECT *
FROM read_csv('/etc/passwd', sep = ':');

This can be disabled either by disabling external access altogether (enable_external_access) or disabling individual file systems. For example:

SET disabled_filesystems = 'LocalFileSystem';

Secrets

[Secrets]({% link docs/configuration/secrets_manager.md %}) are used to manage credentials to log into third party services like AWS or Azure. DuckDB can show a list of secrets using the duckdb_secrets() table function. This will redact any sensitive information such as security keys by default. The allow_unredacted_secrets option can be set to show all information contained within a security key. It is recommended not to turn on this option if you are running untrusted SQL input.

Queries can access the secrets defined in the Secrets Manager. For example, if there is a secret defined to authenticate with a user, who has write privileges to a given AWS S3 bucket, queries may write to that bucket. This is applicable for both persistent and temporary secrets.

[Persistent secrets]({% link docs/configuration/secrets_manager.md %}#persistent-secrets) are stored in unencrypted binary format on the disk. These have the same permissions as SSH keys, 600, i.e., only the user who is running the DuckDB (parent) process can read and write them.

Locking Configurations

Security-related configuration settings generally lock themselves for safety reasons. For example, while we can disable [Community Extensions]({% link community_extensions/index.md %}) using the SET allow_community_extensions = false, we cannot re-enable them again after the fact without restarting the database. Trying to do so will result in an error:

Invalid Input Error: Cannot upgrade allow_community_extensions setting while database is running

This prevents untrusted SQL input from re-enabling settings that were explicitly disabled for security reasons.

Nevertheless, many configuration settings do not disable themselves, such as the resource constraints. If you allow users to run SQL statements unrestricted on your own hardware, it is recommended that you lock the configuration after your own configuration has finished using the following command:

SET lock_configuration = true;

This prevents any configuration settings from being modified from that point onwards.

Constrain Resource Usage

DuckDB can use quite a lot of CPU, RAM, and disk space. To avoid denial of service attacks, these resources can be limited.

The number of CPU threads that DuckDB can use can be set using, for example:

SET threads = 4;

Where 4 is the number of allowed threads.

The maximum amount of memory (RAM) can also be limited, for example:

SET memory_limit = '4GB';

The size of the temporary file directory can be limited with:

SET max_temp_directory_size = '4GB';

Extensions

DuckDB has a powerful extension mechanism. Extensions have the same privileges as the user running DuckDB's (parent) process. This introduces security considerations. Therefore, we recommend reviewing the configuration options for [securing extensions]({% link docs/operations_manual/securing_duckdb/securing_extensions.md %}).

Privileges

Avoid running DuckDB as a root user (e.g., using sudo). There is no good reason to run DuckDB as root.

Generic Solutions

Securing DuckDB can also be supported via proven means, for example:

  • Scoping user privileges via chroot, relying on the operating system
  • Containerization, e.g., Docker and Podman
  • Running DuckDB in WebAssembly

layout: docu title: TPC-DS Extension github_directory: https://github.com/duckdb/duckdb/tree/main/extension/tpcds

The tpcds extension implements the data generator and queries for the TPC-DS benchmark.

Installing and Loading

The tpcds extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository. If you would like to install and load it manually, run:

INSTALL tpcds;
LOAD tpcds;

Usage

To generate data for scale factor 1, use:

CALL dsdgen(sf = 1);

To run a query, e.g., query 8, use:

PRAGMA tpcds(8);
| s_store_name | sum(ss_net_profit) |
|--------------|--------------------|
| able | -10354620.18 |
| ation | -10576395.52 |
| bar | -10625236.01 |
| ese | -10076698.16 |
| ought | -10994052.78 |

Generating the Schema

It's possible to generate the schema of TPC-DS without any data by setting the scale factor to 0:

CALL dsdgen(sf = 0);

Limitations

The tpcds(⟨query_id⟩) function runs a fixed TPC-DS query with pre-defined bind parameters (a.k.a. substitution parameters). It is not possible to change the query parameters using the tpcds extension.

layout: docu title: Delta Extension github_repository: https://github.com/duckdb/duckdb-delta

The delta extension adds support for the Delta Lake open-source storage format. It is built using the Delta Kernel. The extension offers read support for Delta tables, both local and remote.

For implementation details, see the [announcement blog post]({% post_url 2024-06-10-delta %}).

Warning The delta extension is currently experimental and is only supported on selected platforms (see the list of supported platforms below).

Installing and Loading

The delta extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository. If you would like to install and load it manually, run:

INSTALL delta;
LOAD delta;

Usage

To scan a local Delta table, run:

SELECT *
FROM delta_scan('file:///some/path/on/local/machine');

Reading from an S3 Bucket

To scan a Delta table in an [S3 bucket]({% link docs/extensions/httpfs/s3api.md %}), run:

SELECT *
FROM delta_scan('s3://some/delta/table');

For authenticating to S3 buckets, DuckDB [Secrets]({% link docs/configuration/secrets_manager.md %}) are supported:

CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);
SELECT *
FROM delta_scan('s3://some/delta/table/with/auth');

To scan public buckets on S3, you may need to pass the correct region by creating a secret containing the region of your public S3 bucket:

CREATE SECRET (
    TYPE S3,
    REGION 'my-region'
);
SELECT *
FROM delta_scan('s3://some/public/table/in/my-region');

Reading from Azure Blob Storage

To scan a Delta table in an [Azure Blob Storage bucket]({% link docs/extensions/azure.md %}#azure-blob-storage), run:

SELECT *
FROM delta_scan('az://my-container/my-table');

For authenticating to Azure Blob Storage, DuckDB [Secrets]({% link docs/configuration/secrets_manager.md %}) are supported:

CREATE SECRET (
    TYPE AZURE,
    PROVIDER CREDENTIAL_CHAIN
);
SELECT *
FROM delta_scan('az://my-container/my-table-with-auth');

Features

While the delta extension is still experimental, many (scanning) features and optimizations are already supported:

  • multithreaded scans and Parquet metadata reading
  • data skipping/filter pushdown
    • skipping row groups in file (based on Parquet metadata)
    • skipping complete files (based on Delta partition information)
  • projection pushdown
  • scanning tables with deletion vectors
  • all primitive types
  • structs
  • S3 support with secrets

More optimizations are going to be released in the future.

Supported DuckDB Versions and Platforms

The delta extension requires DuckDB version 0.10.3 or newer.

The delta extension currently only supports the following platforms:

  • Linux (x86_64 and ARM64): linux_amd64, linux_amd64_gcc4, and linux_arm64
  • macOS Intel and Apple Silicon: osx_amd64 and osx_arm64
  • Windows AMD64: windows_amd64

Support for the [other DuckDB platforms]({% link docs/extensions/working_with_extensions.md %}#platforms) is work-in-progress.

layout: docu title: Iceberg Extension github_repository: https://github.com/duckdb/duckdb-iceberg

The iceberg extension is a loadable extension that implements support for the Apache Iceberg format.

Installing and Loading

To install and load the iceberg extension, run:

INSTALL iceberg;
LOAD iceberg;

Updating the Extension

The iceberg extension often receives updates between DuckDB releases. To make sure that you have the latest version, run:

UPDATE EXTENSIONS (iceberg);

Usage

To test the examples, download the iceberg_data.zip file and unzip it.

Common Parameters

Parameter Type Default Description
allow_moved_paths BOOLEAN false Allows scanning Iceberg tables that are moved
metadata_compression_codec VARCHAR '' Treats metadata files as gzip-compressed when set to 'gzip'
version VARCHAR '?' Provides an explicit version string, a hint file, or enables version guessing
version_name_format VARCHAR 'v%s%s.metadata.json,%s%s.metadata.json' Controls how versions are converted to metadata file names

Querying Individual Tables

SELECT count(*)
FROM iceberg_scan('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
count_star()
51793

The allow_moved_paths option ensures that some path resolution is performed, which allows scanning Iceberg tables that are moved.

You can also specify the current manifest directly in the query; this may be resolved from the catalog prior to the query (in this example, the manifest version is a UUID). To do so, navigate to the data/iceberg directory and run:

SELECT count(*)
FROM iceberg_scan('lineitem_iceberg/metadata/v1.metadata.json');
count_star()
60175

The iceberg extension can be paired with the [httpfs extension]({% link docs/extensions/httpfs/overview.md %}) to access Iceberg tables in object stores such as S3.

SELECT count(*)
FROM iceberg_scan(
    's3://bucketname/lineitem_iceberg/metadata/v1.metadata.json',
    allow_moved_paths = true
);

Access Iceberg Metadata

SELECT *
FROM iceberg_metadata('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
manifest_path manifest_sequence_number manifest_content status content file_path file_format record_count
lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b232e5ee23d3-m1.avro 2 DATA ADDED EXISTING lineitem_iceberg/data/00041-414-f3c73457-bbd6-4b92-9c15-17b241171b16-00001.parquet PARQUET 51793
lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b232e5ee23d3-m0.avro 2 DATA DELETED EXISTING lineitem_iceberg/data/00000-411-0792dcfe-4e25-4ca3-8ada-175286069a47-00001.parquet PARQUET 60175

Visualizing Snapshots

SELECT *
FROM iceberg_snapshots('data/iceberg/lineitem_iceberg');
sequence_number snapshot_id timestamp_ms manifest_list
1 3776207205136740581 2023-02-15 15:07:54.504 lineitem_iceberg/metadata/snap-3776207205136740581-1-cf3d0be5-cf70-453d-ad8f-48fdc412e608.avro
2 7635660646343998149 2023-02-15 15:08:14.73 lineitem_iceberg/metadata/snap-7635660646343998149-1-10eaca8a-1e1c-421e-ad6d-b232e5ee23d3.avro

Selecting Metadata Versions

By default, the iceberg extension will look for a version-hint.text file to identify the proper metadata version to use. This can be overridden by explicitly supplying a version number via the version parameter to iceberg table functions. By default, this will look for both v{version}.metadata.json and {version}.metadata.json files, or v{version}.gz.metadata.json and {version}.gz.metadata.json when metadata_compression_codec = 'gzip' is specified. Other compression codecs are not supported.
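
For example, a version can be supplied explicitly when scanning a table (a minimal sketch, reusing the example table from above):

SELECT count(*)
FROM iceberg_scan(
    'data/iceberg/lineitem_iceberg',
    version = '1',
    allow_moved_paths = true
);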

Additionally, if any .text or .txt file is provided as a version, it is opened and treated as a version-hint file. The iceberg extension will open this file and use the entire contents of the file as a provided version number.

The entire contents of the version-hint file are treated as a literal version name, with no encoding, escaping, or trimming. This includes any whitespace or unsafe characters, which are passed directly into the metadata filenames formatted by the logic described below.

SELECT *
FROM iceberg_snapshots(
    'data/iceberg/lineitem_iceberg',
    version = '1',
    allow_moved_paths = true
);
count_star()
60175

Working with Alternative Metadata Naming Conventions

The iceberg extension can handle different metadata naming conventions by specifying them as a comma-delimited list of format strings via the version_name_format parameter. Each format string must take two %s parameters. The first is the location of the version number in the metadata filename and the second is the location of the metadata_compression_codec extension. The behavior described above is provided by the default value of 'v%s%s.metadata.json,%s%s.metadata.json'. If you have an alternatively named metadata file, such as rev-2.metadata.json.gz, the table can be read via the following statement.

SELECT *
FROM iceberg_snapshots(
    'data/iceberg/alternative_metadata_gz_naming',
    version = '2',
    version_name_format = 'rev-%s.metadata.json%s',
    metadata_compression_codec = 'gzip',
    allow_moved_paths = true
);
count_star()
60175

“Guessing” Metadata Versions

By default, either a table version number or a version-hint.text must be provided for the iceberg extension to read a table. This is typically provided by an external data catalog. In the event neither is present, the iceberg extension can attempt to guess the latest version by passing ? as the table version. The “latest” version is assumed to be the filename that is lexicographically largest when sorting the filenames. Collations are not considered. This behavior is not enabled by default as it may potentially violate ACID constraints. It can be enabled by setting unsafe_enable_version_guessing to true. When this is set, iceberg functions will attempt to guess the latest version by default before failing.

SET unsafe_enable_version_guessing = true;
SELECT count(*)
FROM iceberg_scan('data/iceberg/lineitem_iceberg_no_hint', allow_moved_paths = true);
-- Or explicitly as:
-- FROM iceberg_scan(
--         'data/iceberg/lineitem_iceberg_no_hint',
--         version = '?',
--         allow_moved_paths = true
-- );
count_star()
51793

Limitations

Writing (i.e., exporting to) Iceberg files is currently not supported.

layout: docu title: PostgreSQL Extension github_repository: https://github.com/duckdb/duckdb-postgres redirect_from:

  • /docs/extensions/postgres_scanner
  • /docs/extensions/postgres_scanner/
  • /docs/extensions/postgresql
  • /docs/extensions/postgresql/

The postgres extension allows DuckDB to directly read and write data from a running PostgreSQL database instance. The data can be queried directly from the underlying PostgreSQL database. Data can be loaded from PostgreSQL tables into DuckDB tables, or vice versa. See the [official announcement]({% post_url 2022-09-30-postgres-scanner %}) for implementation details and background.

Installing and Loading

The postgres extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository. If you would like to install and load it manually, run:

INSTALL postgres;
LOAD postgres;

Connecting

To make a PostgreSQL database accessible to DuckDB, use the ATTACH command with the POSTGRES or POSTGRES_SCANNER type.

To connect to the public schema of the PostgreSQL instance running on localhost in read-write mode, run:

ATTACH '' AS postgres_db (TYPE POSTGRES);

To connect to the PostgreSQL instance with the given parameters in read-only mode, run:

ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS db (TYPE POSTGRES, READ_ONLY);

By default, all schemas are attached. When working with large instances, it can be useful to only attach a specific schema. This can be accomplished using the SCHEMA option.

ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS db (TYPE POSTGRES, SCHEMA 'public');

Configuration

The ATTACH command takes as input either a libpq connection string or a PostgreSQL URI.

Below are some example connection strings and commonly used parameters. A full list of available parameters can be found in the PostgreSQL documentation.

dbname=postgresscanner
host=localhost port=5432 dbname=mydb connect_timeout=10
Name Description Default
dbname Database name [user]
host Name of host to connect to localhost
hostaddr Host IP address localhost
passfile Name of file passwords are stored in ~/.pgpass
password PostgreSQL password (empty)
port Port number 5432
user PostgreSQL user name current user

An example URI is postgresql://username@hostname/dbname.
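
Such a URI can be passed to ATTACH directly, for example (a sketch; the user name, host, and database name are placeholders):

ATTACH 'postgresql://postgres@localhost:5432/postgres' AS postgres_uri_db (TYPE POSTGRES);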

Configuring via Secrets

PostgreSQL connection information can also be specified with secrets. The following syntax can be used to create a secret.

CREATE SECRET (
    TYPE POSTGRES,
    HOST '127.0.0.1',
    PORT 5432,
    DATABASE postgres,
    USER 'postgres',
    PASSWORD ''
);

The information from the secret will be used when ATTACH is called. We can leave the PostgreSQL connection string empty to use all of the information stored in the secret.

ATTACH '' AS postgres_db (TYPE POSTGRES);

We can use the PostgreSQL connection string to override individual options. For example, to connect to a different database while still using the same credentials, we can override only the database name in the following manner.

ATTACH 'dbname=my_other_db' AS postgres_db (TYPE POSTGRES);

By default, created secrets are temporary. Secrets can be persisted using the [CREATE PERSISTENT SECRET command]({% link docs/configuration/secrets_manager.md %}#persistent-secrets). Persistent secrets can be used across sessions.
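
For example, a persistent secret can be created as follows (a sketch; the host and credentials are placeholders):

CREATE PERSISTENT SECRET postgres_secret_persistent (
    TYPE POSTGRES,
    HOST '127.0.0.1',
    PORT 5432,
    DATABASE postgres,
    USER 'postgres',
    PASSWORD ''
);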

Managing Multiple Secrets

Named secrets can be used to manage connections to multiple PostgreSQL database instances. Secrets can be given a name upon creation.

CREATE SECRET postgres_secret_one (
    TYPE POSTGRES,
    HOST '127.0.0.1',
    PORT 5432,
    DATABASE postgres,
    USER 'postgres',
    PASSWORD ''
);

The secret can then be explicitly referenced using the SECRET parameter in the ATTACH.

ATTACH '' AS postgres_db_one (TYPE POSTGRES, SECRET postgres_secret_one);

Configuring via Environment Variables

PostgreSQL connection information can also be specified with environment variables. This can be useful in a production environment where the connection information is managed externally and passed in to the environment.

export PGPASSWORD="secret"
export PGHOST=localhost
export PGUSER=owner
export PGDATABASE=mydatabase

Then, to connect, start the duckdb process and run:

ATTACH '' AS p (TYPE POSTGRES);

Usage

The tables in the PostgreSQL database can be read as if they were normal DuckDB tables, but the underlying data is read directly from PostgreSQL at query time.

SHOW ALL TABLES;
name
uuids
SELECT * FROM uuids;
u
6d3d2541-710b-4bde-b3af-4711738636bf
NULL
00000000-0000-0000-0000-000000000001
ffffffff-ffff-ffff-ffff-ffffffffffff

It might be desirable to create a copy of the PostgreSQL databases in DuckDB to prevent the system from re-reading the tables from PostgreSQL continuously, particularly for large tables.

Data can be copied over from PostgreSQL to DuckDB using standard SQL, for example:

CREATE TABLE duckdb_table AS FROM postgres_db.postgres_tbl;

Writing Data to PostgreSQL

In addition to reading data from PostgreSQL, the extension allows you to create tables, ingest data into PostgreSQL and make other modifications to a PostgreSQL database using standard SQL queries.

This allows you to use DuckDB to, for example, export data that is stored in a PostgreSQL database to Parquet, or read data from a Parquet file into PostgreSQL.

Below is a brief example of how to create a new table in PostgreSQL and load data into it.

ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE POSTGRES);
CREATE TABLE postgres_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO postgres_db.tbl VALUES (42, 'DuckDB');

Many operations on PostgreSQL tables are supported. All these operations directly modify the PostgreSQL database, and the result of subsequent operations can then be read using PostgreSQL. Note that if modifications are not desired, ATTACH can be run with the READ_ONLY property which prevents making modifications to the underlying database. For example:

ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE POSTGRES, READ_ONLY);

Below is a list of supported operations.

CREATE TABLE

CREATE TABLE postgres_db.tbl (id INTEGER, name VARCHAR);

INSERT INTO

INSERT INTO postgres_db.tbl VALUES (42, 'DuckDB');

SELECT

SELECT * FROM postgres_db.tbl;
id name
42 DuckDB

COPY

You can copy tables back and forth between PostgreSQL and DuckDB:

COPY postgres_db.tbl TO 'data.parquet';
COPY postgres_db.tbl FROM 'data.parquet';

These copies use PostgreSQL binary wire encoding. DuckDB can also write data using this encoding to a file which you can then load into PostgreSQL using a client of your choosing if you would like to do your own connection management:

COPY 'data.parquet' TO 'pg.bin' WITH (FORMAT POSTGRES_BINARY);

The file produced will be the equivalent of copying the file to PostgreSQL using DuckDB and then dumping it from PostgreSQL using psql or another client:

DuckDB:

COPY postgres_db.tbl FROM 'data.parquet';

PostgreSQL:

\copy tbl TO 'data.bin' WITH (FORMAT BINARY);

You may also create a full copy of the database using the [COPY FROM DATABASE statement]({% link docs/sql/statements/copy.md %}#copy-from-database--to):

COPY FROM DATABASE postgres_db TO my_duckdb_db;

UPDATE

UPDATE postgres_db.tbl
SET name = 'Woohoo'
WHERE id = 42;

DELETE

DELETE FROM postgres_db.tbl
WHERE id = 42;

ALTER TABLE

ALTER TABLE postgres_db.tbl
ADD COLUMN k INTEGER;

DROP TABLE

DROP TABLE postgres_db.tbl;

CREATE VIEW

CREATE VIEW postgres_db.v1 AS SELECT 42;

CREATE SCHEMA / DROP SCHEMA

CREATE SCHEMA postgres_db.s1;
CREATE TABLE postgres_db.s1.integers (i INTEGER);
INSERT INTO postgres_db.s1.integers VALUES (42);
SELECT * FROM postgres_db.s1.integers;
i
42
DROP SCHEMA postgres_db.s1;

DETACH

DETACH postgres_db;

Transactions

CREATE TABLE postgres_db.tmp (i INTEGER);
BEGIN;
INSERT INTO postgres_db.tmp VALUES (42);
SELECT * FROM postgres_db.tmp;

This returns:

i
42
ROLLBACK;
SELECT * FROM postgres_db.tmp;

This returns an empty table.

Running SQL Queries in PostgreSQL

The postgres_query Table Function

The postgres_query table function allows you to run arbitrary read queries within an attached database. postgres_query takes the name of the attached PostgreSQL database to execute the query in, as well as the SQL query to execute. The result of the query is returned. Single-quote strings are escaped by repeating the single quote twice.

postgres_query(attached_database::VARCHAR, query::VARCHAR)

For example:

ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE POSTGRES);
SELECT * FROM postgres_query('postgres_db', 'SELECT * FROM cars LIMIT 3');
brand model color
Ferrari Testarossa red
Aston Martin DB2 blue
Bentley Mulsanne gray

The postgres_execute Function

The postgres_execute function allows running arbitrary queries within PostgreSQL, including statements that update the schema and content of the database.

ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE POSTGRES);
CALL postgres_execute('postgres_db', 'CREATE TABLE my_table (i INTEGER)');

Settings

The extension exposes the following configuration parameters.

Name Description Default
pg_array_as_varchar Read PostgreSQL arrays as varchar - enables reading mixed dimensional arrays false
pg_connection_cache Whether or not to use the connection cache true
pg_connection_limit The maximum amount of concurrent PostgreSQL connections 64
pg_debug_show_queries DEBUG SETTING: print all queries sent to PostgreSQL to stdout false
pg_experimental_filter_pushdown Whether or not to use filter pushdown (currently experimental) false
pg_pages_per_task The amount of pages per task 1000
pg_use_binary_copy Whether or not to use BINARY copy to read data true
pg_null_byte_replacement When writing NULL bytes to Postgres, replace them with the given character NULL
pg_use_ctid_scan Whether or not to parallelize scanning using table ctids true

Schema Cache

To avoid having to continuously fetch schema data from PostgreSQL, DuckDB keeps schema information – such as the names of tables, their columns, etc. – cached. If changes are made to the schema through a different connection to the PostgreSQL instance, such as new columns being added to a table, the cached schema information might be outdated. In this case, the function pg_clear_cache can be executed to clear the internal caches.

CALL pg_clear_cache();

Deprecated The old postgres_attach function is deprecated. It is recommended to switch over to the new ATTACH syntax.


layout: docu title: Extensions redirect_from:

  • /docs/extensions
  • /docs/extensions/

Overview

DuckDB has a flexible extension mechanism that allows for dynamically loading extensions. These may extend DuckDB's functionality by providing support for additional file formats, new types, and domain-specific functionality.

Extensions are loadable on all clients (e.g., Python and R). Extensions distributed via the Core and Community repositories are built and tested on macOS, Windows and Linux. All operating systems are supported for both the AMD64 and the ARM64 architectures.

Listing Extensions

To get a list of extensions, use duckdb_extensions:

SELECT extension_name, installed, description
FROM duckdb_extensions();
extension_name installed description
arrow false A zero-copy data integration between Apache Arrow and DuckDB
autocomplete false Adds support for autocomplete in the shell
... ... ...

This list shows which extensions are available, which are installed, at which version, where they are installed, and more. The list includes most, but not all, available core extensions. For the full list, we maintain a [list of core extensions]({% link docs/extensions/core_extensions.md %}).

Built-In Extensions

DuckDB's binary distribution comes standard with a few built-in extensions. They are statically linked into the binary and can be used as is. For example, to use the built-in [json extension]({% link docs/data/json/overview.md %}) to read a JSON file:

SELECT *
FROM 'test.json';

To make the DuckDB distribution lightweight, only a few essential extensions are built-in, varying slightly per distribution. Which extension is built-in on which platform is documented in the [list of core extensions]({% link docs/extensions/core_extensions.md %}#default-extensions).

Installing More Extensions

To make an extension that is not built-in available in DuckDB, two steps need to happen:

  1. Extension installation is the process of downloading the extension binary and verifying its metadata. During installation, DuckDB stores the downloaded extension and some metadata in a local directory. From this directory, DuckDB can then load the extension whenever it needs to. This means that installation needs to happen only once.

  2. Extension loading is the process of dynamically loading the binary into a DuckDB instance. DuckDB will search the local extension directory for the installed extension, then load it to make its features available. This means that every time DuckDB is restarted, all extensions that are used need to be (re)loaded.

Extension installation and loading are subject to a few [limitations]({% link docs/extensions/working_with_extensions.md %}#limitations).

There are two main methods of making DuckDB perform the installation and loading steps for an installable extension: explicitly and through autoloading.

Explicit INSTALL and LOAD

In DuckDB, extensions can also be explicitly installed and loaded. Both non-autoloadable and autoloadable extensions can be installed this way. To explicitly install and load an extension, DuckDB has the dedicated SQL statements INSTALL and LOAD. For example, to install and load the [spatial extension]({% link docs/extensions/spatial/overview.md %}), run:

INSTALL spatial;
LOAD spatial;

With these statements, DuckDB will ensure the spatial extension is installed (ignoring the INSTALL statement if it is already installed), then proceed to LOAD the spatial extension (again ignoring the statement if it is already loaded).

Extension Repository

Optionally a repository can be provided where the extension should be installed from, by appending FROM ⟨repository⟩ to the INSTALL / FORCE INSTALL command. This repository can either be an alias, such as [community]({% link community_extensions/index.md %}), or it can be a direct URL, provided as a single-quoted string.
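
For example, assuming the h3 extension is available in the community repository, it can be installed and loaded with:

INSTALL h3 FROM community;
LOAD h3;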

After installing/loading an extension, the duckdb_extensions function can be used to get more information.
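
For example, to check whether the spatial extension is installed and loaded (a minimal sketch using a subset of the available columns):

SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name = 'spatial';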

Autoloading Extensions

For many of DuckDB's core extensions, explicitly loading and installing extensions is not necessary. DuckDB contains an autoloading mechanism which can install and load the core extensions as soon as they are used in a query. For example, when running:

SELECT *
FROM 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv';

DuckDB will automatically install and load the [httpfs]({% link docs/extensions/httpfs/overview.md %}) extension. No explicit INSTALL or LOAD statements are required.

Not all extensions can be autoloaded. This can have various reasons: some extensions make several changes to the running DuckDB instance, making autoloading technically not (yet) possible. For others, it is preferred to have users opt-in to the extension explicitly before use due to the way they modify behavior in DuckDB.

To see which extensions can be autoloaded, check the [core extensions list]({% link docs/extensions/core_extensions.md %}).

Community Extensions

DuckDB supports installing third-party [Community Extensions]({% link community_extensions/index.md %}). These are contributed by community members but they are built, signed, and distributed in a centralized repository.

Installing Extensions through Client APIs

For many clients, using SQL to load and install extensions is the preferred method. However, some clients have a dedicated API to install and load extensions. For example the [Python API client]({% link docs/clients/python/overview.md %}#loading-and-installing-extensions), which has dedicated install_extension(name: str) and load_extension(name: str) methods. For more details on a specific Client API, refer to the [Client API docs]({% link docs/clients/overview.md %}).

Updating Extensions

While built-in extensions are tied to a DuckDB release due to their nature of being built into the DuckDB binary, installable extensions can and do receive updates. To ensure all currently installed extensions are on the most recent version, call:

UPDATE EXTENSIONS;

For more details on extension versions, refer to [Extension Versioning]({% link docs/extensions/versioning_of_extensions.md %}).

Installation Location

By default, extensions are installed under the user's home directory:

~/.duckdb/extensions/⟨duckdb_version⟩/⟨platform_name⟩/

For stable DuckDB releases, the ⟨duckdb_version⟩ will be equal to the version tag of that release. For nightly DuckDB builds, it will be equal to the short git hash of the build. So for example, the extensions for DuckDB version v0.10.3 on macOS ARM64 (Apple Silicon) are installed to ~/.duckdb/extensions/v0.10.3/osx_arm64/. An example installation path for a nightly DuckDB build could be ~/.duckdb/extensions/fc2e4b26a6/linux_amd64_gcc4.

To change the default location where DuckDB stores its extensions, use the extension_directory configuration option:

SET extension_directory = '/path/to/your/extension/directory';

Note that setting the value of the home_directory configuration option has no effect on the location of the extensions.

Binary Compatibility

To avoid binary compatibility issues, the binary extensions distributed by DuckDB are tied both to a specific DuckDB version and a platform. This means that DuckDB can automatically detect binary compatibility between it and a loadable extension. When trying to load an extension that was compiled for a different version or platform, DuckDB will throw an error and refuse to load the extension.

See the [Working with Extensions page]({% link docs/extensions/working_with_extensions.md %}#platforms) for details on available platforms.

Developing Extensions

The same API that the core extensions use is available for developing extensions. This allows users to extend the functionality of DuckDB such that it suits their domain the best. A template for creating extensions is available in the extension-template repository. This template also holds some documentation on how to get started building your own extension.

Extension Signing

Extensions are signed with a cryptographic key, which also simplifies distribution (this is why they are served over HTTP and not HTTPS). By default, DuckDB uses its built-in public keys to verify the integrity of extensions before loading them. All extensions provided by the DuckDB core team are signed.

Unsigned Extensions

Warning Only load unsigned extensions from sources you trust. Avoid loading unsigned extensions over HTTP. Consult the [Securing DuckDB page]({% link docs/operations_manual/securing_duckdb/securing_extensions.md %}) for guidelines on how to set up DuckDB in a secure manner.

If you wish to load your own extensions or extensions from third parties, you will need to enable the allow_unsigned_extensions flag. To load unsigned extensions using the [CLI client]({% link docs/clients/cli/overview.md %}), pass the -unsigned flag to it on startup:

duckdb -unsigned

Now any extension can be loaded, signed or not:

LOAD './some/local/ext.duckdb_extension';

For client APIs, the allow_unsigned_extensions database configuration option needs to be set, see the respective [Client API docs]({% link docs/clients/overview.md %}). For example, for the Python client, see the [Loading and Installing Extensions section in the Python API documentation]({% link docs/clients/python/overview.md %}#loading-and-installing-extensions).

Working with Extensions

For advanced installation instructions and more details on extensions, see the [Working with Extensions page]({% link docs/extensions/working_with_extensions.md %}).

layout: docu title: MySQL Extension github_repository: https://github.com/duckdb/duckdb-mysql

The mysql extension allows DuckDB to directly read and write data from/to a running MySQL instance. The data can be queried directly from the underlying MySQL database. Data can be loaded from MySQL tables into DuckDB tables, or vice versa.

Installing and Loading

To install the mysql extension, run:

INSTALL mysql;

The extension is loaded automatically upon first use. If you prefer to load it manually, run:

LOAD mysql;

Reading Data from MySQL

To make a MySQL database accessible to DuckDB, use the ATTACH command with the MYSQL or the MYSQL_SCANNER type:

ATTACH 'host=localhost user=root port=0 database=mysql' AS mysqldb (TYPE MYSQL);
USE mysqldb;

Configuration

The connection string determines the parameters for how to connect to MySQL as a set of key=value pairs. Any options not provided are replaced by their default values, as per the table below. Connection information can also be specified with environment variables. If no option is provided explicitly, the MySQL extension tries to read it from an environment variable.

Setting Default Environment variable
database NULL MYSQL_DATABASE
host localhost MYSQL_HOST
password MYSQL_PWD
port 0 MYSQL_TCP_PORT
socket NULL MYSQL_UNIX_PORT
user ⟨current user⟩ MYSQL_USER
ssl_mode preferred
ssl_ca
ssl_capath
ssl_cert
ssl_cipher
ssl_crl
ssl_crlpath
ssl_key

Configuring via Secrets

MySQL connection information can also be specified with secrets. The following syntax can be used to create a secret.

CREATE SECRET (
    TYPE MYSQL,
    HOST '127.0.0.1',
    PORT 0,
    DATABASE mysql,
    USER 'mysql',
    PASSWORD ''
);

The information from the secret will be used when ATTACH is called. We can leave the connection string empty to use all of the information stored in the secret.

ATTACH '' AS mysql_db (TYPE MYSQL);

We can use the connection string to override individual options. For example, to connect to a different database while still using the same credentials, we can override only the database name in the following manner.

ATTACH 'database=my_other_db' AS mysql_db (TYPE MYSQL);

By default, created secrets are temporary. Secrets can be persisted using the [CREATE PERSISTENT SECRET command]({% link docs/configuration/secrets_manager.md %}#persistent-secrets). Persistent secrets can be used across sessions.

Managing Multiple Secrets

Named secrets can be used to manage connections to multiple MySQL database instances. Secrets can be given a name upon creation.

CREATE SECRET mysql_secret_one (
    TYPE MYSQL,
    HOST '127.0.0.1',
    PORT 0,
    DATABASE mysql,
    USER 'mysql',
    PASSWORD ''
);

The secret can then be explicitly referenced using the SECRET parameter in the ATTACH.

ATTACH '' AS mysql_db_one (TYPE MYSQL, SECRET mysql_secret_one);

SSL Connections

The ssl connection parameters can be used to make SSL connections. Below is a description of the supported parameters.

Setting Description
ssl_mode The security state to use for the connection to the server: disabled, required, verify_ca, verify_identity or preferred (default: preferred)
ssl_ca The path name of the Certificate Authority (CA) certificate file
ssl_capath The path name of the directory that contains trusted SSL CA certificate files
ssl_cert The path name of the client public key certificate file
ssl_cipher The list of permissible ciphers for SSL encryption
ssl_crl The path name of the file containing certificate revocation lists
ssl_crlpath The path name of the directory that contains files containing certificate revocation lists
ssl_key The path name of the client private key file

Reading MySQL Tables

The tables in the MySQL database can be read as if they were normal DuckDB tables, but the underlying data is read directly from MySQL at query time.

SHOW ALL TABLES;
name
signed_integers
SELECT * FROM signed_integers;
t s m i b
-128 -32768 -8388608 -2147483648 -9223372036854775808
127 32767 8388607 2147483647 9223372036854775807
NULL NULL NULL NULL NULL

It might be desirable to create a copy of the MySQL databases in DuckDB to prevent the system from re-reading the tables from MySQL continuously, particularly for large tables.

Data can be copied over from MySQL to DuckDB using standard SQL, for example:

CREATE TABLE duckdb_table AS FROM mysqlscanner.mysql_table;

Writing Data to MySQL

In addition to reading data from MySQL, the extension allows you to create tables, ingest data into MySQL, and make other modifications to a MySQL database using standard SQL queries.

This allows you to use DuckDB to, for example, export data that is stored in a MySQL database to Parquet, or read data from a Parquet file into MySQL.

Below is a brief example of how to create a new table in MySQL and load data into it.

ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE MYSQL);
CREATE TABLE mysql_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO mysql_db.tbl VALUES (42, 'DuckDB');

Many operations on MySQL tables are supported. All these operations directly modify the MySQL database, and the result of subsequent operations can then be read using MySQL. Note that if modifications are not desired, ATTACH can be run with the READ_ONLY property which prevents making modifications to the underlying database. For example:

ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE MYSQL, READ_ONLY);

Supported Operations

Below is a list of supported operations.

CREATE TABLE

CREATE TABLE mysql_db.tbl (id INTEGER, name VARCHAR);

INSERT INTO

INSERT INTO mysql_db.tbl VALUES (42, 'DuckDB');

SELECT

SELECT * FROM mysql_db.tbl;
id name
42 DuckDB

COPY

COPY mysql_db.tbl TO 'data.parquet';
COPY mysql_db.tbl FROM 'data.parquet';

You may also create a full copy of the database using the [COPY FROM DATABASE statement]({% link docs/sql/statements/copy.md %}#copy-from-database--to):

COPY FROM DATABASE mysql_db TO my_duckdb_db;

UPDATE

UPDATE mysql_db.tbl
SET name = 'Woohoo'
WHERE id = 42;

DELETE

DELETE FROM mysql_db.tbl
WHERE id = 42;

ALTER TABLE

ALTER TABLE mysql_db.tbl
ADD COLUMN k INTEGER;

DROP TABLE

DROP TABLE mysql_db.tbl;

CREATE VIEW

CREATE VIEW mysql_db.v1 AS SELECT 42;

CREATE SCHEMA and DROP SCHEMA

CREATE SCHEMA mysql_db.s1;
CREATE TABLE mysql_db.s1.integers (i INTEGER);
INSERT INTO mysql_db.s1.integers VALUES (42);
SELECT * FROM mysql_db.s1.integers;
i
42
DROP SCHEMA mysql_db.s1;

Transactions

CREATE TABLE mysql_db.tmp (i INTEGER);
BEGIN;
INSERT INTO mysql_db.tmp VALUES (42);
SELECT * FROM mysql_db.tmp;

This returns:

i
42
ROLLBACK;
SELECT * FROM mysql_db.tmp;

This returns an empty table.

The DDL statements are not transactional in MySQL.

Running SQL Queries in MySQL

The mysql_query Table Function

The mysql_query table function allows you to run arbitrary read queries within an attached database. mysql_query takes the name of the attached MySQL database to execute the query in, as well as the SQL query to execute. The result of the query is returned. Single-quote strings are escaped by repeating the single quote twice.

mysql_query(attached_database::VARCHAR, query::VARCHAR)

For example:

ATTACH 'host=localhost database=mysql' AS mysqldb (TYPE MYSQL);
SELECT * FROM mysql_query('mysqldb', 'SELECT * FROM cars LIMIT 3');

The mysql_execute Function

The mysql_execute function allows running arbitrary queries within MySQL, including statements that update the schema and content of the database.

ATTACH 'host=localhost database=mysql' AS mysqldb (TYPE MYSQL);
CALL mysql_execute('mysqldb', 'CREATE TABLE my_table (i INTEGER)');

Settings

Name Description Default
mysql_bit1_as_boolean Whether or not to convert BIT(1) columns to BOOLEAN true
mysql_debug_show_queries DEBUG SETTING: print all queries sent to MySQL to stdout false
mysql_experimental_filter_pushdown Whether or not to use filter pushdown (currently experimental) false
mysql_tinyint1_as_boolean Whether or not to convert TINYINT(1) columns to BOOLEAN true

Schema Cache

To avoid having to continuously fetch schema data from MySQL, DuckDB keeps schema information – such as the names of tables, their columns, etc. – cached. If changes are made to the schema through a different connection to the MySQL instance, such as new columns being added to a table, the cached schema information might be outdated. In this case, the function mysql_clear_cache can be executed to clear the internal caches.

CALL mysql_clear_cache();

layout: docu title: Vector Similarity Search Extension github_repository: https://github.com/duckdb/duckdb-vss

The vss extension is an experimental extension for DuckDB that adds indexing support to accelerate vector similarity search queries using DuckDB's new fixed-size ARRAY type.

See the [announcement blog post]({% post_url 2024-05-03-vector-similarity-search-vss %}) and the [“What's New in the Vector Similarity Search Extension?” post]({% post_url 2024-10-23-whats-new-in-the-vss-extension %}).

Usage

To create a new HNSW (Hierarchical Navigable Small Worlds) index on a table with an ARRAY column, use the CREATE INDEX statement with the USING HNSW clause. For example:

INSTALL vss;
LOAD vss;

CREATE TABLE my_vector_table (vec FLOAT[3]);
INSERT INTO my_vector_table
    SELECT array_value(a, b, c)
    FROM range(1, 10) ra(a), range(1, 10) rb(b), range(1, 10) rc(c);
CREATE INDEX my_hnsw_index ON my_vector_table USING HNSW (vec);

The index will then be used to accelerate queries that use an ORDER BY clause evaluating one of the supported distance metric functions against the indexed columns and a constant vector, followed by a LIMIT clause. For example:

SELECT *
FROM my_vector_table
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;

Additionally, the overloaded min_by(col, arg, n) can also be accelerated with the HNSW index if the arg argument is a matching distance metric function. This can be used to do quick one-shot nearest neighbor searches. For example, to get the top 3 rows with the closest vectors to [1, 2, 3]:

SELECT min_by(my_vector_table, array_distance(vec, [1, 2, 3]::FLOAT[3]), 3) AS result
FROM my_vector_table;
---- [{'vec': [1.0, 2.0, 3.0]}, {'vec': [1.0, 2.0, 4.0]}, {'vec': [2.0, 2.0, 3.0]}]

Note how we pass the table name as the first argument to min_by to return a struct containing the entire matched row.

We can verify that the index is being used by checking the EXPLAIN output and looking for the HNSW_INDEX_SCAN node in the plan:

EXPLAIN
SELECT *
FROM my_vector_table
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            vec            │
│array_distance(vec, [1.0, 2│
│         .0, 3.0])         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      HNSW_INDEX_SCAN      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   t1 (HNSW INDEX SCAN :   │
│           my_idx)         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            vec            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 3           │
└───────────────────────────┘

By default, the HNSW index will be created using the euclidean distance l2sq (L2-norm squared) metric, matching DuckDB's array_distance function, but other distance metrics can be used by specifying the metric option during index creation. For example:

CREATE INDEX my_hnsw_cosine_index
ON my_vector_table
USING HNSW (vec)
WITH (metric = 'cosine');

The following table shows the supported distance metrics and their corresponding DuckDB functions:

Metric Function Description
l2sq array_distance Euclidean distance
cosine array_cosine_distance Cosine similarity distance
ip array_negative_inner_product Negative inner product

Note that while each HNSW index only applies to a single column, you can create multiple HNSW indexes on the same table, each individually indexing a different column. Additionally, you can also create multiple HNSW indexes on the same column, each supporting a different distance metric.

Index Options

Besides the metric option, the HNSW index creation statement also supports the following options to control the hyperparameters of the index construction and search process:

Option Default Description
ef_construction 128 The number of candidate vertices to consider during the construction of the index. A higher value will result in a more accurate index, but will also increase the time it takes to build the index.
ef_search 64 The number of candidate vertices to consider during the search phase of the index. A higher value will result in a more accurate index, but will also increase the time it takes to perform a search.
M 16 The maximum number of neighbors to keep for each vertex in the graph. A higher value will result in a more accurate index, but will also increase the time it takes to build the index.
M0 2 * M The base connectivity, or the number of neighbors to keep for each vertex in the zero-th level of the graph. A higher value will result in a more accurate index, but will also increase the time it takes to build the index.

Additionally, you can also override the ef_search parameter set at index construction time by setting the SET hnsw_ef_search = ⟨int⟩ configuration option at runtime. This can be useful if you want to trade search performance for accuracy or vice-versa on a per-connection basis. You can also unset the override by calling RESET hnsw_ef_search.
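
For example, to temporarily trade speed for accuracy on the current connection and later restore the index default (a minimal sketch):

-- increase the number of candidate vertices considered during search
SET hnsw_ef_search = 200;
-- ... run similarity queries ...
-- revert to the value set at index construction time
RESET hnsw_ef_search;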

Persistence

Due to some known issues related to persistence of custom extension indexes, the HNSW index can only be created on tables in in-memory databases by default, unless the SET hnsw_enable_experimental_persistence = ⟨bool⟩ configuration option is set to true.

The reasoning for locking this feature behind an experimental flag is that “WAL” recovery is not yet properly implemented for custom indexes, meaning that if a crash occurs or the database is shut down unexpectedly while there are uncommitted changes to a HNSW-indexed table, you can end up with data loss or corruption of the index.

If you enable this option and experience an unexpected shutdown, you can try to recover the index by first starting DuckDB separately, loading the vss extension and then ATTACHing the database file, which ensures that the HNSW index functionality is available during WAL-playback, allowing DuckDB's recovery process to proceed without issues. But we still recommend that you do not use this feature in production environments.

With the hnsw_enable_experimental_persistence option enabled, the index will be persisted into the DuckDB database file (if you run DuckDB with a disk-backed database file), which means that after a database restart, the index can be loaded back into memory from disk instead of having to be re-created. With that in mind, there are no incremental updates to persistent index storage, so every time DuckDB performs a checkpoint the entire index will be serialized to disk and overwrite itself. Similarly, after a restart of the database, the index will be deserialized back into main memory in its entirety, although this is deferred until you first access the table associated with the index. Depending on how large the index is, the deserialization process may take some time, but it should still be faster than simply dropping and re-creating the index.
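
A minimal sketch of enabling experimental persistence on a disk-backed database (the table and index names below are placeholders):

SET hnsw_enable_experimental_persistence = true;
-- assuming DuckDB was started with a disk-backed database file, e.g., duckdb my_vectors.duckdb
CREATE TABLE my_persistent_vectors (vec FLOAT[3]);
CREATE INDEX my_persistent_hnsw ON my_persistent_vectors USING HNSW (vec);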

Inserts, Updates, Deletes and Re-Compaction

The HNSW index does support inserting, updating and deleting rows from the table after index creation. However, there are two things to keep in mind:

  • It's faster to create the index after the table has been populated with data as the initial bulk load can make better use of parallelism on large tables.
  • Deletes are not immediately reflected in the index, but are instead “marked” as deleted, which can cause the index to grow stale over time and negatively impact query quality and performance.

To remedy the last point, you can call the PRAGMA hnsw_compact_index('⟨index name⟩') pragma function to trigger a re-compaction of the index pruning deleted items, or re-create the index after a significant number of updates.
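
For example, to compact the index created earlier (a minimal sketch):

PRAGMA hnsw_compact_index('my_hnsw_index');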

Bonus: Vector Similarity Search Joins

The vss extension also provides a couple of table macros to simplify matching multiple vectors against each other, so-called "fuzzy joins". These are:

  • vss_join(left_table, right_table, left_col, right_col, k, metric := 'l2sq')
  • vss_match(right_table, left_col, right_col, k, metric := 'l2sq')

These do not currently make use of the HNSW index but are provided as convenience utility functions for users who are ok with performing brute-force vector similarity searches without having to write out the join logic themselves. In the future these might become targets for index-based optimizations as well.

These functions can be used as follows:

CREATE TABLE haystack (id int, vec FLOAT[3]);
CREATE TABLE needle (search_vec FLOAT[3]);

INSERT INTO haystack
    SELECT row_number() OVER (), array_value(a,b,c)
    FROM range(1, 10) ra(a), range(1, 10) rb(b), range(1, 10) rc(c);

INSERT INTO needle
    VALUES ([5, 5, 5]), ([1, 1, 1]);

SELECT *
FROM vss_join(needle, haystack, search_vec, vec, 3) res;
┌───────┬─────────────────────────────────┬─────────────────────────────────────┐
│ score │            left_tbl             │              right_tbl              │
│ float │   struct(search_vec float[3])   │  struct(id integer, vec float[3])   │
├───────┼─────────────────────────────────┼─────────────────────────────────────┤
│   0.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 365, 'vec': [5.0, 5.0, 5.0]} │
│   1.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 364, 'vec': [5.0, 4.0, 5.0]} │
│   1.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 356, 'vec': [4.0, 5.0, 5.0]} │
│   0.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 1, 'vec': [1.0, 1.0, 1.0]}   │
│   1.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 10, 'vec': [2.0, 1.0, 1.0]}  │
│   1.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 2, 'vec': [1.0, 2.0, 1.0]}   │
└───────┴─────────────────────────────────┴─────────────────────────────────────┘
-- Alternatively, we can use the vss_match macro as a "lateral join"
-- to get the matches already grouped by the left table.
-- Note that this requires us to specify the left table first, and then
-- the vss_match macro which references the search column from the left
-- table (in this case, `search_vec`).
SELECT *
FROM needle, vss_match(haystack, search_vec, vec, 3) res;
┌─────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│   search_vec    │                                                                                       matches                                                                                        │
│    float[3]     │                                                            struct(score float, "row" struct(id integer, vec float[3]))[]                                                             │
├─────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ [5.0, 5.0, 5.0] │ [{'score': 0.0, 'row': {'id': 365, 'vec': [5.0, 5.0, 5.0]}}, {'score': 1.0, 'row': {'id': 364, 'vec': [5.0, 4.0, 5.0]}}, {'score': 1.0, 'row': {'id': 356, 'vec': [4.0, 5.0, 5.0]}}] │
│ [1.0, 1.0, 1.0] │ [{'score': 0.0, 'row': {'id': 1, 'vec': [1.0, 1.0, 1.0]}}, {'score': 1.0, 'row': {'id': 10, 'vec': [2.0, 1.0, 1.0]}}, {'score': 1.0, 'row': {'id': 2, 'vec': [1.0, 2.0, 1.0]}}]      │
└─────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Limitations

  • Only vectors consisting of FLOATs (32-bit, single precision) are supported at the moment.
  • The index itself is not buffer managed and must be able to fit into RAM memory.
  • The size of the index in memory does not count towards DuckDB's memory_limit configuration parameter.
  • HNSW indexes can only be created on tables in in-memory databases, unless the SET hnsw_enable_experimental_persistence = ⟨bool⟩ configuration option is set to true, see Persistence for more information.
  • The vector join table macros (vss_join and vss_match) do not require or make use of the HNSW index.

layout: docu title: AutoComplete Extension github_directory: https://github.com/duckdb/duckdb/tree/main/extension/autocomplete

The autocomplete extension adds support for autocomplete in the [CLI client]({% link docs/clients/cli/overview.md %}). The extension is shipped by default with the CLI client.

Behavior

For the behavior of the autocomplete extension, see the [documentation of the CLI client]({% link docs/clients/cli/autocomplete.md %}).

Functions

Function Description
sql_auto_complete(query_string) Attempts autocompletion on the given query_string.

Example

SELECT *
FROM sql_auto_complete('SEL');

Returns:

suggestion suggestion_start
SELECT 0
DELETE 0
INSERT 0
CALL 0
LOAD 0
CALL 0
ALTER 0
BEGIN 0
EXPORT 0
CREATE 0
PREPARE 0
EXECUTE 0
EXPLAIN 0
ROLLBACK 0
DESCRIBE 0
SUMMARIZE 0
CHECKPOINT 0
DEALLOCATE 0
UPDATE 0
DROP 0

layout: docu title: Versioning of Extensions

Extension Versioning

Most software has some sort of version number. Version numbers serve a few important goals:

  • Tie a binary to a specific state of the source code
  • Allow determining the expected feature set
  • Allow determining the state of the APIs
  • Allow efficient processing of bug reports (e.g., bug #1337 was introduced in version v3.4.5 )
  • Allow determining chronological order of releases (e.g., version v1.2.3 is older than v1.2.4)
  • Give an indication of expected stability (e.g., v0.0.1 is likely not very stable, whereas v13.11.0 probably is stable)

Just like [DuckDB itself]({% link docs/dev/release_calendar.md %}), DuckDB extensions have their own version number. To ensure consistent semantics of these version numbers across the various extensions, DuckDB's [Core Extensions]({% link docs/extensions/core_extensions.md %}) use a versioning scheme that prescribes how extensions should be versioned. The versioning scheme for Core Extensions is made up of 3 different stability levels: unstable, pre-release, and stable. Let's go over each of the 3 levels and describe their format:

Unstable Extensions

Unstable extensions are extensions that can't (or don't want to) give any guarantees regarding their current stability, or their goals of becoming stable. Unstable extensions are tagged with the short git hash of the extension.

For example, at the time of writing this, the version of the vss extension is an unstable extension of version 690bfc5.

What to expect from an extension that has a version number in the unstable format?

  • The state of the source code of the extension can be found by looking up the hash in the extension repository
  • Functionality may change or be removed completely with every release
  • This extension's API could change with every release
  • This extension may not follow a structured release cycle, new (breaking) versions can be pushed at any time

Pre-Release Extensions

Pre-release extensions are the next step up from Unstable extensions. They are tagged with a version in the SemVer format, more specifically, those in the v0.y.z format. In semantic versioning, versions starting with v0 have a special meaning: they indicate that the more strict semantics of regular (>v1.0.0) versions do not yet apply. It basically means that an extension is working towards becoming a stable extension, but is not quite there yet.

For example, at the time of writing this, the version of the delta extension is a pre-release extension of version v0.1.0.

What to expect from an extension that has a version number in the pre-release format?

  • The extension is compiled from the source code corresponding to the tag.
  • Semantic Versioning semantics apply. See the Semantic Versioning specification for details.
  • The extension follows a release cycle where new features are tested in nightly builds before being grouped into a release and pushed to the core repository.
  • Release notes describing what has been added each release should be available to make it easy to understand the difference between versions.

Stable Extensions

Stable extensions are the final step of extension stability. This is denoted by using a stable SemVer of format vx.y.z where x>0.

For example, at the time of writing this, the version of the parquet extension is a stable extension of version v1.0.0.

What to expect from an extension that has a version number in the stable format? Essentially the same as pre-release extensions, but now the more strict SemVer semantics apply: the API of the extension should now be stable and will only change in backwards-incompatible ways when the major version is bumped. See the SemVer specification for details.

Release Cycle of Pre-Release and Stable Core Extensions

In general, the release cycle of an extension depends on its stability level. unstable extensions are often in sync with DuckDB's release cycle, but may also be quietly updated between DuckDB releases. pre-release and stable extensions follow their own release cycle. These may or may not coincide with DuckDB releases. To find out more about the release cycle of a specific extension, refer to the documentation or GitHub page of the respective extension. Generally, pre-release and stable extensions will document their releases as GitHub releases, an example of which you can see in the delta extension.

Finally, there is a small exception: All [in-tree]({% link docs/extensions/working_with_extensions.md %}#in-tree-vs-out-of-tree) extensions simply follow DuckDB's release cycle.

Nightly Builds

Just like DuckDB itself, DuckDB's core extensions have nightly or dev builds that can be used to try out features before they are officially released. This can be useful when your workflow depends on a new feature, or when you need to confirm that your stack is compatible with the upcoming version.

Nightly builds for extensions are slightly complicated due to the fact that currently DuckDB extension binaries are tightly bound to a single DuckDB version. Because of this tight connection, there is a potential risk for a combinatorial explosion. Therefore, not all combinations of nightly extension build and nightly DuckDB build are available.

In general, there are 2 ways of using nightly builds: using a nightly DuckDB build and using a stable DuckDB build. Let's go over the differences between the two:

From Stable DuckDB

In most cases, users will be interested in a nightly build of a specific extension, but don't necessarily want to switch to using the nightly build of DuckDB itself. This allows using a specific bleeding-edge feature while limiting the exposure to unstable code.

To achieve this, Core Extensions tend to regularly push builds to the [core_nightly repository]({% link docs/extensions/working_with_extensions.md %}#extension-repositories). Let's look at an example:

First we install a [stable DuckDB build]({% link docs/installation/index.html %}).

Then we can install and load a nightly extension like this:

INSTALL aws FROM core_nightly;
LOAD aws;

In this example we are using the latest nightly build of the aws extension with the latest stable version of DuckDB.

From Nightly DuckDB

When DuckDB CI produces a nightly binary of DuckDB itself, the binaries are distributed with a set of extensions that are pinned at a specific version. This extension version will be tested for that specific build of DuckDB, but might not be the latest dev build. Let's look at an example:

First, we install a [nightly DuckDB build]({% link docs/installation/index.html %}). Then, we can install and load the aws extension as expected:

INSTALL aws;
LOAD aws;

Updating Extensions

DuckDB has a dedicated statement that will automatically update all extensions to their latest version. The output will give the user information on which extensions were updated to/from which version. For example:

UPDATE EXTENSIONS;
extension_name repository update_result previous_version current_version
httpfs core NO_UPDATE_AVAILABLE 70fd6a8a24 70fd6a8a24
delta core UPDATED d9e5cc1 04c61e4
azure core NO_UPDATE_AVAILABLE 49b63dc 49b63dc
aws core_nightly NO_UPDATE_AVAILABLE 42c78d3 42c78d3

Note that DuckDB will look for updates in the source repository for each extension. So if an extension was installed from core_nightly, it will be updated with the latest nightly build.

The update statement can also be provided with a list of specific extensions to update:

UPDATE EXTENSIONS (httpfs, azure);
extension_name repository update_result previous_version current_version
httpfs core NO_UPDATE_AVAILABLE 70fd6a8a24 70fd6a8a24
azure core NO_UPDATE_AVAILABLE 49b63dc 49b63dc

Target DuckDB Version

Currently, when extensions are compiled, they are tied to a specific version of DuckDB. What this means is that, for example, an extension binary compiled for v0.10.3 does not work for v1.0.0. In most cases, this will not cause any issues and is fully transparent; DuckDB will automatically ensure it installs the correct binary for its version. For extension developers, this means that they must ensure that new binaries are created whenever a new version of DuckDB is released. However, note that DuckDB provides an extension template that makes this fairly simple.

layout: docu title: ICU Extension github_directory: https://github.com/duckdb/duckdb/tree/main/extension/icu

The icu extension contains an easy-to-use version of the collation/timezone part of the ICU library.

Installing and Loading

The icu extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository. If you would like to install and load it manually, run:

INSTALL icu;
LOAD icu;

Features

The icu extension introduces the following features:

  • [Region-dependent collations]({% link docs/sql/expressions/collations.md %})
  • [Time zones]({% link docs/sql/data_types/timezones.md %}), used for [timestamp data types]({% link docs/sql/data_types/timestamp.md %}) and [timestamp functions]({% link docs/sql/functions/timestamptz.md %})
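
A minimal sketch of both features, assuming the de collation and the America/Los_Angeles time zone are available in your ICU build:

-- ICU collation: sort strings using German collation rules
SELECT s
FROM (VALUES ('Gabel'), ('Göbel'), ('Goethe')) t(s)
ORDER BY s COLLATE de;

-- ICU time zones: set the session time zone and inspect a TIMESTAMPTZ value
SET TimeZone = 'America/Los_Angeles';
SELECT '2025-02-23 22:20:00+00'::TIMESTAMPTZ AS ts;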

layout: docu title: Community Extensions

Community-contributed extensions can be installed from the Community Extensions repository since [summer 2024]({% post_url 2024-07-05-community-extensions %}). Please visit the [Community Extensions section]({% link community_extensions/index.md %}) of the documentation for more details.

layout: docu title: Full-Text Search Extension github_repository: https://github.com/duckdb/duckdb-fts

Full-Text Search is an extension to DuckDB that allows for search through strings, similar to SQLite's FTS5 extension.

Installing and Loading

The fts extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository. If you would like to install and load it manually, run:

INSTALL fts;
LOAD fts;

Usage

The extension adds two PRAGMA statements to DuckDB: one to create, and one to drop an index. Additionally, a scalar macro stem is added, which is used internally by the extension.

PRAGMA create_fts_index

create_fts_index(input_table, input_id, *input_values, stemmer = 'porter',
                 stopwords = 'english', ignore = '(\\.|[^a-z])+',
                 strip_accents = 1, lower = 1, overwrite = 0)

PRAGMA that creates an FTS index for the specified table.

| Name | Type | Description |
|------|------|-------------|
| input_table | VARCHAR | Qualified name of specified table, e.g., 'table_name' or 'main.table_name' |
| input_values… | VARCHAR | Column names of the text fields to be indexed (vararg), e.g., 'text_field_1', 'text_field_2', ..., 'text_field_N', or '*' for all columns in input_table of type VARCHAR |
| input_id | VARCHAR | Column name of document identifier, e.g., 'document_identifier' |
| stemmer | VARCHAR | The type of stemmer to be used. One of 'arabic', 'basque', 'catalan', 'danish', 'dutch', 'english', 'finnish', 'french', 'german', 'greek', 'hindi', 'hungarian', 'indonesian', 'irish', 'italian', 'lithuanian', 'nepali', 'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'serbian', 'spanish', 'swedish', 'tamil', 'turkish', or 'none' if no stemming is to be used. Defaults to 'porter' |
| stopwords | VARCHAR | Qualified name of table containing a single VARCHAR column containing the desired stopwords, or 'none' if no stopwords are to be used. Defaults to 'english' for a pre-defined list of 571 English stopwords |
| ignore | VARCHAR | Regular expression of patterns to be ignored. Defaults to '(\\.\|[^a-z])+' |
| strip_accents | BOOLEAN | Whether to remove accents (e.g., convert á to a). Defaults to 1 |
| lower | BOOLEAN | Whether to convert all text to lowercase. Defaults to 1 |
| overwrite | BOOLEAN | Whether to overwrite an existing index on a table. Defaults to 0 |

This PRAGMA builds the index under a newly created schema. The schema will be named after the input table: if an index is created on table 'main.table_name', then the schema will be named 'fts_main_table_name'.

PRAGMA drop_fts_index

drop_fts_index(input_table)

Drops an FTS index for the specified table.

| Name | Type | Description |
|------|------|-------------|
| input_table | VARCHAR | Qualified name of input table, e.g., 'table_name' or 'main.table_name' |

match_bm25 Function

match_bm25(input_id, query_string, fields := NULL, k := 1.2, b := 0.75, conjunctive := 0)

When an index is built, this retrieval macro is created; it can be used to search the index.

| Name | Type | Description |
|------|------|-------------|
| input_id | VARCHAR | Column name of document identifier, e.g., 'document_identifier' |
| query_string | VARCHAR | The string to search the index for |
| fields | VARCHAR | Comma-separated list of fields to search in, e.g., 'text_field_2, text_field_N'. Defaults to NULL to search all indexed fields |
| k | DOUBLE | Parameter k1 in the Okapi BM25 retrieval model. Defaults to 1.2 |
| b | DOUBLE | Parameter b in the Okapi BM25 retrieval model. Defaults to 0.75 |
| conjunctive | BOOLEAN | Whether to make the query conjunctive, i.e., all terms in the query string must be present in order for a document to be retrieved. Defaults to 0 |

stem Function

stem(input_string, stemmer)

Reduces words to their base. Used internally by the extension.

| Name | Type | Description |
|------|------|-------------|
| input_string | VARCHAR | The column or constant to be stemmed |
| stemmer | VARCHAR | The type of stemmer to be used. One of 'arabic', 'basque', 'catalan', 'danish', 'dutch', 'english', 'finnish', 'french', 'german', 'greek', 'hindi', 'hungarian', 'indonesian', 'irish', 'italian', 'lithuanian', 'nepali', 'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'serbian', 'spanish', 'swedish', 'tamil', 'turkish', or 'none' if no stemming is to be used |
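
For example, the following call (a minimal sketch; the exact output depends on the chosen stemmer) reduces an inflected word to its stem:

SELECT stem('searching', 'porter') AS stemmed;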

Example Usage

Create a table and fill it with text data:

CREATE TABLE documents (
    document_identifier VARCHAR,
    text_content VARCHAR,
    author VARCHAR,
    doc_version INTEGER
);
INSERT INTO documents
    VALUES ('doc1',
            'The mallard is a dabbling duck that breeds throughout the temperate.',
            'Hannes Mühleisen',
            3),
           ('doc2',
            'The cat is a domestic species of small carnivorous mammal.',
            'Laurens Kuiper',
            2
           );

Build the index, and make both the text_content and author columns searchable.

PRAGMA create_fts_index(
    'documents', 'document_identifier', 'text_content', 'author'
);

Search the author field index for documents that are authored by Muhleisen. This retrieves doc1:

SELECT document_identifier, text_content, score
FROM (
    SELECT *, fts_main_documents.match_bm25(
        document_identifier,
        'Muhleisen',
        fields := 'author'
    ) AS score
    FROM documents
) sq
WHERE score IS NOT NULL
  AND doc_version > 2
ORDER BY score DESC;
document_identifier text_content score
doc1 The mallard is a dabbling duck that breeds throughout the temperate. 0.0

Search for documents about small cats. This retrieves doc2:

SELECT document_identifier, text_content, score
FROM (
    SELECT *, fts_main_documents.match_bm25(
        document_identifier,
        'small cats'
    ) AS score
    FROM documents
) sq
WHERE score IS NOT NULL
ORDER BY score DESC;
document_identifier text_content score
doc2 The cat is a domestic species of small carnivorous mammal. 0.0

Warning The FTS index will not update automatically when the input table changes. A workaround for this limitation is to recreate the index.
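
For example, after new rows are inserted into the documents table, the index can be refreshed by recreating it with the overwrite option (a sketch based on the parameters documented above):

PRAGMA create_fts_index(
    'documents', 'document_identifier', 'text_content', 'author',
    overwrite = 1
);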


layout: docu title: inet Extension github_repository: https://github.com/duckdb/duckdb-inet

The inet extension defines the INET data type for storing IPv4 and IPv6 Internet addresses. It supports the CIDR notation for subnet masks (e.g., 198.51.100.0/22, 2001:db8:3c4d::/48).

Installing and Loading

The inet extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository. If you would like to install and load it manually, run:

INSTALL inet;
LOAD inet;

Examples

SELECT '127.0.0.1'::INET AS ipv4, '2001:db8:3c4d::/48'::INET AS ipv6;
ipv4 ipv6
127.0.0.1 2001:db8:3c4d::/48
CREATE TABLE tbl (id INTEGER, ip INET);
INSERT INTO tbl VALUES
    (1, '192.168.0.0/16'),
    (2, '127.0.0.1'),
    (3, '8.8.8.8'),
    (4, 'fe80::/10'),
    (5, '2001:db8:3c4d:15::1a2f:1a2b');
SELECT * FROM tbl;
id ip
1 192.168.0.0/16
2 127.0.0.1
3 8.8.8.8
4 fe80::/10
5 2001:db8:3c4d:15::1a2f:1a2b

Operations on INET Values

INET values can be compared naturally, and IPv4 will sort before IPv6. Additionally, IP addresses can be modified by adding or subtracting integers.

CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('127.0.0.1'::INET + 10),
    ('fe80::10'::INET - 9),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b');
SELECT cidr FROM tbl ORDER BY cidr ASC;
cidr
127.0.0.1
127.0.0.11
2001:db8:3c4d:15::1a2f:1a2b
fe80::7

host Function

The host component of an INET value can be extracted using the host() function.

CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.0.0/16'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, host(cidr) FROM tbl;
cidr host(cidr)
192.168.0.0/16 192.168.0.0
127.0.0.1 127.0.0.1
2001:db8:3c4d:15::1a2f:1a2b/96 2001:db8:3c4d:15::1a2f:1a2b

netmask Function

Computes the network mask for the address's network.

CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.1.5/24'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, netmask(cidr) FROM tbl;
cidr netmask(cidr)
192.168.1.5/24 255.255.255.0/24
127.0.0.1 255.255.255.255
2001:db8:3c4d:15::1a2f:1a2b/96 ffff:ffff:ffff:ffff:ffff:ffff::/96

network Function

Returns the network part of the address, zeroing out whatever is to the right of the netmask.

CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.1.5/24'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, network(cidr) FROM tbl;
cidr network(cidr)
192.168.1.5/24 192.168.1.0/24
127.0.0.1 255.255.255.255
2001:db8:3c4d:15::1a2f:1a2b/96 ffff:ffff:ffff:ffff:ffff:ffff::/96

broadcast Function

Computes the broadcast address for the address's network.

CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.1.5/24'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, broadcast(cidr) FROM tbl;
cidr broadcast(cidr)
192.168.1.5/24 192.168.1.0/24
127.0.0.1 127.0.0.1
2001:db8:3c4d:15::1a2f:1a2b/96 2001:db8:3c4d:15::/96

<<= Predicate

Is subnet contained by or equal to subnet?

CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.1.0/24'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, INET '192.168.1.5/32' <<= cidr FROM tbl;
cidr (CAST('192.168.1.5/32' AS INET) <<= cidr)
192.168.1.5/24 true
127.0.0.1 false
2001:db8:3c4d:15::1a2f:1a2b/96 false

>>= Predicate

Does subnet contain or equal subnet?

CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
    ('192.168.1.0/24'),
    ('127.0.0.1'),
    ('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, INET '192.168.0.0/16' >>= cidr FROM tbl;
cidr (CAST('192.168.0.0/16' AS INET) >>= cidr)
192.168.1.5/24 true
127.0.0.1 false
2001:db8:3c4d:15::1a2f:1a2b/96 false

HTML Escape and Unescape Functions

SELECT html_escape('&');
┌──────────────────┐
│ html_escape('&') │
│     varchar      │
├──────────────────┤
│ &amp;            │
└──────────────────┘
SELECT html_unescape('&amp;');
┌────────────────────────┐
│ html_unescape('&amp;') │
│        varchar         │
├────────────────────────┤
│ &                      │
└────────────────────────┘

layout: docu title: SQLSmith Extension github_repository: https://github.com/duckdb/duckdb-sqlsmith

The sqlsmith extension is used for testing.

Installing and Loading

INSTALL sqlsmith;
LOAD sqlsmith;

Functions

The sqlsmith extension registers the following functions:

  • sqlsmith
  • fuzzyduck
  • reduce_sql_statement
  • fuzz_all_functions

layout: docu title: jemalloc Extension github_directory: https://github.com/duckdb/duckdb/tree/main/extension/jemalloc

The jemalloc extension replaces the system's memory allocator with jemalloc. Unlike other DuckDB extensions, the jemalloc extension is statically linked and cannot be installed or loaded during runtime.

Operating System Support

The availability of the jemalloc extension depends on the operating system.

Linux

Linux distributions of DuckDB ship with the jemalloc extension. To disable the jemalloc extension, [build DuckDB from source]({% link docs/dev/building/overview.md %}) and set the SKIP_EXTENSIONS flag as follows:

GEN=ninja SKIP_EXTENSIONS="jemalloc" make

macOS

The macOS version of DuckDB does not ship with the jemalloc extension but can be [built from source]({% link docs/dev/building/macos.md %}) to include it:

GEN=ninja BUILD_JEMALLOC=1 make

Windows

On Windows, this extension is not available.

Configuration

Environment Variables

The jemalloc allocator in DuckDB can be configured via the MALLOC_CONF environment variable.

Background Threads

By default, jemalloc's background threads are disabled. To enable them, use the following configuration option:

SET allocator_background_threads = true;

Background threads asynchronously purge outstanding allocations so that this doesn't have to be done synchronously by the foreground threads. This improves allocation performance, and should be noticeable in allocation-heavy workloads, especially on many-core CPUs.

layout: docu title: httpfs Extension for HTTP and S3 Support github_repository: https://github.com/duckdb/duckdb-httpfs redirect_from:

  • /docs/extensions/httpfs
  • /docs/extensions/httpfs/

The httpfs extension is an autoloadable extension implementing a file system that allows reading and writing remote files. For plain HTTP(S), only file reading is supported. For object storage using the S3 API, the httpfs extension supports reading/writing/[globbing]({% link docs/sql/functions/pattern_matching.md %}#globbing) files.

Installation and Loading

The httpfs extension will be, by default, autoloaded on first use of any functionality exposed by this extension.

To manually install and load the httpfs extension, run:

INSTALL httpfs;
LOAD httpfs;

HTTP(S)

The httpfs extension supports connecting to [HTTP(S) endpoints]({% link docs/extensions/httpfs/https.md %}).

S3 API

The httpfs extension supports connecting to [S3 API endpoints]({% link docs/extensions/httpfs/s3api.md %}).

layout: docu title: Legacy Authentication Scheme for S3 API

Prior to version 0.10.0, DuckDB did not have a [Secrets manager]({% link docs/sql/statements/create_secret.md %}). Hence, the configuration of and authentication to S3 endpoints was handled via variables. This page documents the legacy authentication scheme for the S3 API.

The recommended way to configure and authenticate to S3 endpoints is to use [secrets]({% link docs/extensions/httpfs/s3api.md %}#configuration-and-authentication).

Legacy Authentication Scheme

To be able to read from or write to S3, the correct region should be set:

SET s3_region = 'us-east-1';

Optionally, the endpoint can be configured in case a non-AWS object storage server is used:

SET s3_endpoint = '⟨domain⟩.⟨tld⟩:⟨port⟩';

If the endpoint is not SSL-enabled then run:

SET s3_use_ssl = false;

Switching between path-style and vhost-style URLs is possible using:

SET s3_url_style = 'path';

However, note that this may also require updating the endpoint. For example, for AWS S3, the endpoint must be changed to s3.⟨region⟩.amazonaws.com.

After configuring the correct endpoint and region, public files can be read. To also read private files, authentication credentials can be added:

SET s3_access_key_id = '⟨AWS access key id⟩';
SET s3_secret_access_key = '⟨AWS secret access key⟩';

Alternatively, temporary S3 credentials are also supported. They require setting an additional session token:

SET s3_session_token = '⟨AWS session token⟩';

The [aws extension]({% link docs/extensions/aws.md %}) allows for loading AWS credentials.
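
Putting the above together, a minimal legacy-style session might look like this (the bucket and file names are placeholders):

SET s3_region = 'us-east-1';
SET s3_access_key_id = '⟨AWS access key id⟩';
SET s3_secret_access_key = '⟨AWS secret access key⟩';
SELECT *
FROM read_parquet('s3://⟨bucket⟩/⟨file⟩.parquet');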

Per-Request Configuration

Aside from the global S3 configuration described above, specific configuration values can be used on a per-request basis. This allows for use of multiple sets of credentials, regions, etc. These are used by including them on the S3 URI as query parameters. All the individual configuration values listed above can be set as query parameters. For instance:

SELECT *
FROM 's3://bucket/file.parquet?s3_access_key_id=accessKey&s3_secret_access_key=secretKey';

Multiple configurations per query are also allowed:

SELECT *
FROM 's3://bucket/file.parquet?s3_access_key_id=accessKey1&s3_secret_access_key=secretKey1' t1
INNER JOIN 's3://bucket/file.csv?s3_access_key_id=accessKey2&s3_secret_access_key=secretKey2' t2;

Configuration

Some additional configuration options exist for the S3 upload, though the default values should suffice for most use cases.

Additionally, most of the configuration options can be set via environment variables:

| DuckDB setting       | Environment variable  | Note                                   |
|----------------------|-----------------------|----------------------------------------|
| s3_region            | AWS_REGION            | Takes priority over AWS_DEFAULT_REGION |
| s3_region            | AWS_DEFAULT_REGION    |                                        |
| s3_access_key_id     | AWS_ACCESS_KEY_ID     |                                        |
| s3_secret_access_key | AWS_SECRET_ACCESS_KEY |                                        |
| s3_session_token     | AWS_SESSION_TOKEN     |                                        |
| s3_endpoint          | DUCKDB_S3_ENDPOINT    |                                        |
| s3_use_ssl           | DUCKDB_S3_USE_SSL     |                                        |

layout: docu title: Hugging Face Support

The httpfs extension introduces support for the hf:// protocol to access data sets hosted in Hugging Face repositories. See the [announcement blog post]({% post_url 2024-05-29-access-150k-plus-datasets-from-hugging-face-with-duckdb %}) for details.

Usage

Hugging Face repositories can be queried using the following URL pattern:

hf://datasets/⟨my_username⟩/⟨my_dataset⟩/⟨path_to_file⟩

For example, to read a CSV file, you can use the following query:

SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';

Where:

  • datasets-examples is the name of the user/organization
  • doc-formats-csv-1 is the name of the dataset repository
  • data.csv is the file path in the repository

The result of the query is:

kind sound
dog woof
cat meow
pokemon pika
human hello

To read a JSONL file, you can run:

SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-jsonl-1/data.jsonl';

Finally, for reading a Parquet file, use the following query:

SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-parquet-1/data/train-00000-of-00001.parquet';

Each of these commands reads the data from the specified file format and displays it in a structured tabular format. Choose the appropriate command based on the file format you are working with.

Creating a Local Table

To avoid accessing the remote endpoint for every query, you can save the data in a DuckDB table by running a [CREATE TABLE ... AS command]({% link docs/sql/statements/create_table.md %}#create-table--as-select-ctas). For example:

CREATE TABLE data AS
    SELECT *
    FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';

Then, simply query the data table as follows:

SELECT *
FROM data;

Multiple Files

To query all files under a specific directory, you can use a [glob pattern]({% link docs/data/multiple_files/overview.md %}#multi-file-reads-and-globs). For example:

SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/*.parquet';
count
173

By using glob patterns, you can efficiently handle large datasets and perform comprehensive queries across multiple files, simplifying your data inspection and processing tasks. Here, you can see how to look for questions that contain the word “planet” in the astronomy subset:

SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/*.parquet'
WHERE question LIKE '%planet%';
count
21

Versioning and Revisions

In Hugging Face repositories, dataset versions or revisions are different dataset updates. Each version is a snapshot at a specific time, allowing you to track changes and improvements. In git terms, it can be understood as a branch or specific commit.

You can query different dataset versions/revisions by using the following URL:

hf://datasets/⟨my-username⟩/⟨my-dataset⟩@⟨my_branch⟩/⟨path_to_file⟩

For example:

SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1@~parquet/**/*.parquet';
kind sound
dog woof
cat meow
pokemon pika
human hello

The previous query will read all Parquet files under the ~parquet revision. This is a special branch where Hugging Face automatically generates the Parquet files of every dataset to enable efficient scanning.

Authentication

Configure your Hugging Face Token in the DuckDB Secrets Manager to access private or gated datasets. First, visit Hugging Face Settings – Tokens to obtain your access token. Second, set it in your DuckDB session using DuckDB’s [Secrets Manager]({% link docs/configuration/secrets_manager.md %}). DuckDB supports two providers for managing secrets:

CONFIG

The user must pass all configuration information into the CREATE SECRET statement. To create a secret using the CONFIG provider, use the following command:

CREATE SECRET hf_token (
    TYPE HUGGINGFACE,
    TOKEN 'your_hf_token'
);

CREDENTIAL_CHAIN

Automatically tries to fetch credentials. For the Hugging Face token, it will try to get it from ~/.cache/huggingface/token. To create a secret using the CREDENTIAL_CHAIN provider, use the following command:

CREATE SECRET hf_token (
    TYPE HUGGINGFACE,
    PROVIDER CREDENTIAL_CHAIN
);
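
With either secret in place, private or gated datasets can be queried using the same hf:// URL pattern as before (the path below is a placeholder):

SELECT *
FROM 'hf://datasets/⟨my_username⟩/⟨my_private_dataset⟩/⟨path_to_file⟩';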

layout: docu title: HTTP(S) Support

With the httpfs extension, it is possible to directly query files over the HTTP(S) protocol. This works for all files supported by DuckDB or its various extensions, and provides read-only access.

SELECT *
FROM 'https://domain.tld/file.extension';

Partial Reading

For CSV files, files will be downloaded entirely in most cases, due to the row-based nature of the format. For Parquet files, DuckDB supports [partial reading]({% link docs/data/parquet/overview.md %}#partial-reading), i.e., it can use a combination of the Parquet metadata and HTTP range requests to only download the parts of the file that are actually required by the query. For example, the following query will only read the Parquet metadata and the data for the column_a column:

SELECT column_a
FROM 'https://domain.tld/file.parquet';

In some cases, no actual data needs to be read at all, as the query only requires reading the metadata:

SELECT count(*)
FROM 'https://domain.tld/file.parquet';

Scanning Multiple Files

Scanning multiple files over HTTP(S) is also supported:

SELECT *
FROM read_parquet([
    'https://domain.tld/file1.parquet',
    'https://domain.tld/file2.parquet'
]);

Authenticating

To authenticate for an HTTP(S) endpoint, create an HTTP secret using the [Secrets Manager]({% link docs/configuration/secrets_manager.md %}):

CREATE SECRET http_auth (
    TYPE HTTP,
    BEARER_TOKEN '⟨token⟩'
);

Or:

CREATE SECRET http_auth (
    TYPE HTTP,
    EXTRA_HTTP_HEADERS MAP {
        'Authorization': 'Bearer ⟨token⟩'
    }
);
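
Once such a secret exists, it should be applied to subsequent HTTP(S) requests automatically, so a protected file can be queried like any other remote file (the URL below is a placeholder):

SELECT *
FROM 'https://domain.tld/protected/file.parquet';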

HTTP Proxy

DuckDB supports HTTP proxies.

You can add an HTTP proxy using the [Secrets Manager]({% link docs/configuration/secrets_manager.md %}):

CREATE SECRET http_proxy (
    TYPE HTTP,
    HTTP_PROXY '⟨http_proxy_url⟩',
    HTTP_PROXY_USERNAME '⟨username⟩',
    HTTP_PROXY_PASSWORD '⟨password⟩'
);

Alternatively, you can add it via [configuration options]({% link docs/configuration/pragmas.md %}):

SET http_proxy = '⟨http_proxy_url⟩';
SET http_proxy_username = '⟨username⟩';
SET http_proxy_password = '⟨password⟩';

Using a Custom Certificate File

To use the httpfs extension with a custom certificate file, set the following [configuration options]({% link docs/configuration/pragmas.md %}) prior to loading the extension:

LOAD httpfs;
SET ca_cert_file = '⟨certificate_file⟩';
SET enable_server_cert_verification = true;

layout: docu title: Arrow Extension github_repository: https://github.com/duckdb/arrow

The arrow extension implements features for using Apache Arrow, a cross-language development platform for in-memory analytics. See the [announcement blog post]({% post_url 2021-12-03-duck-arrow %}) for more details.

Installing and Loading

The arrow extension will be transparently autoloaded on first use from the official extension repository. If you would like to install and load it manually, run:

INSTALL arrow;
LOAD arrow;

Functions

| Function | Type | Description |
|----------|------|-------------|
| to_arrow_ipc | Table in-out function | Serializes a table into a stream of blobs containing Arrow IPC buffers |
| scan_arrow_ipc | Table function | Scan a list of pointers pointing to Arrow IPC buffers |

layout: docu title: AWS Extension github_repository: https://github.com/duckdb/duckdb_aws

The aws extension adds functionality (e.g., authentication) on top of the httpfs extension's [S3 capabilities]({% link docs/extensions/httpfs/overview.md %}#s3-api), using the AWS SDK.

Warning In most cases, you will not need to explicitly interact with the aws extension. It will automatically be invoked whenever you use DuckDB's [S3 Secret functionality]({% link docs/sql/statements/create_secret.md %}). See the [httpfs extension's S3 capabilities]({% link docs/extensions/httpfs/overview.md %}#s3) for instructions.

Installing and Loading

The aws extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository. If you would like to install and load it manually, run:

INSTALL aws;
LOAD aws;

Related Extensions

aws depends on httpfs extension capabilities, and both will be autoloaded on the first call to load_aws_credentials. If autoinstall or autoload are disabled, you can always explicitly install and load httpfs as follows:

INSTALL httpfs;
LOAD httpfs;

Legacy Features

Deprecated The load_aws_credentials function is deprecated.

Prior to version 0.10.0, DuckDB did not have a [Secrets manager]({% link docs/sql/statements/create_secret.md %}). To load the credentials automatically, the aws extension provided a special function to load the AWS credentials in the [legacy authentication method]({% link docs/extensions/httpfs/s3api_legacy_authentication.md %}).

| Function | Type | Description |
|----------|------|-------------|
| load_aws_credentials | PRAGMA function | Loads the AWS credentials through the AWS Default Credentials Provider Chain |

Load AWS Credentials (Legacy)

To load the AWS credentials, run:

CALL load_aws_credentials();
loaded_access_key_id loaded_secret_access_key loaded_session_token loaded_region
AKIAIOSFODNN7EXAMPLE <redacted> NULL us-east-2

The function takes a string parameter to specify a specific profile:

CALL load_aws_credentials('minio-testing-2');
loaded_access_key_id loaded_secret_access_key loaded_session_token loaded_region
minio_duckdb_user_2 <redacted> NULL NULL

There are several parameters to tweak the behavior of the call:

CALL load_aws_credentials('minio-testing-2', set_region = false, redact_secret = false);
loaded_access_key_id loaded_secret_access_key loaded_session_token loaded_region
minio_duckdb_user_2 minio_duckdb_user_password_2 NULL NULL

layout: docu title: S3 API Support

The httpfs extension supports reading, writing, and globbing files on object storage servers using the S3 API. S3 offers a standard API to read and write to remote files (while regular HTTP servers, predating S3, do not offer a common write API). DuckDB conforms to the S3 API, which is now common among industry storage providers.

Platforms

The httpfs filesystem is tested with AWS S3, Minio, Google Cloud, and lakeFS. Other services that implement the S3 API (such as Cloudflare R2) should also work, but not all features may be supported.

The following table shows which parts of the S3 API are required for each httpfs feature.

| Feature | Required S3 API features |
|---------|--------------------------|
| Public file reads | HTTP Range requests |
| Private file reads | Secret key or session token authentication |
| File glob | ListObjectsV2 |
| File writes | Multipart upload |

Configuration and Authentication

The preferred way to configure and authenticate to S3 endpoints is to use [secrets]({% link docs/sql/statements/create_secret.md %}). Multiple secret providers are available.

Deprecated Prior to version 0.10.0, DuckDB did not have a [Secrets manager]({% link docs/sql/statements/create_secret.md %}). Hence, the configuration of and authentication to S3 endpoints was handled via variables. See the [legacy authentication scheme for the S3 API]({% link docs/extensions/httpfs/s3api_legacy_authentication.md %}).

CONFIG Provider

The default provider, CONFIG (i.e., user-configured), allows access to the S3 bucket by manually providing a key. For example:

CREATE SECRET secret1 (
    TYPE S3,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    REGION 'us-east-1'
);

Tip If you get an IO Error (Connection error for HTTP HEAD), configure the endpoint explicitly via ENDPOINT 's3.⟨your-region⟩.amazonaws.com'.

Now, to query using the above secret, simply query any s3:// prefixed file:

SELECT *
FROM 's3://my-bucket/file.parquet';

CREDENTIAL_CHAIN Provider

The CREDENTIAL_CHAIN provider allows automatically fetching credentials using mechanisms provided by the AWS SDK. For example, to use the AWS SDK default provider:

CREATE SECRET secret2 (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);

Again, to query a file using the above secret, simply query any s3:// prefixed file.

DuckDB also allows specifying a specific chain using the CHAIN keyword. This takes a semicolon-separated list (a;b;c) of providers that will be tried in order. For example:

CREATE SECRET secret3 (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN,
    CHAIN 'env;config'
);

The possible values for CHAIN are the following:

The CREDENTIAL_CHAIN provider also allows overriding the automatically fetched config. For example, to automatically load credentials, and then override the region, run:

CREATE SECRET secret4 (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN,
    CHAIN 'config',
    REGION 'eu-west-1'
);

Overview of S3 Secret Parameters

Below is a complete list of the supported parameters that can be used for both the CONFIG and CREDENTIAL_CHAIN providers:

| Name | Description | Secret | Type | Default |
|------|-------------|--------|------|---------|
| KEY_ID | The ID of the key to use | S3, GCS, R2 | STRING | - |
| SECRET | The secret of the key to use | S3, GCS, R2 | STRING | - |
| REGION | The region for which to authenticate (should match the region of the bucket to query) | S3, GCS, R2 | STRING | us-east-1 |
| SESSION_TOKEN | Optionally, a session token can be passed to use temporary credentials | S3, GCS, R2 | STRING | - |
| ENDPOINT | Specify a custom S3 endpoint | S3, GCS, R2 | STRING | s3.amazonaws.com for S3 |
| URL_STYLE | Either vhost or path | S3, GCS, R2 | STRING | vhost for S3, path for R2 and GCS |
| USE_SSL | Whether to use HTTPS or HTTP | S3, GCS, R2 | BOOLEAN | true |
| URL_COMPATIBILITY_MODE | Can help when URLs contain problematic characters | S3, GCS, R2 | BOOLEAN | true |
| ACCOUNT_ID | The R2 account ID to use for generating the endpoint URL | R2 | STRING | - |
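
For example, the ENDPOINT, URL_STYLE, and USE_SSL parameters can be combined to target an S3-compatible server such as a local MinIO instance (a sketch; the endpoint and credentials are placeholders):

CREATE SECRET minio_secret (
    TYPE S3,
    KEY_ID '⟨key⟩',
    SECRET '⟨secret⟩',
    ENDPOINT 'localhost:9000',
    URL_STYLE 'path',
    USE_SSL false
);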

Platform-Specific Secret Types

R2 Secrets

While Cloudflare R2 uses the regular S3 API, DuckDB has a special Secret type, R2, to make configuring it a bit simpler:

CREATE SECRET secret5 (
    TYPE R2,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    ACCOUNT_ID 'my_account_id'
);

Note the addition of the ACCOUNT_ID, which is used to generate the correct endpoint URL for you. Also note that R2 secrets can use both the CONFIG and CREDENTIAL_CHAIN providers. Finally, R2 secrets are only available when using URLs starting with r2://, for example:

SELECT *
FROM read_parquet('r2://some/file/that/uses/r2/secret/file.parquet');

GCS Secrets

While Google Cloud Storage is accessed by DuckDB using the S3 API, DuckDB has a special Secret type, GCS, to make configuring it a bit simpler:

CREATE SECRET secret6 (
    TYPE GCS,
    KEY_ID 'my_key',
    SECRET 'my_secret'
);

Note that the above secret will automatically have the correct Google Cloud Storage endpoint configured. Also note that GCS secrets can use both the CONFIG and CREDENTIAL_CHAIN providers. Finally, GCS secrets are only available when using URLs starting with gcs:// or gs://, for example:

SELECT *
FROM read_parquet('gcs://some/file/that/uses/gcs/secret/file.parquet');

Reading

Reading files from S3 is now as simple as:

SELECT *
FROM 's3://bucket/file.extension';

Partial Reading

The httpfs extension supports [partial reading]({% link docs/extensions/httpfs/https.md %}#partial-reading) from S3 buckets.

Reading Multiple Files

Multiple files are also possible, for example:

SELECT *
FROM read_parquet([
    's3://bucket/file1.parquet',
    's3://bucket/file2.parquet'
]);

Globbing

File [globbing]({% link docs/sql/functions/pattern_matching.md %}#globbing) is implemented using the ListObjectsV2 API call and allows the use of filesystem-like glob patterns to match multiple files, for example:

SELECT *
FROM read_parquet('s3://bucket/*.parquet');

This query matches all files in the root of the bucket with the [Parquet extension]({% link docs/data/parquet/overview.md %}).

Several features for matching are supported, such as * to match any number of any character, ? for any single character or [0-9] for a single character in a range of characters:

SELECT count(*) FROM read_parquet('s3://bucket/folder*/100?/t[0-9].parquet');

A useful feature when using globs is the filename option, which adds a column named filename that encodes the file that a particular row originated from:

SELECT *
FROM read_parquet('s3://bucket/*.parquet', filename = true);

could for example result in:

column_a column_b filename
1 examplevalue1 s3://bucket/file1.parquet
2 examplevalue1 s3://bucket/file2.parquet

Hive Partitioning

DuckDB also offers support for the [Hive partitioning scheme]({% link docs/data/partitioning/hive_partitioning.md %}), which is available when using HTTP(S) and S3 endpoints.

Writing

Writing to S3 uses the multipart upload API. This allows DuckDB to robustly upload files at high speed. Writing to S3 works for both CSV and Parquet:

COPY table_name TO 's3://bucket/file.extension';

Partitioned copy to S3 also works:

COPY table TO 's3://my-bucket/partitioned' (
    FORMAT PARQUET,
    PARTITION_BY (part_col_a, part_col_b)
);

An automatic check is performed for existing files/directories, which is currently quite conservative (and on S3 will add a bit of latency). To disable this check and force writing, use the OVERWRITE_OR_IGNORE flag:

COPY table TO 's3://my-bucket/partitioned' (
    FORMAT PARQUET,
    PARTITION_BY (part_col_a, part_col_b),
    OVERWRITE_OR_IGNORE true
);

The naming scheme of the written files looks like this:

s3://my-bucket/partitioned/part_col_a=⟨val⟩/part_col_b=⟨val⟩/data_⟨thread_number⟩.parquet

Configuration

Some additional configuration options exist for the S3 upload, though the default values should suffice for most use cases.

| Name | Description |
|------|-------------|
| s3_uploader_max_parts_per_file | Used for part size calculation, see AWS docs |
| s3_uploader_max_filesize | Used for part size calculation, see AWS docs |
| s3_uploader_thread_limit | Maximum number of uploader threads |
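
These can be adjusted like other DuckDB settings; for example, to lower the number of uploader threads (a sketch, assuming the default does not suit your environment):

SET s3_uploader_thread_limit = 8;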

layout: docu title: Spatial Extension github_repository: https://github.com/duckdb/duckdb_spatial redirect_from:

  • /docs/extensions/spatial
  • /docs/extensions/spatial/

The spatial extension provides support for geospatial data processing in DuckDB. For an overview of the extension, see our [blog post]({% post_url 2023-04-28-spatial %}).

Installing and Loading

To install and load the spatial extension, run:

INSTALL spatial;
LOAD spatial;

The GEOMETRY Type

The core of the spatial extension is the GEOMETRY type. If you're unfamiliar with geospatial data and GIS tooling, this type probably works very differently from what you'd expect.

On the surface, the GEOMETRY type is a binary representation of “geometry” data made up of sets of vertices (pairs of X and Y double precision floats). But what makes it somewhat special is that it's actually used to store one of several different geometry subtypes. These are POINT, LINESTRING, POLYGON, as well as their “collection” equivalents, MULTIPOINT, MULTILINESTRING and MULTIPOLYGON. Lastly, there is GEOMETRYCOLLECTION, which can contain any of the other subtypes, as well as other GEOMETRYCOLLECTIONs recursively.

This may seem strange at first, since DuckDB already has types like LIST, STRUCT and UNION which could be used in a similar way, but the design and behavior of the GEOMETRY type is actually based on the Simple Features geometry model, which is a standard used by many other databases and GIS software.

The spatial extension also includes a couple of experimental non-standard explicit geometry types, such as POINT_2D, LINESTRING_2D, POLYGON_2D and BOX_2D that are based on DuckDB's native nested types, such as STRUCT and LIST. Since these have a fixed and predictable internal memory layout, it is theoretically possible to optimize a lot of geospatial algorithms to be much faster when operating on these types than on the GEOMETRY type. However, only a couple of functions in the spatial extension have been explicitly specialized for these types so far. All of these new types are implicitly castable to GEOMETRY, but with a small conversion cost, so the GEOMETRY type is still the recommended type to use for now if you are planning to work with a lot of different spatial functions.
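
As a small, hedged illustration of the subtypes described above (assuming the spatial extension is loaded; ST_GeomFromText and ST_GeometryType are part of the extension's function set):

LOAD spatial;
SELECT ST_GeometryType(ST_GeomFromText('POINT(1 2)')) AS point_type,
       ST_GeometryType(ST_GeomFromText('LINESTRING(0 0, 1 1)')) AS line_type;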

GEOMETRY is not currently capable of storing additional geometry types such as curved geometries or triangle networks. Additionally, the GEOMETRY type does not store SRID information on a per value basis. These limitations may be addressed in the future.

layout: docu title: SQLite Extension github_repository: https://github.com/duckdb/duckdb-sqlite redirect_from:

  • /docs/extensions/sqlite_scanner
  • /docs/extensions/sqlite_scanner/

The SQLite extension allows DuckDB to directly read and write data from a SQLite database file. The data can be queried directly from the underlying SQLite tables. Data can be loaded from SQLite tables into DuckDB tables, or vice versa.

Installing and Loading

The sqlite extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository. If you would like to install and load it manually, run:

INSTALL sqlite;
LOAD sqlite;

Usage

To make a SQLite file accessible to DuckDB, use the ATTACH statement with the SQLITE or SQLITE_SCANNER type. Attached SQLite databases support both read and write operations.

For example, to attach to the sakila.db file, run:

ATTACH 'sakila.db' (TYPE SQLITE);
USE sakila;

The tables in the file can be read as if they were normal DuckDB tables, but the underlying data is read directly from the SQLite tables in the file at query time.

SHOW TABLES;
name
actor
address
category
city
country
customer
customer_list
film
film_actor
film_category
film_list
film_text
inventory
language
payment
rental
sales_by_film_category
sales_by_store
staff
staff_list
store

You can query the tables using SQL, e.g., using the example queries from sakila-examples.sql:

SELECT
    cat.name AS category_name,
    sum(ifnull(pay.amount, 0)) AS revenue
FROM category cat
LEFT JOIN film_category flm_cat
       ON cat.category_id = flm_cat.category_id
LEFT JOIN film fil
       ON flm_cat.film_id = fil.film_id
LEFT JOIN inventory inv
       ON fil.film_id = inv.film_id
LEFT JOIN rental ren
       ON inv.inventory_id = ren.inventory_id
LEFT JOIN payment pay
       ON ren.rental_id = pay.rental_id
GROUP BY cat.name
ORDER BY revenue DESC
LIMIT 5;

Data Types

SQLite is a weakly typed database system. As such, when storing data in a SQLite table, types are not enforced. The following is valid SQL in SQLite:

CREATE TABLE numbers (i INTEGER);
INSERT INTO numbers VALUES ('hello');

DuckDB is a strongly typed database system; as such, it requires all columns to have defined types, and the system rigorously checks data for correctness.

When querying SQLite, DuckDB must deduce a specific column type mapping. DuckDB follows SQLite's type affinity rules with a few extensions.

  1. If the declared type contains the string INT then it is translated into the type BIGINT
  2. If the declared type of the column contains any of the strings CHAR, CLOB, or TEXT then it is translated into VARCHAR.
  3. If the declared type for a column contains the string BLOB or if no type is specified then it is translated into BLOB.
  4. If the declared type for a column contains any of the strings REAL, FLOA, DOUB, DEC or NUM then it is translated into DOUBLE.
  5. If the declared type is DATE, then it is translated into DATE.
  6. If the declared type contains the string TIME, then it is translated into TIMESTAMP.
  7. If none of the above apply, then it is translated into VARCHAR.

As DuckDB enforces the corresponding columns to contain only correctly typed values, we cannot load the string “hello” into a column of type BIGINT. As such, an error is thrown when reading from the “numbers” table above:

Mismatch Type Error: Invalid type in column "i": column was declared as integer, found "hello" of type "text" instead.

This error can be avoided by setting the sqlite_all_varchar option:

SET GLOBAL sqlite_all_varchar = true;

When set, this option overrides the type conversion rules described above, and instead always converts the SQLite columns into a VARCHAR column. Note that this setting must be set before sqlite_attach is called.

Opening SQLite Databases Directly

SQLite databases can also be opened directly and can be used transparently instead of a DuckDB database file. In any client, when connecting, a path to a SQLite database file can be provided and the SQLite database will be opened instead.

For example, with the shell, a SQLite database can be opened as follows:

duckdb sakila.db
SELECT first_name
FROM actor
LIMIT 3;
first_name
PENELOPE
NICK
ED

Writing Data to SQLite

In addition to reading data from SQLite, the extension also allows you to create new SQLite database files, create tables, ingest data into SQLite and make other modifications to SQLite database files using standard SQL queries.

This allows you to use DuckDB to, for example, export data that is stored in a SQLite database to Parquet, or read data from a Parquet file into SQLite.

Below is a brief example of how to create a new SQLite database and load data into it.

ATTACH 'new_sqlite_database.db' AS sqlite_db (TYPE SQLITE);
CREATE TABLE sqlite_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO sqlite_db.tbl VALUES (42, 'DuckDB');

The resulting SQLite database can then be read using SQLite itself.

sqlite3 new_sqlite_database.db
SQLite version 3.39.5 2022-10-14 20:58:05
sqlite> SELECT * FROM tbl;
id  name  
--  ------
42  DuckDB

Many operations on SQLite tables are supported. All these operations directly modify the SQLite database, and the result of subsequent operations can then be read using SQLite.

Concurrency

DuckDB can read or modify a SQLite database while DuckDB or SQLite reads or modifies the same database from a different thread or a separate process. More than one thread or process can read the SQLite database at the same time, but only a single thread or process can write to the database at one time. Database locking is handled by the SQLite library, not DuckDB. Within the same process, SQLite uses mutexes. When accessed from different processes, SQLite uses file system locks. The locking mechanisms also depend on SQLite configuration, like WAL mode. Refer to the SQLite documentation on locking for more information.

Warning Linking multiple copies of the SQLite library into the same application can lead to application errors. See sqlite_scanner Issue #82 for more information.

Settings

The extension exposes the following configuration parameters.

| Name | Description | Default |
|------|-------------|---------|
| sqlite_debug_show_queries | DEBUG SETTING: print all queries sent to SQLite to stdout | false |

Supported Operations

Below is a list of supported operations.

CREATE TABLE

CREATE TABLE sqlite_db.tbl (id INTEGER, name VARCHAR);

INSERT INTO

INSERT INTO sqlite_db.tbl VALUES (42, 'DuckDB');

SELECT

SELECT * FROM sqlite_db.tbl;
id name
42 DuckDB

COPY

COPY sqlite_db.tbl TO 'data.parquet';
COPY sqlite_db.tbl FROM 'data.parquet';

UPDATE

UPDATE sqlite_db.tbl SET name = 'Woohoo' WHERE id = 42;

DELETE

DELETE FROM sqlite_db.tbl WHERE id = 42;

ALTER TABLE

ALTER TABLE sqlite_db.tbl ADD COLUMN k INTEGER;

DROP TABLE

DROP TABLE sqlite_db.tbl;

CREATE VIEW

CREATE VIEW sqlite_db.v1 AS SELECT 42;

Transactions

CREATE TABLE sqlite_db.tmp (i INTEGER);
BEGIN;
INSERT INTO sqlite_db.tmp VALUES (42);
SELECT * FROM sqlite_db.tmp;
i
42
ROLLBACK;
SELECT * FROM sqlite_db.tmp;
i

Deprecated The old sqlite_attach function is deprecated. It is recommended to switch over to the new [ATTACH syntax]({% link docs/sql/statements/attach.md %}).


layout: docu title: GDAL Integration

The spatial extension integrates the GDAL translator library to read and write spatial data from a variety of geospatial vector file formats. See the documentation for the [st_read table function]({% link docs/extensions/spatial/functions.md %}#st_read) for how to make use of this in practice.

In order to spare users from having to set up and install additional dependencies on their system, the spatial extension bundles its own copy of the GDAL library. This also means that spatial's version of GDAL may not be the latest version available or provide support for all of the file formats that a system-wide GDAL installation otherwise would. Refer to the section on the [st_drivers table function]({% link docs/extensions/spatial/functions.md %}#st_drivers) to inspect which GDAL drivers are currently available.

GDAL Based COPY Function

The spatial extension not only enables importing geospatial file formats (through the ST_Read function), it also enables exporting DuckDB tables to different geospatial vector formats through a GDAL-based COPY function.

For example, to export a table to a GeoJSON file, with generated bounding boxes, you can use the following query:

COPY ⟨table⟩ TO 'some/file/path/filename.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');

Available options:

  • FORMAT: is the only required option and must be set to GDAL to use the GDAL based copy function.
  • DRIVER: is the GDAL driver to use for the export. Use ST_Drivers() to list the names of all available drivers.
  • LAYER_CREATION_OPTIONS: list of options to pass to the GDAL driver. See the GDAL docs for the driver you are using for a list of available options.
  • SRS: Set a spatial reference system as metadata to use for the export. This can be a WKT string, an EPSG code or a proj-string, basically anything you would normally be able to pass to GDAL. Note that this will not perform any reprojection of the input geometry, it just sets the metadata if the target driver supports it.
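
For instance, the SRS option can be combined with another driver to export to a GeoPackage file (a sketch; the driver name 'GPKG' should be checked against the output of ST_Drivers() on your installation):

COPY ⟨table⟩ TO 'some/file/path/filename.gpkg'
WITH (FORMAT GDAL, DRIVER 'GPKG', SRS 'EPSG:4326');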

Limitations

Note that only vector based drivers are supported by the GDAL integration. Reading and writing raster formats are not supported.

layout: docu title: Excel Extension github_repository: https://github.com/duckdb/duckdb-excel

The excel extension provides functions to format numbers per Excel's formatting rules by wrapping the i18npool library, but as of DuckDB 1.2 also provides functionality to read and write Excel (.xlsx) files. However, .xls files are not supported.

Previously, reading and writing Excel files was handled through the [spatial extension]({% link docs/extensions/spatial/overview.md %}), which coincidentally included support for XLSX files through one of its dependencies, but this capability may be removed from the spatial extension in the future. Additionally, the excel extension is more efficient and provides more control over the import/export process. See the [Excel Import]({% link docs/guides/file_formats/excel_import.md %}) and [Excel Export]({% link docs/guides/file_formats/excel_export.md %}) pages for instructions.

Installing and Loading

The excel extension will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use from the official extension repository. If you would like to install and load it manually, run:

INSTALL excel;
LOAD excel;

Excel Scalar Functions

| Function | Description |
|----------|-------------|
| excel_text(number, format_string) | Format the given number per the rules given in the format_string |
| text(number, format_string) | Alias for excel_text |

Examples

SELECT excel_text(1_234_567.897, 'h:mm AM/PM') AS timestamp;
timestamp
9:31 PM
SELECT excel_text(1_234_567.897, 'h AM/PM') AS timestamp;
timestamp
9 PM

Reading XLSX Files

Reading a .xlsx file is as simple as just SELECTing from it immediately, e.g.:

SELECT *
FROM 'test.xlsx';
a b
1.0 2.0
3.0 4.0

However, if you want to set additional options to control the import process, you can use the read_xlsx function instead. The following named parameters are supported.

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| header | BOOLEAN | automatically inferred | Whether to treat the first row as containing the names of the resulting columns |
| sheet | VARCHAR | automatically inferred | The name of the sheet in the xlsx file to read. Default is the first sheet |
| all_varchar | BOOLEAN | false | Whether to read all cells as containing VARCHARs |
| ignore_errors | BOOLEAN | false | Whether to ignore errors and silently replace cells that can't be cast to the corresponding inferred column type with NULLs |
| range | VARCHAR | automatically inferred | The range of cells to read, in spreadsheet notation. For example, A1:B2 reads the cells from A1 to B2. If not specified, the resulting range will be inferred as a rectangular region of cells between the first row of consecutive non-empty cells and the first empty row spanning the same columns |
| stop_at_empty | BOOLEAN | automatically inferred | Whether to stop reading the file when an empty row is encountered. If an explicit range option is provided, this is false by default, otherwise true |
| empty_as_varchar | BOOLEAN | false | Whether to treat empty cells as VARCHAR instead of DOUBLE when trying to automatically infer column types |

For example:

SELECT *
FROM read_xlsx('test.xlsx', header = true);
a b
1.0 2.0
3.0 4.0
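
The other named parameters can be combined in the same call; for example, to read a specific sheet and cell range while treating empty cells as text (a sketch with placeholder sheet and range values):

SELECT *
FROM read_xlsx('test.xlsx', sheet = 'Sheet1', range = 'A1:B3', empty_as_varchar = true);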

Alternatively, the COPY statement with the XLSX format option can be used to import an Excel file into an existing table, in which case the types of the columns in the target table will be used to coerce the types of the cells in the Excel file.

CREATE TABLE test (a DOUBLE, b DOUBLE);
COPY test FROM 'test.xlsx' WITH (FORMAT 'xlsx', HEADER);
SELECT * FROM test;

Type and Range Inference

Because Excel itself only really stores numbers or strings in cells, and doesn't enforce that all cells in a column are of the same type, the excel extension has to do some guesswork to "infer" and decide the types of the columns when importing an Excel sheet. While almost all columns are inferred as either DOUBLE or VARCHAR, there are some caveats:

  • TIMESTAMP, TIME, DATE and BOOLEAN types are inferred when possible based on the format applied to the cell.
  • Text cells containing TRUE and FALSE are inferred as BOOLEAN.
  • Empty cells are considered to be DOUBLE by default, unless the empty_as_varchar option is set to true, in which case they are typed as VARCHAR.

If the all_varchar option is set to true, none of the above applies and all cells are read as VARCHAR.

When no types are specified explicitly (e.g., when using the read_xlsx function instead of COPY ... FROM '⟨file⟩.xlsx'), the types of the resulting columns are inferred based on the first "data" row in the sheet, that is:

  • If no explicit range is given
    • The first row after the header if a header is found or forced by the header option
    • The first non-empty row in the sheet if no header is found or forced
  • If an explicit range is given
    • The second row of the range if a header is found in the first row or forced by the header option
    • The first row of the range if no header is found or forced

This can sometimes lead to issues if the first "data row" is not representative of the rest of the sheet (e.g., it contains empty cells) in which case the ignore_errors or empty_as_varchar options can be used to work around this.

However, when the COPY ... FROM '⟨file⟩.xlsx' syntax is used, no type inference is done and the types of the resulting columns are determined by the types of the columns in the table being copied to. All cells will simply be converted by casting from DOUBLE or VARCHAR to the target column type.

Writing XLSX Files

Writing .xlsx files is supported using the COPY statement with XLSX given as the format. The following additional parameters are supported.

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| header | BOOLEAN | false | Whether to write the column names as the first row in the sheet |
| sheet | VARCHAR | Sheet1 | The name of the sheet in the xlsx file to write |
| sheet_row_limit | INTEGER | 1048576 | The maximum number of rows in a sheet. An error is thrown if this limit is exceeded |

Warning Many tools only support a maximum of 1,048,576 rows in a sheet, so increasing the sheet_row_limit may render the resulting file unreadable by other software.

These are passed as options to the COPY statement after the FORMAT, e.g.:

CREATE TABLE test AS
    SELECT *
    FROM (VALUES (1, 2), (3, 4)) AS t(a, b);
COPY test TO 'test.xlsx' WITH (FORMAT 'xlsx', HEADER true);

Type Conversions

Because XLSX files only really support storing numbers or strings (the equivalent of DOUBLE and VARCHAR), the following type conversions are applied when writing XLSX files.

  • Numeric types are cast to DOUBLE when writing to an XLSX file.
  • Temporal types (TIMESTAMP, DATE, TIME, etc.) are converted to Excel "serial" numbers, that is, the number of days since 1900-01-01 for dates and the fraction of a day for times. These are then styled with a "number format" so that they appear as dates or times when opened in Excel.
  • TIMESTAMP_TZ and TIME_TZ are cast to UTC TIMESTAMP and TIME respectively, with the timezone information being lost.
  • BOOLEANs are converted to 1 and 0, with a "number format" applied to make them appear as TRUE and FALSE in Excel.
  • All other types are cast to VARCHAR and then written as text cells.

layout: docu title: R-Tree Indexes

As of DuckDB v1.1.0 the [spatial extension]({% link docs/extensions/spatial/overview.md %}) provides basic support for spatial indexing through the R-tree extension index type.

Why Should I Use an R-Tree Index?

When working with geospatial datasets, it is very common that you want to filter rows based on their spatial relationship with a specific region of interest. Unfortunately, even though DuckDB's vectorized execution engine is pretty fast, this sort of operation does not scale very well to large datasets as it always requires a full table scan to check every row in the table. However, by indexing a table with an R-tree, it is possible to accelerate these types of queries significantly.

How Do R-Tree Indexes Work?

An R-tree is a balanced tree data structure that stores the approximate minimum bounding rectangle of each geometry (and the internal ID of the corresponding row) in the leaf nodes, and the bounding rectangle enclosing all of the child nodes in each internal node.

The minimum bounding rectangle (MBR) of a geometry is the smallest rectangle that completely encloses the geometry. Usually when we talk about the bounding rectangle of a geometry (or the bounding "box" in the context of 2D geometry), we mean the minimum bounding rectangle. Additionally, we tend to assume that bounding boxes/rectangles are axis-aligned, i.e., the rectangle is not rotated – the sides are always parallel to the coordinate axes. The MBR of a point is the point itself.

By traversing the R-tree from top to bottom, it is possible to very quickly search a R-tree-indexed table for only those rows where the indexed geometry column intersect a specific region of interest, as you can skip searching entire sub-trees if the bounding rectangles of their parent nodes don't intersect the query region at all. Once the leaf nodes are reached, only the specific rows whose geometries intersect the query region have to be fetched from disk, and the often much more expensive exact spatial predicate check (and any other filters) only have to be executed for these rows.

What Are the Limitations of R-Tree Indexes in DuckDB?

Before you get started using the R-tree index, there are some limitations to be aware of:

  • The R-tree index is only supported for the GEOMETRY data type.
  • The R-tree index will only be used to perform "index scans" when the table is filtered (using a WHERE clause) with one of the following spatial predicate functions (as they all imply intersection): ST_Equals, ST_Intersects, ST_Touches, ST_Crosses, ST_Within, ST_Contains, ST_Overlaps, ST_Covers, ST_CoveredBy, ST_ContainsProperly.
  • One of the arguments to the spatial predicate function must be a “constant” (i.e., an expression whose result is known at query planning time). This is because the query planner needs to know the bounding box of the query region before the query itself is executed in order to use the R-tree index scan.

In the future, we want to enable R-tree indexes to be used to accelerate additional predicate functions and more complex queries such as spatial joins.

How To Use R-Tree Indexes in DuckDB

To create an R-tree index, simply use the CREATE INDEX statement with the USING RTREE clause, passing the geometry column to index within the parentheses. For example:

-- Create a table with a geometry column
CREATE TABLE my_table (geom GEOMETRY);

-- Create an R-tree index on the geometry column
CREATE INDEX my_idx ON my_table USING RTREE (geom);

You can also pass in additional options when creating an R-tree index using the WITH clause to control the behavior of the R-tree index. For example, to specify the maximum number of entries per node in the R-tree, you can use the max_node_capacity option:

CREATE INDEX my_idx ON my_table USING RTREE (geom) WITH (max_node_capacity = 16);

The impact that tweaking these options will have on performance is highly dependent on the system setup DuckDB is running on, the spatial distribution of the dataset, and the query patterns of your specific workload. The defaults should be good enough, but if you want to experiment with different parameters, see the full list of options here.

Example

Here is an example that shows how to create an R-tree index on a geometry column and where we can see that the RTREE_INDEX_SCAN operator is used when the table is filtered with a spatial predicate:

INSTALL spatial;
LOAD spatial;

-- Create a table with 10_000 random points
CREATE TABLE t1 AS SELECT point::GEOMETRY AS geom
FROM st_generatepoints({min_x: 0, min_y: 0, max_x: 100, max_y: 100}::BOX_2D, 10_000, 1337);

-- Create an index on the table.
CREATE INDEX my_idx ON t1 USING RTREE (geom);

-- Perform a query with a "spatial predicate" on the indexed geometry column
-- Note how the second argument, the ST_MakeEnvelope call, is a "constant"
SELECT count(*) FROM t1 WHERE ST_Within(geom, ST_MakeEnvelope(45, 45, 65, 65));
390

We can check for ourselves that an R-tree index scan is used by using the EXPLAIN statement:

EXPLAIN SELECT count(*) FROM t1 WHERE ST_Within(geom, ST_MakeEnvelope(45, 45, 65, 65));
┌───────────────────────────┐
│    UNGROUPED_AGGREGATE    │
│    ────────────────────   │
│        Aggregates:        │
│        count_star()       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│    ────────────────────   │
│ ST_Within(geom, '...')    │ 
│                           │
│         ~2000 Rows        │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│     RTREE_INDEX_SCAN      │
│    ────────────────────   │
│   t1 (RTREE INDEX SCAN :  │
│           my_idx)         │
│                           │
│     Projections: geom     │
│                           │
│        ~10000 Rows        │
└───────────────────────────┘

Performance Considerations

Bulk Loading & Maintenance

Creating an R-tree on top of an already populated table is much faster than first creating the index and then inserting the data. This is because the R-tree has to periodically rebalance itself and perform a somewhat costly splitting operation when a node reaches its maximum capacity after an insert, potentially causing additional splits to cascade up the tree. However, when the R-tree index is created on an already populated table, a special bottom-up "bulk loading algorithm" (Sort-Tile-Recursive) is used, which divides all entries into an already balanced tree, since the total number of required nodes can be computed up front.

Additionally, the bulk loading algorithm tends to create an R-tree with a better structure (less overlap between bounding boxes), which usually leads to better query performance. If you find that query performance against the R-tree starts to deteriorate after a large number of updates or deletions, dropping and re-creating the index might produce a higher-quality R-tree.
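
For example, to rebuild an index after heavy churn, you can simply drop and re-create it (a sketch, reusing the my_idx index on t1 from the example above so that the bulk-loading path is used again):

DROP INDEX my_idx;
CREATE INDEX my_idx ON t1 USING RTREE (geom);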

Memory Usage

Like DuckDB's built-in ART index, all of the buffers containing the R-tree are lazily loaded from disk (when DuckDB runs in disk-backed mode), but they are currently never unloaded unless the index is dropped. This means that if you end up scanning the entire index, the entire index will be loaded into memory and stay there for the duration of the database connection. However, all memory used by the R-tree index (even during bulk loading) is tracked by DuckDB and counts towards the limit set by the memory_limit configuration parameter.
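
For example, you can cap the total amount of memory DuckDB is allowed to use (which the R-tree index counts towards) before creating or scanning the index; the value below is just an illustration, pick one appropriate for your system:

SET memory_limit = '2GB';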

Tuning

Depending on your specific workload, you might want to experiment with the max_node_capacity and min_node_capacity options to change the structure of the R-tree and how it responds to insertions and deletions; see the Options section below for the full list. In general, a tree with a higher total number of nodes (i.e., a lower max_node_capacity) may result in a more granular structure that enables more aggressive pruning of sub-trees during query execution, but it will also require more memory to store the tree itself and be more punishing when querying larger regions, as more internal nodes will have to be traversed.
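
For example, the following sketch creates a more granular tree by lowering the node capacities (assuming both options can be combined in a single WITH clause):

CREATE INDEX my_idx ON my_table USING RTREE (geom)
WITH (max_node_capacity = 32, min_node_capacity = 8);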

Options

The following options can be passed to the WITH clause when creating an R-tree index (e.g., CREATE INDEX my_idx ON my_table USING RTREE (geom) WITH (⟨option⟩ = ⟨value⟩);):

| Option | Description | Default |
|--------|-------------|---------|
| max_node_capacity | The maximum number of entries per node in the R-tree | 128 |
| min_node_capacity | The minimum number of entries per node in the R-tree | 0.4 * max_node_capacity |

* Should a node fall under the minimum number of entries after a deletion, the node will be dissolved and all of its entries reinserted from the top of the tree. This is a common operation in R-tree implementations to prevent the tree from becoming too unbalanced.

R-Tree Table Functions

The rtree_index_dump(VARCHAR) table function returns all the nodes within an R-tree index, which might come in handy when debugging, profiling, or otherwise inspecting the structure of the index. The function takes the name of the R-tree index as an argument and returns a table with the following columns:

| Column name | Type | Description |
|-------------|------|-------------|
| level | INTEGER | The level of the node in the R-tree. The root node has level 0 |
| bounds | BOX_2DF | The bounding box of the node |
| row_id | ROW_TYPE | If this is a leaf node, the rowid of the row in the table, otherwise NULL |

Example:

-- Create a table with 64 random points
CREATE TABLE t1 AS SELECT point::GEOMETRY AS geom
FROM st_generatepoints({min_x: 0, min_y: 0, max_x: 100, max_y: 100}::BOX_2D, 64, 1337);

-- Create an R-tree index on the geometry column (with a low max_node_capacity for demonstration purposes)
CREATE INDEX my_idx ON t1 USING RTREE (geom) WITH (max_node_capacity = 4);

-- Inspect the R-tree index. Notice how the area of the bounding boxes of the branch nodes 
-- decreases as we go deeper into the tree.
SELECT 
  level, 
  bounds::GEOMETRY AS geom, 
  CASE WHEN row_id IS NULL THEN st_area(geom) ELSE NULL END AS area, 
  row_id, 
  CASE WHEN row_id IS NULL THEN 'branch' ELSE 'leaf' END AS kind 
FROM rtree_index_dump('my_idx') 
ORDER BY area DESC;
┌───────┬──────────────────────────────┬────────────────────┬────────┬─────────┐
│ level │             geom             │        area        │ row_id │  kind   │
│ int32 │           geometry           │       double       │ int64  │ varchar │
├───────┼──────────────────────────────┼────────────────────┼────────┼─────────┤
│     0 │ POLYGON ((2.17285037040710…  │  3286.396482226409 │        │ branch  │
│     0 │ POLYGON ((6.00962591171264…  │  3193.725100864862 │        │ branch  │
│     0 │ POLYGON ((0.74995160102844…  │  3099.921458393704 │        │ branch  │
│     0 │ POLYGON ((14.6168870925903…  │ 2322.2760491675654 │        │ branch  │
│     1 │ POLYGON ((2.17285037040710…  │  604.1520104388514 │        │ branch  │
│     1 │ POLYGON ((26.6022186279296…  │  569.1665467030252 │        │ branch  │
│     1 │ POLYGON ((35.7942314147949…  │ 435.24662436250037 │        │ branch  │
│     1 │ POLYGON ((62.2643051147460…  │ 396.39027683023596 │        │ branch  │
│     1 │ POLYGON ((59.5225715637207…  │ 386.09153403820187 │        │ branch  │
│     1 │ POLYGON ((82.3060836791992…  │ 369.15115640929434 │        │ branch  │
│     · │              ·               │          ·         │      · │  ·      │
│     · │              ·               │          ·         │      · │  ·      │
│     · │              ·               │          ·         │      · │  ·      │
│     2 │ POLYGON ((20.5411434173584…  │                    │     35 │ leaf    │
│     2 │ POLYGON ((14.6168870925903…  │                    │     36 │ leaf    │
│     2 │ POLYGON ((43.7271652221679…  │                    │     39 │ leaf    │
│     2 │ POLYGON ((53.4629211425781…  │                    │     44 │ leaf    │
│     2 │ POLYGON ((26.6022186279296…  │                    │     62 │ leaf    │
│     2 │ POLYGON ((53.1732063293457…  │                    │     63 │ leaf    │
│     2 │ POLYGON ((78.1427154541015…  │                    │     10 │ leaf    │
│     2 │ POLYGON ((75.1728591918945…  │                    │     15 │ leaf    │
│     2 │ POLYGON ((62.2643051147460…  │                    │     42 │ leaf    │
│     2 │ POLYGON ((80.5032577514648…  │                    │     49 │ leaf    │
├───────┴──────────────────────────────┴────────────────────┴────────┴─────────┤
│ 84 rows (20 shown)                                                 5 columns │
└──────────────────────────────────────────────────────────────────────────────┘

layout: docu
title: TPC-H Extension
github_directory: https://github.com/duckdb/duckdb/tree/main/extension/tpch

The tpch extension implements the data generator and queries for the TPC-H benchmark.

Installing and Loading

The tpch extension is shipped by default in some DuckDB builds, otherwise it will be transparently [autoloaded]({% link docs/extensions/overview.md %}#autoloading-extensions) on first use. If you would like to install and load it manually, run:

INSTALL tpch;
LOAD tpch;

Usage

Generating Data

To generate data for scale factor 1, use:

CALL dbgen(sf = 1);

Calling dbgen does not clean up existing TPC-H tables. To clean up existing tables, use DROP TABLE before running dbgen:

DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS lineitem;
DROP TABLE IF EXISTS nation;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS part;
DROP TABLE IF EXISTS partsupp;
DROP TABLE IF EXISTS region;
DROP TABLE IF EXISTS supplier;

Running a Query

To run a query, e.g., query 4, use:

PRAGMA tpch(4);
| o_orderpriority | order_count |
|-----------------|-------------|
| 1-URGENT | 10594 |
| 2-HIGH | 10476 |
| 3-MEDIUM | 10410 |
| 4-NOT SPECIFIED | 10556 |
| 5-LOW | 10487 |

Listing Queries

To list all 22 queries, run:

FROM tpch_queries();

This function returns a table with columns query_nr and query.
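
For example, to retrieve the text of a single query (here, query 4), filter on the query_nr column:

SELECT query
FROM tpch_queries()
WHERE query_nr = 4;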

Listing Expected Answers

To produce the expected results for all queries on scale factors 0.01, 0.1, and 1, run:

FROM tpch_answers();

This function returns a table with columns query_nr, scale_factor, and answer.
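
For example, to fetch the expected answer for query 4 on scale factor 1:

SELECT answer
FROM tpch_answers()
WHERE query_nr = 4
  AND scale_factor = 1;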

Generating the Schema

It's possible to generate the schema of TPC-H without any data by setting the scale factor to 0:

CALL dbgen(sf = 0);

Data Generator Parameters

The data generator function dbgen has the following parameters:

| Name | Type | Description |
|------|------|-------------|
| catalog | VARCHAR | Target catalog |
| children | UINTEGER | Number of partitions |
| overwrite | BOOLEAN | (Not used) |
| sf | DOUBLE | Scale factor |
| step | UINTEGER | Defines the partition to be generated, indexed from 0 to children - 1. Must be defined when the children argument is defined |
| suffix | VARCHAR | Append the suffix to table names |
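
For example, based on the parameters above, the following sketch generates a small data set into tables carrying a custom suffix (the exact resulting table names are an assumption derived from the suffix description):

CALL dbgen(sf = 0.1, suffix = '_sf0_1');
SELECT count(*) FROM lineitem_sf0_1;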

Pre-Generated Data Sets

Pre-generated DuckDB databases for TPC-H are available for download:

Resource Usage of the Data Generator

Generating TPC-H data sets for large scale factors takes a significant amount of time. Additionally, if the generation is done in a single step, it requires a large amount of memory. The following table gives an estimate of the resources required to produce DuckDB database files containing the generated TPC-H data set using 128 threads.

| Scale factor | Database size | Data generation time | Generator's memory usage |
|--------------|---------------|----------------------|--------------------------|
| 100 | 26 GB | 17 minutes | 71 GB |
| 300 | 78 GB | 51 minutes | 211 GB |
| 1000 | 265 GB | 2h 53 minutes | 647 GB |
| 3000 | 796 GB | 8h 30 minutes | 1799 GB |

The numbers shown above were achieved by running the dbgen function in a single step, for example:

CALL dbgen(sf = 300);

If you have a limited amount of memory available, you can run the dbgen function in steps. For example, you may generate SF300 in 10 steps:

CALL dbgen(sf = 300, children = 10, step = 0);
CALL dbgen(sf = 300, children = 10, step = 1);
...
CALL dbgen(sf = 300, children = 10, step = 9);

Limitation

The tpch(⟨query_id⟩) function runs a fixed TPC-H query with pre-defined bind parameters (a.k.a. substitution parameters). It is not possible to change the query parameters using the tpch extension. To run the queries with the parameters prescribed by the TPC-H benchmark, use a TPC-H framework implementation.

this file is GENERATED, regenerate it with scripts/generate_python_docs.py

layout: docu
title: Python Client API
redirect_from:

  • /docs/api/python/reference/index
  • /docs/api/python/reference/index/

duckdb.threadsafety bool

Indicates that this package is threadsafe

duckdb.apilevel int

Indicates which Python DBAPI version this package implements

duckdb.paramstyle str

Indicates which parameter style duckdb supports

class duckdb.BinaryValue(object: Any)

Bases: Value

exception duckdb.BinderException

Bases: ProgrammingError

class duckdb.BitValue(object: Any)

Bases: Value

class duckdb.BlobValue(object: Any)

Bases: Value

class duckdb.BooleanValue(object: Any)

Bases: Value

duckdb.CaseExpression(condition: duckdb.duckdb.Expression, value: duckdb.duckdb.Expression) duckdb.duckdb.Expression
exception duckdb.CatalogException

Bases: ProgrammingError

duckdb.CoalesceOperator(*args) duckdb.duckdb.Expression
duckdb.ColumnExpression(*args) duckdb.duckdb.Expression

Create a column reference from the provided column name

exception duckdb.ConnectionException

Bases: OperationalError

duckdb.ConstantExpression(value: object) duckdb.duckdb.Expression

Create a constant expression from the provided value

exception duckdb.ConstraintException

Bases: IntegrityError

exception duckdb.ConversionException

Bases: DataError

exception duckdb.DataError

Bases: DatabaseError

class duckdb.DateValue(object: Any)

Bases: Value

class duckdb.DecimalValue(object: Any, width: int, scale: int)

Bases: Value

duckdb.DefaultExpression() duckdb.duckdb.Expression
class duckdb.DoubleValue(object: Any)

Bases: Value

class duckdb.DuckDBPyConnection

Bases: pybind11_object

append(self: duckdb.duckdb.DuckDBPyConnection, table_name: str, df: pandas.DataFrame, *, by_name: bool = False) duckdb.duckdb.DuckDBPyConnection

Append the passed DataFrame to the named table

array_type(self: duckdb.duckdb.DuckDBPyConnection, type: duckdb.duckdb.typing.DuckDBPyType, size: int) duckdb.duckdb.typing.DuckDBPyType

Create an array type object of ‘type’

arrow(self: duckdb.duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) pyarrow.lib.Table

Fetch a result as Arrow table following execute()

begin(self: duckdb.duckdb.DuckDBPyConnection) duckdb.duckdb.DuckDBPyConnection

Start a new transaction

checkpoint(self: duckdb.duckdb.DuckDBPyConnection) duckdb.duckdb.DuckDBPyConnection

Synchronizes data in the write-ahead log (WAL) to the database data file (no-op for in-memory connections)

close(self: duckdb.duckdb.DuckDBPyConnection) None

Close the connection

commit(self: duckdb.duckdb.DuckDBPyConnection) duckdb.duckdb.DuckDBPyConnection

Commit changes performed within a transaction

create_function(self: duckdb.duckdb.DuckDBPyConnection, name: str, function: Callable, parameters: object = None, return_type: duckdb.duckdb.typing.DuckDBPyType = None, *, type: duckdb.duckdb.functional.PythonUDFType = <PythonUDFType.NATIVE: 0>, null_handling: duckdb.duckdb.functional.FunctionNullHandling = <FunctionNullHandling.DEFAULT: 0>, exception_handling: duckdb.duckdb.PythonExceptionHandling = <PythonExceptionHandling.DEFAULT: 0>, side_effects: bool = False) duckdb.duckdb.DuckDBPyConnection

Create a DuckDB function out of the passed-in Python function so it can be used in queries

cursor(self: duckdb.duckdb.DuckDBPyConnection) duckdb.duckdb.DuckDBPyConnection

Create a duplicate of the current connection

decimal_type(self: duckdb.duckdb.DuckDBPyConnection, width: int, scale: int) duckdb.duckdb.typing.DuckDBPyType

Create a decimal type with ‘width’ and ‘scale’

property description

Get result set attributes, mainly column names

df(self: duckdb.duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as DataFrame following execute()

dtype(self: duckdb.duckdb.DuckDBPyConnection, type_str: str) duckdb.duckdb.typing.DuckDBPyType

Create a type object by parsing the ‘type_str’ string

duplicate(self: duckdb.duckdb.DuckDBPyConnection) duckdb.duckdb.DuckDBPyConnection

Create a duplicate of the current connection

enum_type(self: duckdb.duckdb.DuckDBPyConnection, name: str, type: duckdb.duckdb.typing.DuckDBPyType, values: list) duckdb.duckdb.typing.DuckDBPyType

Create an enum type of underlying ‘type’, consisting of the list of ‘values’

execute(self: duckdb.duckdb.DuckDBPyConnection, query: object, parameters: object = None) duckdb.duckdb.DuckDBPyConnection

Execute the given SQL query, optionally using prepared statements with parameters set

executemany(self: duckdb.duckdb.DuckDBPyConnection, query: object, parameters: object = None) duckdb.duckdb.DuckDBPyConnection

Execute the given prepared statement multiple times using the list of parameter sets in parameters

extract_statements(self: duckdb.duckdb.DuckDBPyConnection, query: str) list

Parse the query string and extract the Statement object(s) produced

fetch_arrow_table(self: duckdb.duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) pyarrow.lib.Table

Fetch a result as Arrow table following execute()

fetch_df(self: duckdb.duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as DataFrame following execute()

fetch_df_chunk(self: duckdb.duckdb.DuckDBPyConnection, vectors_per_chunk: int = 1, *, date_as_object: bool = False) pandas.DataFrame

Fetch a chunk of the result as DataFrame following execute()

fetch_record_batch(self: duckdb.duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) pyarrow.lib.RecordBatchReader

Fetch an Arrow RecordBatchReader following execute()

fetchall(self: duckdb.duckdb.DuckDBPyConnection) list

Fetch all rows from a result following execute

fetchdf(self: duckdb.duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as DataFrame following execute()

fetchmany(self: duckdb.duckdb.DuckDBPyConnection, size: int = 1) list

Fetch the next set of rows from a result following execute

fetchnumpy(self: duckdb.duckdb.DuckDBPyConnection) dict

Fetch a result as list of NumPy arrays following execute

fetchone(self: duckdb.duckdb.DuckDBPyConnection) Optional[tuple]

Fetch a single row from a result following execute

filesystem_is_registered(self: duckdb.duckdb.DuckDBPyConnection, name: str) bool

Check if a filesystem with the provided name is currently registered

from_arrow(self: duckdb.duckdb.DuckDBPyConnection, arrow_object: object) duckdb.duckdb.DuckDBPyRelation

Create a relation object from an Arrow object

from_csv_auto(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) duckdb.duckdb.DuckDBPyRelation

Create a relation object from the CSV file in ‘name’

from_df(self: duckdb.duckdb.DuckDBPyConnection, df: pandas.DataFrame) duckdb.duckdb.DuckDBPyRelation

Create a relation object from the DataFrame in df

from_parquet(*args, **kwargs)

Overloaded function.

  1. from_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

  1. from_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_globs: list[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs

from_query(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) duckdb.duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

get_table_names(self: duckdb.duckdb.DuckDBPyConnection, query: str) set[str]

Extract the required table names from a query

install_extension(self: duckdb.duckdb.DuckDBPyConnection, extension: str, *, force_install: bool = False, repository: object = None, repository_url: object = None, version: object = None) None

Install an extension by name, with an optional version and/or repository to get the extension from

interrupt(self: duckdb.duckdb.DuckDBPyConnection) None

Interrupt pending operations

list_filesystems(self: duckdb.duckdb.DuckDBPyConnection) list

List registered filesystems, including builtin ones

list_type(self: duckdb.duckdb.DuckDBPyConnection, type: duckdb.duckdb.typing.DuckDBPyType) duckdb.duckdb.typing.DuckDBPyType

Create a list type object of ‘type’

load_extension(self: duckdb.duckdb.DuckDBPyConnection, extension: str) None

Load an installed extension

map_type(self: duckdb.duckdb.DuckDBPyConnection, key: duckdb.duckdb.typing.DuckDBPyType, value: duckdb.duckdb.typing.DuckDBPyType) duckdb.duckdb.typing.DuckDBPyType

Create a map type object from ‘key_type’ and ‘value_type’

pl(self: duckdb.duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) duckdb::PolarsDataFrame

Fetch a result as Polars DataFrame following execute()

query(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) duckdb.duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

read_csv(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) duckdb.duckdb.DuckDBPyRelation

Create a relation object from the CSV file in ‘name’

read_json(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, *, columns: Optional[object] = None, sample_size: Optional[object] = None, maximum_depth: Optional[object] = None, records: Optional[str] = None, format: Optional[str] = None, date_format: Optional[object] = None, timestamp_format: Optional[object] = None, compression: Optional[object] = None, maximum_object_size: Optional[object] = None, ignore_errors: Optional[object] = None, convert_strings_to_integers: Optional[object] = None, field_appearance_threshold: Optional[object] = None, map_inference_threshold: Optional[object] = None, maximum_sample_files: Optional[object] = None, filename: Optional[object] = None, hive_partitioning: Optional[object] = None, union_by_name: Optional[object] = None, hive_types: Optional[object] = None, hive_types_autocast: Optional[object] = None) duckdb.duckdb.DuckDBPyRelation

Create a relation object from the JSON file in ‘name’

read_parquet(*args, **kwargs)

Overloaded function.

  1. read_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

  1. read_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_globs: list[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs

register(self: duckdb.duckdb.DuckDBPyConnection, view_name: str, python_object: object) duckdb.duckdb.DuckDBPyConnection

Register the passed Python Object value for querying with a view

register_filesystem(self: duckdb.duckdb.DuckDBPyConnection, filesystem: fsspec.AbstractFileSystem) None

Register a fsspec compliant filesystem

remove_function(self: duckdb.duckdb.DuckDBPyConnection, name: str) duckdb.duckdb.DuckDBPyConnection

Remove a previously created function

rollback(self: duckdb.duckdb.DuckDBPyConnection) duckdb.duckdb.DuckDBPyConnection

Roll back changes performed within a transaction

row_type(self: duckdb.duckdb.DuckDBPyConnection, fields: object) duckdb.duckdb.typing.DuckDBPyType

Create a struct type object from ‘fields’

property rowcount

Get result set row count

sql(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) duckdb.duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

sqltype(self: duckdb.duckdb.DuckDBPyConnection, type_str: str) duckdb.duckdb.typing.DuckDBPyType

Create a type object by parsing the ‘type_str’ string

string_type(self: duckdb.duckdb.DuckDBPyConnection, collation: str = '') duckdb.duckdb.typing.DuckDBPyType

Create a string type with an optional collation

struct_type(self: duckdb.duckdb.DuckDBPyConnection, fields: object) duckdb.duckdb.typing.DuckDBPyType

Create a struct type object from ‘fields’

table(self: duckdb.duckdb.DuckDBPyConnection, table_name: str) duckdb.duckdb.DuckDBPyRelation

Create a relation object for the named table

table_function(self: duckdb.duckdb.DuckDBPyConnection, name: str, parameters: object = None) duckdb.duckdb.DuckDBPyRelation

Create a relation object from the named table function with given parameters

tf(self: duckdb.duckdb.DuckDBPyConnection) dict

Fetch a result as dict of TensorFlow Tensors following execute()

torch(self: duckdb.duckdb.DuckDBPyConnection) dict

Fetch a result as dict of PyTorch Tensors following execute()

type(self: duckdb.duckdb.DuckDBPyConnection, type_str: str) duckdb.duckdb.typing.DuckDBPyType

Create a type object by parsing the ‘type_str’ string

union_type(self: duckdb.duckdb.DuckDBPyConnection, members: object) duckdb.duckdb.typing.DuckDBPyType

Create a union type object from ‘members’

unregister(self: duckdb.duckdb.DuckDBPyConnection, view_name: str) duckdb.duckdb.DuckDBPyConnection

Unregister the view name

unregister_filesystem(self: duckdb.duckdb.DuckDBPyConnection, name: str) None

Unregister a filesystem

values(self: duckdb.duckdb.DuckDBPyConnection, *args) duckdb.duckdb.DuckDBPyRelation

Create a relation object from the passed values

view(self: duckdb.duckdb.DuckDBPyConnection, view_name: str) duckdb.duckdb.DuckDBPyRelation

Create a relation object for the named view

class duckdb.DuckDBPyRelation

Bases: pybind11_object

aggregate(self: duckdb.duckdb.DuckDBPyRelation, aggr_expr: object, group_expr: str = '') duckdb.duckdb.DuckDBPyRelation

Compute the aggregate aggr_expr by the optional groups group_expr on the relation

property alias

Get the name of the current alias

any_value(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') duckdb.duckdb.DuckDBPyRelation

Returns the first non-null value from a given column

apply(self: duckdb.duckdb.DuckDBPyRelation, function_name: str, function_aggr: str, group_expr: str = '', function_parameter: str = '', projected_columns: str = '') duckdb.duckdb.DuckDBPyRelation

Compute the function of a single column or a list of columns by the optional groups on the relation

arg_max(self: duckdb.duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') duckdb.duckdb.DuckDBPyRelation

Finds the row with the maximum value for a value column and returns the value of that row for an argument column

arg_min(self: duckdb.duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') duckdb.duckdb.DuckDBPyRelation

Finds the row with the minimum value for a value column and returns the value of that row for an argument column

arrow(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table

Execute and fetch all rows as an Arrow Table

avg(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') duckdb.duckdb.DuckDBPyRelation

Computes the average on a given column

bit_and(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') duckdb.duckdb.DuckDBPyRelation

Computes the bitwise AND of all bits present in a given column

bit_or(self:<