Introduction To DrummerDB

Overview

DrummerDB is a proof of concept database system in very early stages. It is modeled after traditional relational SQL database systems such as Postgres and Microsoft SQL Server.

It's primary purpose is to try and enable users of a database system to "own" their data. It attempts to achieve this by allowing for the rows of a table in a database system to be stored elsewhere from the main database host; usually by another instance of DrummerDB under "control" by the respective users. It determines ownership by use of a "data contract", which is a software defined item similiar in implementation to an ACL (Access Control List) but expanded to include definitions for resolving referential integrity, location of data, and other items.

Disclaimer

Data ownership is a tricky concept. Once someone has read data, all bets are usually off about maintaining secrecy and authority over it.

The real world implications of this is that unless someone can be assured of all the touch points of data - from capture, transport, and storage, it is very hard to be assured that a user has full authority over their data. I want to acknowledge this up front.

Concepts

There are a few concepts that DrummerDB tries to model. They revolve around data ownership and sharing of data, described as a "cooperative data model."

To "own" your data in DrummerDB means that you have full authority of the data that pertains to you in the database system. This means the ability to control who has read access, when they have it, and the ability to revoke it. It also means that you have the ability to change your data or remove it entirely, and that you have the ability to determine where it will be stored.

To own your data in DrummerDB also means that your data should be portable - regardless of the application tier that sits above it.

Implementation

Walkthrough

Consider a simplified (not 3rd normal form) database schema for an online retailer database:

Example Schema

In the previous diagram, there is a customers table, where the customer information is stored, such as the first name, last name, shipping address, and so on. The origin of this data is the user who is on the online retailer's website. They are the owners of their own information.

There is a product table, which contains various products that the retailer will be selling. It's origin is the online retailer itself.

There is also a table where the two data points are combined when a customer places an order. This data is owned by both the retailer and the customer.

Host and Partial Databases

DrummerDB tries to account for ownership by allowing the schema for a database to be split between different database instances.

Partial Database Example

In the previous diagram, rows in the Customer table are not actually persisted in the retailer's customer table. Instead, references and hashes of the data are stored to the correct database instance that contains the actual information for each row. The customer's instance of the retailer database can be anywhere a DrummerDB database can be brought online: within the same instance, or on a hosted instance under the actual customer's control, or any other implementation in-between.

The customer's instance of the retailer's database is called a partial database. For most purposes, it behaves exactly like any other RMDBS database, except that it only contains tables that pertain to the user or data tables that are shared between the customer and the retailer. In DrummerDB, the customer is known as a participant of the database system.

The retailer's instance of the database is called a host database. It contains data that pertains to the retailer, or is shared by the retailer and the customer.

Data Contracts

To determine where data will be saved, who has access to it, and how changes and references to data are reconciled, before a customer can instantiate a partial database, the retailer defines a "data contract."

A data contract contains:

ItemDescription
Database SchemaA copy of the full database schema (tables, columns, primary and foreign keys)
Logical Storage PoliciesEach table in the database has a logical storage policy which identifies where the data will be stored: either at the host, or the customer, or replicated, or some other implementation.
AuthenticationFor CRUD (Create, Read, Update, Delete) actions, the appropriate authorizations are declared.
Initial PermissionsFor each table, CRUD operations are defined for what actions are granted to the host and the participant. For schema objects at the participant, however, these values can be changed.

Once a database host has authored a database contract, it is sent to the participant's instance for review to be accepted or rejected. Any schema changes on the host's side of the database will require a new database contract to be generated, and potentially, accepted by participants of the database.

Since a copy of the entire database schema is sent with the contract, the participant has an understanding of how their data fits in with the rest of the database system. This is, generally speaking, how DrummerDB tries to allow for data portability. While the database schema may be intended for the online retailer's application and because the participant has a copy of the entire database schema, it allows for the participant to use their data elsewhere, perhaps in their own written reports on their spending, etc.

Caveats

There are some considerations that have to be made when implementing DrummerDB, including:

Logical Integrity

  • Suppose there is a schema where a Customer is able to make modifications to data that could potentially defraud the retailer. How can the retailer know if data has changed, without actually having the data values on their side? DrummerDB tries to account for this by hashing the data on both sides. If data changes occur on either side of a database, differences can be detected by comparing hashes.

Referential Integrity

  • Suppose in the previous example that the Customer decides to outright delete their information from their partial database. In this case, the retailer still has a reference to a row that no longer exists. DrummerDB allows for this situation to happen, and will error out accordingly, and depending on contract definitions, will manifest these changes back to the host database.

Transactions and Latency

  • Ensuring that DrummerDB can be fully ACID compliant is still not determined. Depending on where the partial database is instantiated, latency in the database system will be a major issue. It must be acknowledged that network communications can be unreliable.

Similar Projects

This concept of data ownership and data portability is not unique. There are other projects that are attempting to accomplish similar things:

Project Caveats

DrummerDB is being written in C#, chosen for my familiarity with the language. I don't consider myself to be a seasoned C# developer, nor do I assert that I fully understand database internals.