Designing a Peer-to-Peer Decentralized Database
Let’s say you wanted to build a decentralized Facebook, Github, Gather.town, or other complex application as a protocol. There are a few general approaches you could take:
- You could develop a protocol from scratch. Projects like Bluesky, Nostr, and Farcaster did this over several years.
- You could develop a regular web application that pins data to a peer-to-peer network like IPFS. But those networks don’t have tables or indexes, so any global table of users or activity would still live on a centralized server somewhere.
- You could put everything on a blockchain. Many Web3 projects that started with IPFS ended up doing this, saving their data on a blockchain like Arweave or a storage network like Filecoin.
A more general solution would be to use a decentralized database, where every user’s data is cryptographically signed, indexed, and organized.
Users and community hosts could sync just the data they were interested in. Someone using the application on one node would be able to see their data appear across frontends. And ideally, using this database would be just like using Firebase or Postgres.
Projects like OrbitDB, Gun, and Hypercore have been working on this problem over the last 6+ years. So why isn’t there a widely used database that does this yet? We think there are a few reasons:
Decentralized identity is still emerging. The web never had a place for users to store public/private keypairs, so decentralized databases could never verify the provenance of a user’s data. As a result, they have traditionally been either local-first or hosted databases.
But now in 2023, mobile phones and laptops come with Passkeys and Secure Enclaves, exposed through good web APIs. Tens of millions of users have blockchain wallets, while projects like Bluesky are creating their own public key infrastructures. Over the next few years, it will become more likely than not that a regular user has access to a secure cryptographic identity, without having to think about it.
Access control breaks decentralization. Let’s say you run a blogging platform, and you give users permission to create posts once they’ve logged in. What if a user generates a thousand identities and pushes data until your disk is full? Should any single account have the authority to delete that data? Or should deletions be federated, so each host can select peers whose deletions they trust?
Early decentralized databases largely sidestepped this concern by allowing anyone to write to a table, or by segregating every user’s data into its own append-only log. But a practical peer-to-peer database has to provide a more natural way of handling access control.
Over the last year, we’ve been iterating on a system that solves these problems in a more elegant way.
Main Idea #1: Declarative Contracts
Traditional databases are defined in terms of tables, but what happens when different clients write conflicting data to the same table, or when an application gets upgraded? Centralized applications rely on the operator managing the application’s controller logic, which serves as a natural point of access control, but no such layer exists in the decentralized world.
Direct access to tables breaks decentralization, so decentralized databases need to rely on a higher level of abstraction: embedding controller logic in the database.
We’ve implemented this in what we call “offchain contracts”1, modeled after smart contracts but adapted to an environment without blockchains. A contract declares models (database tables), views (query functions), and actions, which are handler functions that accept signed user interactions, verify them, and execute writes to the database.
export const models = {
  posts: {
    id: "primary",
    text: "string",
    from: "string",
    updatedAt: "datetime",
  },
}

export const actions = {
  // `db` exposes the models above; `hash` is the hash of the signed
  // action, and `from` is the address of the user who signed it
  createPost: ({ text }, { db, hash, from }) => {
    db.posts.set(hash, { text, from })
  },
}
$ canvas run contract.js
Launching contract with CID bafkreiaqacyyw5ztn5blm537znho4d63gg2llnmy627bl5kghgtfryot4m...
One benefit is immutability, or hardness: we now have a consistent definition of what’s an acceptable user interaction for an application, which allows the database to run autonomously without maintenance.
Another way to think about this is model-action separation. There are often subtle differences between the data received by controllers and the data written to the database. It’s useful to separate the two layers, so you can change one without affecting the other.
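As a rough sketch of what this separation buys you, an action can accept one payload shape and write another. The normalization step here is an invented example, not part of the framework:

export const actions = {
  createPost: ({ text }, { db, hash, from }) => {
    // The action payload is what users sign; the row below is what gets
    // stored. Normalizing in between lets either layer evolve independently.
    const normalized = text.trim().slice(0, 280)
    db.posts.set(hash, { text: normalized, from })
  },
}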
Main Idea #2: Upgrades are Rebases
Traditionally, database migrations happen outside of the database, where they can have indeterminate effects.
In our immutable contract model, they’re embedded in the database instead.
To create a new version of an application, a developer soft-forks an old contract into a new one. The history of actions is replayed, and incompatible actions are dropped. This is essentially a programmatic form of rebasing, where a long action history is regenerated to remove stale actions.
Here’s a slightly more complex version of our first application, which adds a check for a Gitcoin Passport score, to help deal with spam and quality-of-service issues:
export const models = {
  posts: {
    id: "primary",
    text: "string",
    from: "string",
    updatedAt: "datetime",
  },
}

export const actions = {
  createPost: async ({ text }, { db, hash, from }) => {
    // Require a minimum Gitcoin Passport score before accepting the post
    const score = await passport.getScore(from)
    if (score < 40) {
      throw new Error("Passport score of 40 required")
    }
    db.posts.set(hash, { text, from })
  },
}

// Accept the previous contract's actions, replaying them under the new rules
export const sources = {
  'bafkreiaqacyyw5ztn5blm537znho4d63gg2llnmy627bl5kghgtfryot4m': { ...actions },
}
Main Idea #3: Extremely Customizable Validation
We mentioned earlier that first-generation decentralized databases had inconsistent support for decentralized identity. And so, many projects ended up rolling their own public-key systems, or building centralized services to support user logins.
Today, the decentralized identity space is still messy. There are countless competing standards for cryptographic key formats and signing schemes, and many protocols use different keys in different places.2 There are multiple standards for signing and encoding/decoding data in basically every ecosystem3.
Even after you’ve validated a signature, you have to extract other data, like the timestamp of an action, or the relationship between a login session and an action taken under it. Signatures are often chained, and there are many standards for doing this, like UCAN, OCaps, CACAOs, and Delegatables.
We explored many different ways to decode and validate signatures. Ultimately, we concluded there was only one approach that would work universally – providing an executable environment that can validate any signature scheme.
This fits conveniently inside contract functions, allowing them to handle custom encodings (e.g. JSON, CBOR, Protobufs) and cryptographies (e.g. WebCrypto, Ethereum). Here’s an example:
export const actions = {
  message: {
    topic: 'canvas:myapp:{partition}',
    schema: { /* do decoding here */ },
    apply: (msg, { db }, [localVars]) => {
      const { message, signature, senderAddress } = msg
      // Verify the detached ed25519 signature before handling the message
      if (!nacl.sign.detached.verify(message, signature, senderAddress)) {
        throw new Error("invalid signature")
      }
      /* continue handling the message */
    },
    create: (args, { db }, [localVars]) => { /* create a new message */ },
  },
}
That’s a fully featured custom action!
- topic specifies where we listen for new actions, which here is a libp2p topic.
- schema specifies a validation schema for decoding the action.
- apply is the handler, which receives the message and verifies it. If schema isn’t provided, the action comes in as raw bytes.
- create is a local helper function for constructing new messages; it can be used to implement encrypted data.
Note that we now pass a special localVars array to the action handlers. You might use it to pass a locally held private key into the action; this allows us to write encrypted apps, where data is decrypted before it’s even written to the database.

This works in tandem with the create helper function, where data can be encrypted with the same locally held key at the time an action is constructed. This means applications can have transparently encrypted tables!
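Here’s a minimal sketch of how an encrypted action might look, assuming tweetnacl’s secretbox API and a symmetric key passed in through localVars. The note action, notes model, and key management are illustrative assumptions, not part of the framework:

export const actions = {
  note: {
    topic: 'canvas:myapp:notes',
    // Encrypt locally, before the action is signed and published
    create: ({ text }, { db }, [secretKey]) => {
      const nonce = nacl.randomBytes(nacl.secretbox.nonceLength)
      const plaintext = new TextEncoder().encode(text)
      return { nonce, ciphertext: nacl.secretbox(plaintext, nonce, secretKey) }
    },
    // Decrypt on receipt, so only plaintext lands in the local table
    apply: ({ nonce, ciphertext }, { db, hash }, [secretKey]) => {
      const plaintext = nacl.secretbox.open(ciphertext, nonce, secretKey)
      if (plaintext === null) throw new Error("could not decrypt")
      db.notes.set(hash, { text: new TextDecoder().decode(plaintext) })
    },
  },
}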
Main Idea #4: Adding a Content-Addressable Store
Custom verification builds heavily on the executable contract environment we described earlier. But to really make it work, we have to add the concept of mutable and immutable tables.
Canvas is a peer-to-peer database, which means that actions might be received in any order, so traditional database reads don’t work. How do we give developers back some way to read from the database?
Enter immutable tables, and the content-addressable store. Each database table specified in models can be set as mutable or immutable.
Every row stored in an immutable database table is hashed, and referenceable by its hash. Then, later actions can fetch it by calling db.get(hash).
If a node receives an action that tries to read a missing row from the database, it knows that previous actions are missing, and it can ask for the missing data over the network.
In essence, there are now two halves to the database. One half is a content-addressable store like IPFS, while the other half is a conventional key-value store.
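Here’s a sketch of how the two halves might fit together. The immutable flag syntax and the attachments model are illustrative assumptions; db.get is the hash lookup described above:

export const models = {
  // Immutable: rows are hashed and fetched by hash, like blocks in IPFS
  attachments: { immutable: true, data: "bytes" },
  // Mutable: a conventional key-value table
  posts: { id: "primary", text: "string", attachment: "string" },
}

export const actions = {
  createPost: async ({ text, attachmentHash }, { db, hash, from }) => {
    // Reading a row that hasn't been synced yet signals that earlier
    // actions are missing, so the node can request them over the network
    await db.get(attachmentHash)
    db.posts.set(hash, { text, attachment: attachmentHash })
  },
}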
A Complex Application
Here’s a snippet from a longer contract, which implements end-to-end encrypted messaging.
export const models = { /* ... */ }

export const actions = {
  registerUser: { /* ... */ },
  registerRoom: { /* ... */ },
  message: {
    topic: 'canvas:e2e-messaging:rooms:{room}',
    schema: {
      '@message': {
        '@encryptedMessage': { /* ... */ },
        '@userRegistration': { /* ... */ },
        roomId: 'bytes',
        senderAddress: 'bytes',
        messageHash: 'bytes',
        encryptedMessages: '@encryptedMessage[]',
        recipients: '@userRegistration[]',
      },
      message: '@message',
      signature: 'bytes',
    },
    apply: async (payload, { db }, [signer, privateKey]) => {
      const { message, signature } = payload
      if (!nacl.sign.detached.verify(message, signature, message.senderAddress)) {
        throw new Error("invalid signature")
      }
      const { roomId, senderAddress, messageHash, encryptedMessages, recipients } = message

      // Check that the sender is one of the registered recipients
      assert(recipients.some((userReg) => userReg.address === senderAddress), "Invalid sender")

      // Check that each recipient signed their own key bundle
      const keyBundles = recipients.map((userRegistration) => {
        const { signature, address } = userRegistration
        const recoveredAddress = ethers.utils.recoverAddress(
          hash(userRegistration.keyBundle), signature
        )
        assert(recoveredAddress === address, "Invalid signature")
        return userRegistration.keyBundle
      })

      // Decrypt the message, if we hold a recipient's private key
      if (privateKey) {
        const decryptionWallet = new ethers.Wallet(privateKey)
        const decryptionPublicKey = await decryptionWallet.getEncryptionPublicKey()
        const myMessage = encryptedMessages.find(({ publicKey, ciphertext }) => {
          return publicKey === decryptionPublicKey
        })
        assert(myMessage, "Invalid message recipient")
        const decrypted = await decryptionWallet.decrypt(myMessage.ciphertext)
        assert(messageHash === hash(decrypted), "Invalid message hash")
        db.messages.set(messageHash, {
          room: roomId,
          senderAddress,
          message: decrypted.message,
          timestamp: decrypted.timestamp,
        })
      }
    },
    create: ({ roomId, message, timestamp, recipientUserRegistrations }, { db }, [signer, privateKey]) => {
      // ...
    },
  },
}
This hasn’t been optimized yet; a later version would factor the different dependencies of messages out into immutable tables.
Messages go in rooms, which are dynamically partitioned libp2p topics, each stored in its own table. Under the hood, the contract uses CBOR, ed25519, and a stream cipher provided with the default Ethereum libraries.4
We wrote the same application from scratch using libp2p and IndexedDB/SQLite, and it was 1500 lines of code over 15+ files for just the backend. In a contract, it’s 200 lines of code.
What’s Next
We’ve written a forum, an encrypted messaging application, and a survey tool on this platform, covering a broad set of features: support for half a dozen cryptographic key formats, end-to-end encryption, and nodes that run in the browser or from the command line.
The examples show that decentralized databases can be useful in many places: in a browser tab, in a local-first desktop application, on a hosted service, inside a blockchain framework like MUD, and more.
There are a few things we’re focusing on next. One is getting real-world software running on this database. Another is simplifying the language so that it’s approachable for new users. Finally, we want to start pushing on the boundaries of what can be done with the peer-to-peer layer, adding more sophisticated CRDTs and transports like Signal, Waku, and state channels.
As we do this, we’d love to talk to developers, application deployers, and potential partners about working together.
And over the next year, we want to improve this framework to the point where every developer can build a performant, decentralized web application that works as well as a traditional one, with all the scaling and sovereignty advantages of a protocol.
1. A few other decentralized runtimes have contract-like implementations, but their approaches are usually pretty different, typically using WASM and focusing on high scalability.
2. Most protocols use a combination of secp256k1, secp256r1, and ed25519 signatures, with different encoding and decoding schemes, although there are newer key systems and zero-knowledge proofs used as signatures too. Codecs and signature formats are usually a bigger issue than the exact cryptography used.
3. In addition to eth_sign, personal_sign, and signTypedData, there are now methods like SIWE and EIP-1271 for account abstraction signatures.
4. Note that this doesn’t use a key agreement protocol to establish unique signing keys for each channel, nor a ratchet for forward secrecy; these are left for future versions of the protocol.