6.824 2021 Lecture 20: Blockstack (2016)

topic today: decentralized apps
  apps built in a way that moves ownership of data into users' hands
    and out of centrally-controlled web sites
    a very different architecture for web sites, might someday be better
  there are many recent (and older) explorations of this general vision.
    early P2P apps, keybase, solid, etc
  the success (and properties) of Bitcoin has prompted a lot of recent activity
    a non-crypto-currency use of a blockchain
  Blockstack is a real system, developed by a company
    it does have some users and some apps written for it
    key challenge: naming
    
old: a typical (centralized) web site
  [user browsers, net, site's web servers w/ app code, site's DB]
  users' data hidden behind proprietary app code
  e.g. blog posts, gmail, piazza, reddit comments, photo sharing,
    calendar, medical records, &c
  this arrangement has been very successful
    it's easy to program
  why is this not ideal?
    users have to use this web site's UI if they want to see their data
    web site sets (and changes!) the rules for who gets access
    web site may snoop, sell information to advertisers
    web site's employees may snoop for personal reasons
    disappointing since it's often the user's own data!
  a design view of the problem:
    the big interface division is between users and app+data
    app+data integration is convenient for web site owner
    but HTML as an interface is UI-oriented.
      and is usually not good about giving users control and access to data

new: decentralized apps
  [user apps, general-purpose cloud storage service, naming/PKI]
  this architecture separates app code from user data
    the big interface division is between user+app and data
    so there's a clearer notion of a user's data, owned/controlled by user
    much as you own the data on your laptop, or in your Athena account
  requirements for the storage system
    in the cloud, so can be accessed from any device
    general-purpose, like a file system
    paid for and controlled by user who owns the data
    sharing between users, modulo permissions, for multi-user apps
    sharing between a user's apps, modulo permissions
    similar to existing services like Amazon S3

what's the point?
  easier for users to switch apps, since data not tied to apps (or web sites)
  easier to have apps that look at multiple kinds of data
    calendar/email, or backup, or file browser
  privacy vs snooping (assuming end-to-end encryption)

how might decentralized applications work?
  here's one simple possibility.
  app: a to-do list shared by two users
    [UI x2, check-box list, "add" button]
  both contribute items to be done
  both can mark an item as finished
  a public storage system,
    key/value data owned by each of U1 and U2
  users U1 and U2 run apps on their computers
    maybe as JavaScript in browsers
    each app reads the other user's public data, writes its own user's data
  the app doesn't have any associated server, it just uses the storage system
  each user creates a file with to-do items
    and a file with "done" marks
  each user's UI code periodically scans the other user's to-do files
  the point:
    the service is storage, independent of any application.
    so users can switch apps, write their own, add encryption to
    prevent snooping, delete their to-do lists, back them up,
    integrate with e-mail app, &c
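
a minimal sketch of the to-do app above, assuming a storage service with
per-user key/value areas that anyone can read and only the owner can write.
the Storage and TodoApp classes and the "todo"/"done" keys are hypothetical
names for illustration, not part of any real system.

```python
# Hypothetical sketch: a shared to-do list with no app server.
# Storage stands in for a public, general-purpose key/value service.

class Storage:
    """Per-user key/value areas: anyone may read, only the owner may write."""

    def __init__(self):
        self.areas = {}  # owner -> {key: value}

    def put(self, writer, owner, key, value):
        if writer != owner:
            raise PermissionError("only the owner may write")
        self.areas.setdefault(owner, {})[key] = value

    def get(self, owner, key, default=None):
        return self.areas.get(owner, {}).get(key, default)


class TodoApp:
    """Client-side app code; all shared state lives in Storage."""

    def __init__(self, storage, me, friend):
        self.storage, self.me, self.friend = storage, me, friend

    def add(self, item):
        items = self.storage.get(self.me, "todo", [])
        self.storage.put(self.me, self.me, "todo", items + [item])

    def mark_done(self, item):
        done = self.storage.get(self.me, "done", [])
        self.storage.put(self.me, self.me, "done", done + [item])

    def view(self):
        """Merge both users' files; the UI would call this periodically."""
        items, done = [], set()
        for user in (self.me, self.friend):
            items += self.storage.get(user, "todo", [])
            done |= set(self.storage.get(user, "done", []))
        return [(item, item in done) for item in sorted(set(items))]
```

each user writes only their own files, so no server has to arbitrate
between them; the merged view is computed entirely in the client.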

what could go wrong?
  decentralization is painful:
    per-user FS-like storage much less flexible than dedicated SQL DB
    no trusted server to e.g. look at auction bids w/o revealing
    cryptographic privacy/authentication makes everything else harder
    awkward for users as well as programmers
  current web site architecture works very well
    easy to program
    central control over software+data makes changes (and debugging) easy
    good solutions for performance, reliability
    easy to impose application-specific security
    successful revenue model (ads)

now for Blockstack
  
why does Blockstack focus on naming?
  names correspond to human users, e.g. "robertmorris"
  name -> location (in Gaia) of user's data, so multiple users can interact
  name -> public key, for end-to-end data security
    so I can check I've really retrieved your authentic data
    so I can encrypt my data so only you can decrypt it
    since storage system is not trusted
  lack of a good global PKI has been damaging to many otherwise good security ideas
    so Blockstack started with names

Blockstack claims naming is hard, summarized by "Zooko's triangle":
  1. unique (global) i.e. each name has the same meaning to everyone
  2. human-readable
  3. decentralized
  claim: all three would be valuable (debatable...)
  claim: any two is easy; all three is hard

example for each pair of properties?
  unique + human-readable : e-mail addresses
  unique + decentralized : randomly chosen public keys
  human-readable + decentralized : my contact list

why is all three hard?
  can we add the missing property to any of our three schemes?
  no, all seem to be immediate dead ends

summary of how Blockstack gets all three?
  Bitcoin produces an ordered chain of blocks
  Blockstack embeds name-claiming records in Bitcoin blocks
  if my record claiming "rtm" is first in Bitcoin chain, I own it
  unique (== globally the same)?
  human-readable?
  decentralized?
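
the "first claim in the chain wins" rule can be sketched as a replay of the
chain. this is a simplified illustration, not Blockstack's actual record
format: the "op", "name", and "owner" fields are made up for the example.

```python
def build_name_db(chain):
    """Replay an ordered chain and return name -> owner public key.

    chain: list of blocks in chain order; each block is a list of
    transaction dicts. Field names are hypothetical.
    """
    owners = {}
    for block in chain:
        for tx in block:
            if tx.get("op") != "claim":
                continue  # ordinary transaction, not a naming record
            # setdefault keeps the earliest claim and ignores later ones
            owners.setdefault(tx["name"], tx["owner"])
    return owners
```

every node that replays the same chain computes the same table, which is
where the "unique" property comes from.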

is this kind of name space good for decentralized apps?
  is unique (== global) valuable?
    yes: I may be able to remember names I already know.
    yes: I can give you a name, and you can use it.
    yes: I can look at an ACL and guess what it means.
    no: human-readable names aren't likely to be very meaningful if chosen from global pool
        e.g. robert_morris_1779 -- is that me? or someone else?
        how about "rtm@mit.edu"?
    no: how can I find your Blockstack name?
        how can I verify that a Blockstack name is really you?
  other (possibly bad) ideas:
    only public keys, don't bother with human-readable names
      each person keeps separate "contact list" with names they understand
      naturally decentralized
      not "unique" thus no need for Bitcoin
    central entity that reliably verifies human identity

challenges: leverage existing blockchain
  starting a new blockchain is hard
    need to be big to deal with byzantine participants
  Bitcoin blockchain
    limits on data in record
    slow writes
    block size
    ledger

what are all the pieces in Blockstack?
  client, browser, application, blockstack.js
  Blockstack Browser (meant to run on client machine)
  Bitcoin's block-chain
  Blockstack servers
    read Bitcoin chain
    interpret Blockstack naming records to update DB
    serve naming RPCs from clients
    name -> pub key + zone hash
  Atlas servers -- store "zone records"
    a name record in bitcoin maps to a zone record in Atlas
    zone record indicates where my Gaia data is stored
    keyed by content-hash, so items are immutable
    you can view Atlas as just reducing the size of Blockstack's Bitcoin transactions
    Atlas keeps the full DB in every server
  Gaia servers
    separate storage area for each user (i.e. end-users)
    key -> value
    backed by Amazon S3, Dropbox, &c
      Gaia makes them all look the same
    most users use Gaia storage provided by Blockstack
    user's profile contains user's public key, per-app public keys
    user can have lots of other files, containing app data
    apps can sign and/or encrypt data in Gaia
  S3, Dropbox, &c
    back-ends for Gaia
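
why "keyed by content-hash" makes Atlas items immutable: a sketch of a
content-addressed store. ContentStore is a hypothetical class, and SHA-256
is an assumption here; the notes don't say which hash Atlas uses.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the hash of the value."""

    def __init__(self):
        self.items = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.items[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self.items[key]
        # A reader can detect any modification by re-hashing,
        # so the server storing the item need not be trusted.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("data does not match requested hash")
        return data
```

changing a zone record means storing a new item under a new hash, and then
updating the hash recorded in the chain.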

NAME CREATION

how does one register a Blockstack name?
  (https://docs.blockstack.org/core/wire-format.html)
  the user does it (by running Blockstack software)
  user must own some bitcoin
  two bitcoin transactions: preorder, registration
  preorder transaction
    registration fee to "burn" address
    hash(name)
  registration transaction
    name (not hashed)
    owner public key
    hash(zonefile)
  Blockstack info hidden inside the transactions, Bitcoin doesn't look at it
    but Bitcoin signatures/hashes cover this Blockstack info

why *two* transactions?
  front-running
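
a sketch of how preorder-then-register defeats front-running. the
transaction fields are hypothetical. the sketch also hashes the owner key
and a salt into the preorder; that is an assumption beyond the plain
hash(name) listed above, made so that a copied preorder hash is useless to
anyone with a different key.

```python
import hashlib


def commitment(name, owner_key, salt):
    """Hash that hides the name until the register transaction reveals it."""
    return hashlib.sha256(f"{name}|{owner_key}|{salt}".encode()).hexdigest()


def process(chain):
    """Replay ordered transactions; return name -> owner key.

    A register counts only if a matching preorder appeared earlier in the
    chain and the name is not already owned.
    """
    preorders = set()
    owners = {}
    for tx in chain:
        if tx["op"] == "preorder":
            preorders.add(tx["hash"])
        elif tx["op"] == "register":
            c = commitment(tx["name"], tx["owner"], tx["salt"])
            if c in preorders and tx["name"] not in owners:
                owners[tx["name"]] = tx["owner"]
    return owners
```

an attacker who sees the name in an unconfirmed register transaction and
races in their own register has no earlier matching preorder, so their
claim is ignored.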

why the registration fee? after all there's no real cost.

what if a client tries to register a name that's already taken?

what if two clients try to register same name at same time?

is it possible for an attacker to change a name->key binding?
  after all, anyone can submit any bitcoin transaction they like

is it possible for Blockstack to change a name->key binding?

STORAGE

how does the client know where to fetch data from?
  starting with owning user's name, and a key
  apps probably use well known keys, e.g. "profile" or "todo-list"
  bitcoin/blockstack, hash(zone), gaia address
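
the lookup path can be sketched as three table lookups. plain dicts stand
in for the Blockstack server's DB, Atlas, and Gaia; treating the zone file
as nothing but a base URL, and using SHA-256, are both simplifying
assumptions for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def lookup(name, key, name_db, atlas, gaia):
    """Resolve (user name, key) to the owner's public key and stored value.

    name_db: name -> (pubkey, zone_hash), derived from the chain
    atlas:   zone_hash -> zone file bytes
    gaia:    url -> value bytes
    """
    pubkey, zone_hash = name_db[name]
    zone_file = atlas[zone_hash]
    # Atlas is untrusted: check the zone file against the hash from the chain.
    if sha256_hex(zone_file) != zone_hash:
        raise ValueError("zone file does not match hash on chain")
    base_url = zone_file.decode().strip()
    # The caller would use pubkey to check the owner's signature on the value.
    return pubkey, gaia[base_url + "/" + key]
```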

how does the client check that it got the right data back from Gaia?

how does the client know data from Gaia is fresh (the latest version)?
  owner signed the data when writing
  where can others get the owner's public key, to check signature?

how does Gaia know whether to let a client write/change/delete?

what about encryption for privacy?
  if only the owner should see the data?
  if one other user should see the data, in addition to the owner?
  if just 6.824 students should see the data?

PRIVATE KEYS

private keys never leave the user's device(s)
  so you don't have to trust anything other than your device and Blockstack's software
  each of your devices has a copy of your master private key

"master" private key only seen by Blockstack Browser
  too sensitive to let apps see or use it
  protected by pass-phrase, then in clear while user is active

Blockstack Browser hands out per-app private keys
  so each app has more or less separate encrypted storage
  makes it hard for one user's different apps to cooperate
    sometimes that's what you want
    sometimes you do want sharing among your own apps

DISCUSSION

here are some questions to chew on.
  about naming
  about decentralized applications
you can view them as criticism.
or as areas for further development.

Q: could blockstack be used as a PKI for e-mail, to map rtm@mit.edu to my public key?
   blockstack names vs e-mail addresses?
   what does a blockstack name mean?

Q: why is PKI hard in general?
   lost pass-phrases and keys
     recovery (mother's maiden name? SMS? e-mail?)
   what does a name mean? connection to "real" identity?
   how to go from intuitive notion of who I want to talk to, to name?
   some progress, e.g. Keybase

Q: for naming and PKI, is there strong value in decentralization?
   can we have a centralized but secure naming system?
     who can we all trust for a global-scale system?
   indeed what value can a central authority realistically deliver?
   would adoption be easier with decentralization?

Q: could blockstack use a scheme like Certificate Transparency instead of Bitcoin?
   CT can't resolve conflicts, only reveal them.
     different CT logs may have different order
       so CT can't say which came first
     it's Bitcoin mining that resolves forks and forces agreement
   the fee aspect of Blockstack seems critical vs spam &c, relies on cryptocurrency
   in general, open block-chains only seem to make sense w/ cryptocurrency

Q: is Blockstack convenient for programmers?
   all code in client, no special servers
     hard to have data that's specific to the app, vs each user
     indices, vote counts, front-page rankings for Reddit or Hacker News
   SQL queries
   cryptographic access control, groups, revocation, &c
   hard to both look at other users' secrets, and keep the secrets
     e.g. for eBay
   maybe only worthwhile if users are enthusiastic...

Q: is decentralized user-owned storage good for user privacy?
   is it better than trusting Facebook/Google/&c web sites to keep data private?
     vs other users, hackers, their own employees?
   can Blockstack storage providers watch what you access?
   what if app, on your computer, snoops on you?
     after all, it's presumably still Facebook or whoever writing the app.
   is cryptographic access control really feasible?
   you still have to trust the provider to preserve your data
     and to serve up the most recent version
     if you trust them that much, why not trust them to keep it secret too?

Q: is decentralized user-owned storage good for user control?
   do users want to switch applications a lot for the same data?
   do users want to use same data in multiple applications?
   does either even work in general, given different app formats?

Q: will users be willing to pay for their own Gaia storage?

CONCLUSION

what do I take away from Blockstack?
  I find the overall decentralization vision attractive.
  the whole thing rests on a PKI -- any progress here would be great
    a general-purpose mapping from all users to their public keys would be very useful
  surprising that we can have decentralized human-readable name allocation
    but unclear whether decentralized human-readable names are a good idea
  separating cloud data from applications sounds like a good idea
    but developers will hate it (e.g. no SQL).
    not clear users will know or care.
    not clear whether users will want to pay for storage.
  end-to-end encryption for privacy sounds like a good idea
    private key management is a pain, and fragile
    encryption makes sharing and access control very awkward
  you still have to trust vendor software; not clear it's a
    huge win that it's running on your laptop rather than
    vendor's server.

all that said, it would be fantastic if Blockstack or something
  like it were to be successful.

--- references ---

https://dataspace.princeton.edu/handle/88435/dsp019306t191k
https://econinfosec.org/archive/weis2015/papers/WEIS_2015_kalodner.pdf