Pychain: Blockchain in 200 lines of Python

Since in the introduction of Bitcoin and the explosion of crypto currencies has created quite a lot of attention around the concept of Blockchain. In this post I will try to explain the concept by implementing Pychain, a toy Blockchain, in about 200 lines of Python.

What is Blockchain?

Wikipedia defines Blockchain as

A blockchain, originally block chain, is a growing list of records, called blocks, which are linked using cryptography.

If we simplify this a little further for easier understanding, we can define Blockchain as a Linked List of Blocks. In a traditional Linked List each node points to its succeeding node. However in Blockchain, each Block points to its preceding Block.

Just another Block in the chain

Continuing with our Linked list analogy, each block each block must contain meta data to validate the integrity of the chain. In Pychain we include on the most necessary: index, nonce, timestamp, data, previous hash, and hash.

block-info

Most of the attributes are self explanatory, except of nonce. We will discuss about a little bit later.

Hash

Each block being added into the chain is validated by calculating the hash. SHA-256 in Pychain. The way hash is calculated is take all the fields (index, nonce, timestamp, data, and previous hash) and calculate SHA-256 sum.

The process of calculating the hash is defined as “Mining” in bitcoin and other crypto currencies. If it were this simple, then validity of the chain can compromised very easily. To make it difficult to compromise each miner has to spend some time and money “working”. This is called “Proof of Work”.

Proof of Work

SHA-256 is always 256 bits or 64 bytes long no matter how long the input is. For example hash for text, “Hello, Pychain”, it is 368fd26488e3fda58ec81fc5bdc0ea6b919526f765cfddc315fb1d81b9793649.


$ echo Hello, Pychain | sha256sum
368fd26488e3fda58ec81fc5bdc0ea6b919526f765cfddc315fb1d81b9793649 -

Computing this hash hardly takes any effort. To increase the effort needed to mine a block, Bitcoin arbitrarily choose certain number of digits in the hash to be zeros. At the time of writing this post Bitcoin expects the first 18 digits of the hash to be zeros.  For a given input, SHA-256 will always compute the same hash. This is where the field “nonce” defined in our block comes to picture.

mine-algo
Simple algorithm for Proof of Work

For our Blockchain, we will set the default “difficulty” to 4, but while running this value can be set to whatever we want.

Adding a new Block

Now that we have some basic infrastructure in place we can start adding blocks to Pychain. To mine a new block we need a information about the previous block and we compute the rest of the fields, index, time stamp. Data is provided by the end user.

Genesis Block

If we need the previous block to add a new block, how do we kick start the chain? This chicken and egg problem is solved by what is called the “Genesis Block”. This is a special kind of block with previous hash set to all zeros. Pychain generates this block like this.

In a real world Blockchain, genesis block is computed only once.

Storing the Blocks

As the number of blocks grow, we need a way to persist the historical block. Pychain does not have persistence, we will store them as a in-memory array.

Authenticity of a Block

So far Pychain has been single node Blockchain. Although it is a valid chain, it does not serve much of a purpose. When we have multiple peer nodes in the network we need to way to validate if the new Block being added to the chain is authentic. Each node upon receiving a new Block will try to validate the block. Only valid Blocks will be added to the chain.

Longest chain wins

Real world Blockchain implementations will trust the chain with the maximum work done, or in other words the longest chain always wins. This way as the chain grows, it becomes harder and harder to manipulate the chain. For an attacker to take over the blockchain has to mine all the blocks since the “fraudulent” block and be faster than 51% of all the nodes in doing so. For the sake of simplicity, Pychain only verifies if the index of the new Block is greater than the previous known block and the new block passes the validation.

Peer to Peer Protocol

All Blockchain implementations have a protocol to communicate between peers. Pychain has a bare bones protocol to achieve this. Websockets are used to communicate with other peers in the network. Pychain supports only two “commands”, QUERY_ALL and NEW_BLOCK.

Whenever a new node joins the network, it will try to ask one of its peers for the current state of the Blockchain using the QUERY_ALL command. If a new block is mined by this node it will send out a broadcast message to all the peers it is aware of with a NEW_BLOCK command. When a node receives a NEW_BLOCK message it will follow the block authentication process and decide to add the block or discard the block.

Manage the node

Each Pychain node will also expose a HTTP endpoint to allow for end user interaction.


# get all blocks known to this peer
$ curl http://localhost:6001/blocks

Finally

Working code for Pychain can be found here. Pychain is a simplified implementation of Blockchain to help understand the concept, and it is not suited for anything other than that.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s