Since in the introduction of Bitcoin and the explosion of crypto currencies has created quite a lot of attention around the concept of Blockchain. In this post I will try to explain the concept by implementing Pychain, a toy Blockchain, in about 200 lines of Python.
What is Blockchain?
Wikipedia defines Blockchain as
A blockchain, originally block chain, is a growing list of records, called blocks, which are linked using cryptography.
If we simplify this a little further for easier understanding, we can define Blockchain as a Linked List of Blocks. In a traditional Linked List each node points to its succeeding node. However in Blockchain, each Block points to its preceding Block.
Just another Block in the chain
Continuing with our Linked list analogy, each block each block must contain meta data to validate the integrity of the chain. In Pychain we include on the most necessary: index, nonce, timestamp, data, previous hash, and hash.
Most of the attributes are self explanatory, except of nonce. We will discuss about a little bit later.
Each block being added into the chain is validated by calculating the hash. SHA-256 in Pychain. The way hash is calculated is take all the fields (index, nonce, timestamp, data, and previous hash) and calculate SHA-256 sum.
The process of calculating the hash is defined as “Mining” in bitcoin and other crypto currencies. If it were this simple, then validity of the chain can compromised very easily. To make it difficult to compromise each miner has to spend some time and money “working”. This is called “Proof of Work”.
Proof of Work
SHA-256 is always 256 bits or 64 bytes long no matter how long the input is. For example hash for text, “Hello, Pychain”, it is 368fd26488e3fda58ec81fc5bdc0ea6b919526f765cfddc315fb1d81b9793649.
$ echo Hello, Pychain | sha256sum
Computing this hash hardly takes any effort. To increase the effort needed to mine a block, Bitcoin arbitrarily choose certain number of digits in the hash to be zeros. At the time of writing this post Bitcoin expects the first 18 digits of the hash to be zeros. For a given input, SHA-256 will always compute the same hash. This is where the field “nonce” defined in our block comes to picture.
For our Blockchain, we will set the default “difficulty” to 4, but while running this value can be set to whatever we want.
Adding a new Block
Now that we have some basic infrastructure in place we can start adding blocks to Pychain. To mine a new block we need a information about the previous block and we compute the rest of the fields, index, time stamp. Data is provided by the end user.
If we need the previous block to add a new block, how do we kick start the chain? This chicken and egg problem is solved by what is called the “Genesis Block”. This is a special kind of block with previous hash set to all zeros. Pychain generates this block like this.
In a real world Blockchain, genesis block is computed only once.
Storing the Blocks
As the number of blocks grow, we need a way to persist the historical block. Pychain does not have persistence, we will store them as a in-memory array.
Authenticity of a Block
So far Pychain has been single node Blockchain. Although it is a valid chain, it does not serve much of a purpose. When we have multiple peer nodes in the network we need to way to validate if the new Block being added to the chain is authentic. Each node upon receiving a new Block will try to validate the block. Only valid Blocks will be added to the chain.
Longest chain wins
Real world Blockchain implementations will trust the chain with the maximum work done, or in other words the longest chain always wins. This way as the chain grows, it becomes harder and harder to manipulate the chain. For an attacker to take over the blockchain has to mine all the blocks since the “fraudulent” block and be faster than 51% of all the nodes in doing so. For the sake of simplicity, Pychain only verifies if the index of the new Block is greater than the previous known block and the new block passes the validation.
Peer to Peer Protocol
All Blockchain implementations have a protocol to communicate between peers. Pychain has a bare bones protocol to achieve this. Websockets are used to communicate with other peers in the network. Pychain supports only two “commands”, QUERY_ALL and NEW_BLOCK.
Whenever a new node joins the network, it will try to ask one of its peers for the current state of the Blockchain using the QUERY_ALL command. If a new block is mined by this node it will send out a broadcast message to all the peers it is aware of with a NEW_BLOCK command. When a node receives a NEW_BLOCK message it will follow the block authentication process and decide to add the block or discard the block.
Manage the node
Each Pychain node will also expose a HTTP endpoint to allow for end user interaction.
# get all blocks known to this peer
$ curl http://localhost:6001/blocks
Working code for Pychain can be found here. Pychain is a simplified implementation of Blockchain to help understand the concept, and it is not suited for anything other than that.