Three Stages of Blockchain Development

Exploration and discovery of blockchain application scenarios in the next 3 to 5 years can be roughly divided into three stages:

● Information "blockchainization";

● Value "blockchainization";

● Scenario "blockchainization".

23.1 Information "blockchainization": solving the problem of information fragmentation

Blockchain technology can maintain and verify a public transaction ledger. Generally speaking, the adoption of any technology starts at the points where it can most improve performance and efficiency. Most people already keep their own transaction ledgers through Alipay or bank cards. Transfers between Alipay accounts or within the same bank are credited almost instantly, because only Alipay or that one bank needs to verify the records. Interbank transfers and cross-border payments, however, take a long time to confirm, mainly because the various banks and immigration registration authorities must repeatedly verify and cross-check records, consuming a great deal of manpower and resources. According to Fan Bin, General Manager of IBM Global Business Consulting Services for the Greater China banking sector, many banks are preparing or have already implemented the "blockchainization" of transaction data, generally in the form of consortium chains, which has greatly improved the efficiency of cross-border and interbank remittances and eliminated redundant work. Essentially, a blockchain-based public ledger enables data interoperability and real-time synchronization among different countries and banking systems, simplifying account opening and providing convenient services for each user. For banks, it provides a comprehensive view of each user's credit data, reducing business risk and supporting more precise credit services.

In addition to the banking system's bills and payment processes, blockchain technology can also significantly improve various social issues caused by information fragmentation in other real-life scenarios through information "blockchainization".

There are many foreseeable application scenarios for value "blockchainization", which generally need to possess the following characteristics:

● The products or services exchanged by both parties can be digitized;

● Services/products are standardized, and the evaluation system is clear and traceable;

● Services are provided by individuals, and consumption is also by individuals;

● As more individuals gradually join, the value of the network increases.

A simple example is the shared mobility industry, where the service providers are mainly drivers (car owners), and the consumers of the service are passengers. The driver takes the passenger from point A to point B, and the passenger pays the driver a certain fee, completing the transaction. The driver can calculate the driving distance using GPS navigation and record the driving time with a timestamp, automatically generating the fare according to a publicly disclosed calculation method. After the driver delivers the passenger to the destination and confirms it, the passenger's account automatically transfers the money to the driver's account. As more passengers and drivers join the blockchain network, it becomes easier for drivers to receive more suitable orders, and passengers can more easily find rides.
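The ride described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fare parameters (`BASE_FARE`, `PER_KM`, `PER_MINUTE`), the `fare` function, and the in-memory `Ledger` class are hypothetical names invented here, and a real system would run the settlement step as an on-chain contract rather than as a local object.

```python
import math
from dataclasses import dataclass, field

# Hypothetical, publicly disclosed fare parameters (assumed for illustration).
BASE_FARE = 10.0   # flat fee per ride
PER_KM = 2.0       # fee per kilometre driven
PER_MINUTE = 0.5   # fee per minute of driving


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def fare(gps_track, start_ts, end_ts):
    """Fare from a GPS track (list of (lat, lon)) and start/end Unix timestamps."""
    km = sum(haversine_km(p, q) for p, q in zip(gps_track, gps_track[1:]))
    minutes = (end_ts - start_ts) / 60
    return round(BASE_FARE + PER_KM * km + PER_MINUTE * minutes, 2)


@dataclass
class Ledger:
    """Toy stand-in for on-chain account balances."""
    balances: dict = field(default_factory=dict)

    def settle(self, passenger, driver, amount, driver_confirmed):
        """Move the fare to the driver only after arrival is confirmed."""
        if not driver_confirmed:
            raise ValueError("ride not confirmed")
        if self.balances.get(passenger, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[passenger] -= amount
        self.balances[driver] = self.balances.get(driver, 0) + amount
```

Because the fare formula is public and its inputs (GPS track, timestamps) are recorded, either party can recompute the amount independently, which is what removes the need for a platform to arbitrate the price.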

The main entrepreneurial opportunities at this stage lie in "disintermediation" scenarios, such as blockchain-based versions of Didi, Meituan, and Taobao. These centralized companies do not have significant technological barriers: the order distribution algorithms of Didi and Meituan, for example, are not complex and are easy to replicate. The real challenges are user acquisition and on-chain transaction throughput, and overcoming them depends on public education about blockchain and on further technical progress. Other industries with similar characteristics, such as P2P insurance, credit, gambling, prediction, and gaming, can likewise achieve "disintermediation" of transactions and solve trust issues through value "blockchainization", provided the scenario meets the four characteristics above.

From the perspective of chain capacity and information characteristics, the information that goes on-chain and is verified by a large number of distributed nodes generally needs to have the following characteristics:

● The information has high value, such as Bitcoin transfers;

● Each piece of information is independent and does not interfere with each other;

The first point gives more nodes an economic incentive to participate in verification; massive amounts of low-value data can just as well be stored on centralized servers. The second point implies a degree of filtering: packaging information into blocks and chaining them together works well when the information flows are independent, whereas for interdependent information the block-and-chain structure may not function effectively.

In the future, there may be a trust level for information, quantitatively assessed based on the randomness of distributed storage and verification nodes. It is foreseeable that in more future application scenarios, only a few pieces of information, such as those awaiting confirmation or transaction records, will require a large number or even all nodes in the network to verify. Various application scenarios, such as articles, images, videos, and music in apps, are merely projections of on-chain string mappings to real life.

The objects of asset on-chain include ownership, usage rights, and income rights of assets. Table 27-3 compares the priority distribution of on-chain ownership, usage rights, and income rights of different assets. It was found that:

(1) The content of the same asset on-chain varies, leading to differences in priority. For example, when the on-chain asset is usage rights (leasing) and future income rights, it becomes the highest priority asset (Category 1 asset). In contrast, if the on-chain asset is ownership, its demand for disintermediation is low, and its liquidity is low, making it a Category 11 asset.

(2) The priority of income rights on-chain is relatively high. Income rights have a high demand for disintermediation, are highly liquid, and are very suitable for on-chain assets. Therefore, attention should be given to the on-chain process of asset tokenization.

(3) The priority of ownership on-chain is relatively low. Except for traditional financial assets such as stocks, securities, and receivables, the ownership of other assets has a lower priority for on-chain compared to income rights and usage rights, mainly due to low asset liquidity and low demand for disintermediation.

(4) The second-hand trading and leasing markets are potential growth points for asset on-chain. Whether in the second-hand trading market or the leasing market, there is a high demand for disintermediation of transaction content, especially in the leasing market, where the high liquidity of asset usage rights places it at the top of the asset on-chain priority list.

Table 27-3 Asset On-Chain Priority Distribution

As a distributed ledger technology, blockchain can effectively meet the demands for transparency, decentralization, and privacy protection in the asset circulation process. With the rise of the "blockchain boom," some people blindly "praise" blockchain technology for promotional or speculative financing purposes, believing that all assets need to be on-chain. In reality, not all assets are suitable for on-chain. To determine whether an asset is suitable for on-chain, two aspects need to be clarified:

(1) Whether the on-chain content is the usage rights, ownership, or income rights of the asset. Different contents of the same asset on-chain will have varying priorities, with income rights having a relatively high priority and ownership having a relatively low priority.

(2) When assessing the priority of asset on-chain, it is necessary to consider factors from three levels: supply, operation, and demand. This article lists four quantifiable indicators—demand for disintermediation, asset value, asset liquidity, and operability—as standards for assessing the feasibility of "blockchain technology + asset management." Among them:

● The demand for disintermediation is the original driving force for asset on-chain. Not all assets need to be on-chain. Blockchain technology must address the "pain points" in the current asset management process. If the trust relationship between buyers and sellers can be guaranteed in the current asset transaction process, and assets can achieve rapid circulation, then these assets do not need to apply blockchain technology.

● Asset value and liquidity are the foundations for asset on-chain. Putting an asset on-chain requires a certain level of asset value and liquidity. For low-value or price-sensitive assets, and for assets that are traded infrequently or only once, the feasibility of applying blockchain technology is low.

● Operability is the catalyst for asset on-chain. Operability is not the decisive factor for asset on-chain, but it affects the speed of asset on-chain.

In addition, this article also draws the following two conclusions:

(3) The most suitable assets for priority on-chain include: usage rights and income rights of land, housing, and buildings; usage rights and income rights of specialized equipment; usage rights and income rights of mechanical equipment; usage rights and income rights of precious metals and jewelry; income rights of coal, oil, and natural gas; ownership of virtual assets in social, community, and entertainment platforms; ownership of stocks and bonds; ownership of receivables; income rights of antiques and paintings; usage rights of patents and trademarks.

(4) The second-hand trading and leasing markets are potential growth points for asset on-chain. This is due to two reasons: first, the demand for disintermediation of usage rights in these two types of assets is high.

From the perspectives of target audience, token structure, consensus mechanism, issuance method, and distribution method, the current state of token development can be summarized as follows:

(1) In terms of quantity, payment currency, general platform, and industry application tokens are relatively balanced, forming a three-way split.

(2) In terms of token structure, single-layer tokens are predominant, with multi-layer tokens developing in parallel. Among them, payment currency tokens are all single-layer tokens, while there are five types of second-layer tokens and one type of third-layer token.

(3) Consensus mechanisms and issuance methods are more flexible. In terms of consensus mechanisms, they have gradually evolved from the proof-of-work mechanism (PoW) initially adopted by Bitcoin to proof-of-stake (PoS) and Byzantine fault tolerance (BFT), then to delegated proof-of-stake (DPoS) and delegated Byzantine fault tolerance (DBFT), and finally to hybrid consensus mechanisms that combine various consensus mechanisms. In terms of initial issuance methods, depending on project characteristics, community management, and technological promotion needs, in addition to classic mining releases, other methods have been introduced to promote the early development of tokens, including pre-mining, ICO crowdfunding, venture capital, airdrops, and rewards.

(4) Incentive methods are diversified. Mining rewards and transaction fees for node rewards still play an important role in the current token economic system, especially for payment currency and general platform tokens. For application platform tokens, the incentive methods are generally personalized based on the content of the token project.

(5) Community governance is integrated into the construction of the token economic system. Especially for general platform and application tokens, community governance mechanisms play an important role in promoting the sustainable development of the token economy.

Secondly, this article summarizes the current development models of the token economy. The development paths of the token economy system represented by Bitcoin can be divided into three categories: first, addressing issues such as information concealment, transaction efficiency, and high energy consumption from a technical perspective; second, expanding application scenarios to implement the token economy in more projects; and third, reducing market disturbances, i.e., minimizing the impact of market factors (such as price fluctuations) on the blockchain community. Specifically, this involves evolving from a single-layer token structure to a multi-layer token structure, separating the value attributes and management attributes of tokens, so that the value fluctuations of tokens do not affect the normal operation of the blockchain network.

At the same time, the future development direction of the token economy system is discussed:

(1) For tokens designed from the perspective of technical improvement, the focus is on solving technical challenges in blockchain networks, emphasizing technological innovation and the universality of application scenarios. Therefore, such tokens often choose a single-layer token structure, and payment currency and underlying technology development platform tokens characterized by a single-layer token structure will continue to be the mainstream development of the token economy system.

(2) For tokens serving application scenarios, the aim is to pursue the application capabilities of blockchain technology, thus making personalized adjustments based on the content of the scenarios. These tokens may choose either a single-layer or multi-layer token structure.

(3) When the token economy system requires stability in the blockchain community and aims to reduce the impact of market speculation and price fluctuations, a multi-layer token structure is generally chosen. The multi-layer token structure includes two components: management tokens and value tokens. Token economy systems based on community management are suitable for multi-layer token structures. Currently, the application scenarios for multi-layer token structures are still relatively few, but their potential in future token economy system designs is enormous.

Finally, this article points out the "pain points" in the development and application process of the token economy:

(1) The development of blockchain technology is still immature, manifested in the inadequacy of underlying blockchain technology and the immaturity of blockchain operational management models.

(2) The implementation process of token projects is challenging. In terms of market capitalization, payment currency tokens account for 63%, general platform tokens account for 27%, while application tokens account for less than 10%. Except for content, entertainment, advertising, and IoT technology, non-financial tokens account for only 1%.

(3) The development environment of the token economy is chaotic, with high uncertainty in implementation. This is reflected in two aspects: 1) Token founders regard the token economy merely as a means of fundraising without considering the implementation issues of the token economy. 2) There are many uncertainties regarding whether the token economy can be implemented and how long it will take.

Decentralized Storage

Another popular innovation is decentralized storage, which is an application innovation that utilizes distributed storage technology to store files in chunks across different storage nodes. Compared to centralized storage, it offers a higher level of privacy protection, lower storage costs, and more redundant data backup copies, effectively avoiding single points of failure.

In fact, the connection between decentralized storage and blockchain is not that close; the main role of blockchain in this context is as an incentive billing mechanism above the storage layer. "Miners" with idle hard drive space contribute their space and record their contributions in the blockchain through a specially designed storage proof mechanism, primarily focusing on dimensions such as contribution duration, space size, and effective space utilization rate. Contribution can be used to obtain equivalent token rewards, while users with storage needs must pay tokens to acquire more data storage space.
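To make the billing role concrete, here is a minimal sketch of how recorded contributions might be turned into token rewards. The scoring formula (duration × space × utilization) and the proportional split of a reward pool are assumptions made purely for illustration; real storage networks define their own proof mechanisms and reward rules, and the names `Contribution`, `score`, and `distribute` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    miner: str
    days_online: float   # contribution duration
    space_gb: float      # contributed space size
    utilization: float   # fraction (0..1) of the space holding real user data


def score(c):
    """Assumed scoring rule: the three dimensions simply multiply."""
    return c.days_online * c.space_gb * c.utilization


def distribute(pool, contributions):
    """Split a token reward pool in proportion to each miner's score."""
    total = sum(score(c) for c in contributions)
    if total == 0:
        return {c.miner: 0.0 for c in contributions}
    return {c.miner: pool * score(c) / total for c in contributions}
```

Under this assumed rule, a miner offering three times the space for the same duration and utilization receives three times the reward, and idle space (utilization 0) earns nothing.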

However, the development of decentralized storage currently faces several issues, mainly in three areas.

First, speculation on token prices has almost never ceased, and users with genuine storage needs find the resulting cost uncertainty hard to bear.
Second, the space contributed by miners is extremely dispersed. Although nodes can be set up worldwide, local network conditions and the mechanical quality of hard drives keep the overall performance of the decentralized network low, far below that of centralized storage. It is therefore suitable only for cold data and personal data storage.
Finally, with the influx of miners, the storage capacity supplied by most projects far exceeds actual demand. Where the data will come from is now the pressing question; unless it is resolved, there will be little room for further development.

Cross-Chain

The two types of innovations mentioned earlier belong to application innovations, while cross-chain technology is considered a technological innovation. Different blockchain networks are independent and form isolated data islands. Cross-chain technology builds bridges between these islands, providing the possibility of data interoperability between different chains.

Initially, cross-chain involved direct connections between two chains, while the current stage of cross-chain resembles a hub, where interactions between chains are no longer direct but occur through a relay chain for information transfer. It can be said that the development of cross-chain technology is foundational for other innovations. For example, the following diagram illustrates a cross-chain network of the Polkadot cross-chain protocol, integrating other public chains into the same ecosystem through a relay chain.

Through the simple analysis of the two major categories of public chain technology application innovations above, we can easily see the role of investment incentives in promoting public chains. This is understandable; there is no free lunch. Without innovation, there would be no speculative hotspots, and ecological prosperity would be impossible. Compared to the previous stage of wild growth, purely speculative projects are gradually disappearing. DeFi is a derivative of traditional finance, while decentralized storage represents an exploration of the sharing economy, bringing it closer to people's daily lives.

Development of Consortium Chains

After understanding the recent innovations in the public chain sector, let's take a look at the development of consortium chains. Over the past few years in the consortium chain sector, I have witnessed the practical application of blockchain technology in corporate business activities, how traditional businesses have reduced costs and increased efficiency with the support of blockchain technology, and how it has lowered the cooperation threshold between enterprises. Of course, I have also encountered the existing development barriers of blockchain technology. Personally, I summarize the development of blockchain technology in enterprises into three stages.

Data Storage for Evidence

Blockchain technology has characteristics such as time continuity, immutability, and traceability, making it very suitable for data storage that requires evidence retention. Currently, applications such as product traceability, internet courts, and electronic certificates belong to this stage, with most applications storing only data proofs on the blockchain, rather than original data.

This approach reflects both the data carrying capacity of blockchain itself and the needs of the business. From the network's perspective, storage and bandwidth demands rise steeply as data volume grows, so it is necessary to control sensibly where data goes. I will discuss this point further at the end of the technical section. Applications at the evidence-storage stage merely store data on the blockchain and retrieve it as evidence only when necessary; most of the time, the data remains inactive. For these applications, blockchain serves as an additional technical guarantee rather than an indispensable component. Moreover, each entity mostly operates independently, without involving multiple parties. At this stage, the substitutability of blockchain is relatively strong. Currently, most enterprise blockchain applications are still at this stage.

Data Exchange

Traditional inter-enterprise cooperation involves the exchange of commercial data. Generally, the data exchange model involves the data provider or demander offering API interfaces in a pull or push manner. However, once mutual data needs arise and more than two enterprises are involved, the problem becomes more complicated. Blockchain technology can conveniently solve this issue by deploying blockchain nodes within each enterprise, allowing each to interact only with its own operational nodes. The blockchain mechanism automatically synchronizes data to other participants, and if any node has new data on-chain, the blockchain's event notification mechanism automatically informs the internal applications of each enterprise. At this stage, enterprise cooperation heavily relies on blockchain as the hub for data exchange, thus the substitutability of blockchain is relatively low. Some enterprises have already reached the data exchange stage, such as the combination of federated learning and blockchain technology.
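The interaction pattern described here can be sketched with a toy in-memory model. It is not how any particular consortium chain is implemented: real networks replicate data through consensus and peer-to-peer messaging, whereas this sketch copies records directly. The `Node` class and the `connect` helper are hypothetical names used only to show that each enterprise talks to its own node while every participant's applications are notified.

```python
class Node:
    """A blockchain node operated by one enterprise (toy model)."""

    def __init__(self, owner):
        self.owner = owner
        self.ledger = []      # this node's local copy of the shared records
        self.peers = []       # nodes run by the other enterprises
        self.listeners = []   # callbacks registered by internal applications

    def subscribe(self, callback):
        """Register an internal application for new-record events."""
        self.listeners.append(callback)

    def submit(self, record):
        """Called only by the owner's own application."""
        self._apply(record)
        for peer in self.peers:      # stands in for automatic synchronization
            peer._apply(record)

    def _apply(self, record):
        self.ledger.append(record)
        for callback in self.listeners:
            callback(self.owner, record)


def connect(nodes):
    """Make every node a peer of every other node."""
    for node in nodes:
        node.peers = [p for p in nodes if p is not node]
```

With three enterprises connected this way, one `submit` call on enterprise A's node is enough for B and C to receive the record and for their subscribed applications to be informed, so no pairwise API integration is needed.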

Value Transfer

Part of Ethereum's significance is that it provides a medium for anchoring real value in the digital world, letting us see the possibility of moving from an information network to a value network. In the field of enterprise blockchain, this significance can be amplified many times over. What enterprises offer externally is, in the end, goods, services, or solutions. If blockchain technology can convert these products into circulating assets within the value network, a variety of asset applications can be built on top of their basic value.

At this stage, blockchain technology is the cornerstone of the value network and is therefore irreplaceable. Of course, we are still far from the value network, and I cannot accurately describe what the future will look like. Enterprise blockchain has a relatively short history and, overall, is still in its infancy. However, the country is very optimistic about its prospects and is guiding the development of blockchain across the national economy and society from a high level.

Blockchain +, although still in the first stage of development, already shows further trends. Of course, we should also recognize that while the value of blockchain technology has been gradually validated over the past decade, it is still in a relatively short development period, and the technology itself is not mature. The transformative impact of the new thinking it brings is also profound. It will take more time for blockchain technology to fully integrate into our work and lives. If one day, every innovation in public chains can empower the real economy, and enterprises can dismantle the layers of commercial barriers they have built, leading to a re-intersection and integration of the public chain and consortium chain sectors, that will be the golden age of blockchain technology, making the value network within reach.

Does this mean that blockchain can transmit real value? The main points are: anchoring, confirmation of rights, and transactions (reconfirmation of rights).

Looking at today's internet giants like Huawei, Ant Group, Baidu, and Tencent, it is clear that they are deeply engaged in consortium chains. Currently, the decline in the heat of the cryptocurrency sector corresponds to the decline in public chain enthusiasm, while the rise of the chain sector essentially indicates the rise of consortium chains. At present, consortium chains occupy the main market, while public chains are heavily suppressed.

Which sector do digital currency trading platforms, such as Binance and Huobi, belong to? They belong to the public chain sector.

"The conversion of enterprise products into circulating assets within the value network." This statement is somewhat unclear. Why do enterprises need to convert their products or services into the value network?

One way to envision the value network: information circulating on the internet may be correct or incorrect, so it is impossible to assign a value to it. The value network aims to carry information that has value. The products and services provided by enterprises are inherently valuable, so they can move into the value network. They may also generate additional value there, but this is uncertain and merely a conjecture.

Many people may think that blockchain is a novel technology; however, it is not. It is merely old wine in a new bottle, as it does not create new technology but combines several already mature technologies, representing a form of integrative innovation. When we first begin to learn about blockchain, the most important thing is to grasp its technical characteristics and understand its technological foundation.

The Technical Foundation of Blockchain

At the same time, it helps you grasp the three most important characteristics of blockchain technology.

As blockchain technology has developed, especially after Ethereum integrated the concept of smart contracts with blockchain technology, it has become a medium between reality and the network. It can be said that blockchain is a carrier of value, representing a new type of social production relationship.

How to understand this statement? My personal understanding is that based on blockchain technology, we can break down the barriers between the real world and the network world, virtualizing material and materializing value. In the future, the internet will not circulate information but rather living value.

The future is promising, but it must start with practical steps. Let’s return to the real world. Recently, a picture circulated widely online, humorously dubbed "the physical manifestation of blockchain technology." This example perfectly illustrates the technical characteristics of blockchain.

This is the entrance of a residential community in Shenyang, Liaoning, where homeowners have linked multiple locks together to form a simple access control system. Whoever has a car adds a lock; each lock is labeled, and community car owners only need to use their keys to open the corresponding lock to access the gate. This prevents outside vehicles from occupying community parking spaces; it must be said that ingenuity lies among the people.

How does this illustrate the characteristics of blockchain technology? First, we need to clarify what characteristics blockchain possesses. Generally, we recognize three points:

Decentralization
Traceability
Immutability

Decentralization: In this community access control system, each lock represents a homeowner. They do not need a property management company to manage them uniformly; they only need to maintain their own locks to ensure the system operates normally. Each lock varies in size and cost, and homeowners may have multiple cars, but in this system, there are no status differences. Moreover, when new homeowners join or existing homeowners move out, they simply add or remove the corresponding locks.

Have you noticed that this contains the concept of decentralization? Access control is no longer managed by a "third-party organization" like a property management company; each lock is part of the management. Similarly, the original intention of blockchain is to eliminate centralized third-party organizations (refer back to Lecture 1). The data and state of the entire network are jointly maintained by all nodes in the network, with no status differences among them, only differences in available resources. Moreover, the offline status of any node does not affect the operation of the system. You can try to deduce what kind of network model and storage model should be chosen to achieve decentralization. A system without a central node can be said to have every node as a center, each capable of providing services externally and also requesting services from other nodes, which is characteristic of a peer-to-peer network model. Each node acts as both a producer and consumer of data.

On the other hand, because the roles of nodes are equal, the data stored by each node should be consistent, with each node independently maintaining a complete blockchain structure. Even if some nodes lose data, as long as one node remains intact, historical data will not be lost. This effectively avoids system crashes caused by single points of failure, and its reliability is far higher than that of traditional data disaster recovery models. Of course, decentralization is only an ideal state. At this stage, blockchain decentralization is essentially relative decentralization, which we can also call multi-centralization. When understanding a concept, we should reason about the ideal but also accept that practice approaches it gradually, through intermediate stages.

Traceability: In the access control system, each lock records relevant information about the homeowner, binding it to each homeowner. This allows for accountability in exceptional situations, such as when a homeowner forgets to lock the door, allowing outside vehicles to enter the community, reflecting the traceability of the system.

In the actual operation of blockchain, the principle of completing information traceability is similar but more complex. Let me explain.

From the perspective of a single node, blockchain can be viewed as a time-sequenced database. Each operation on the system essentially stores corresponding data and logs in each node's database. Moreover, each piece of data is not stored discretely but is sequentially linked together in chronological order, with new data always derived from previously existing data.

Following this association, if you want to trace the historical changes of the data state, it is very easy; you just need to look back sequentially, and you will definitely find the initial state of the data. All of this relies on blockchain storage technology, which primarily focuses on data structures and data relationships, including transaction and block data structures, the relationship between transactions and blocks, and the storage patterns of block states, etc.
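As a rough illustration of "looking back sequentially", the sketch below models each data record as holding a reference to the record it was derived from. The `Record` class and `history` function are hypothetical and greatly simplified; real blockchains link blocks by hash rather than by in-memory references and store transactions inside blocks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    state: str                        # the data state at this point in time
    timestamp: int                    # when the state was recorded
    prev: Optional["Record"] = None   # the earlier record this one derives from


def history(latest):
    """Walk backwards from the newest record to the initial state."""
    states = []
    node = latest
    while node is not None:
        states.append(node.state)
        node = node.prev
    return states[::-1]               # oldest first


created = Record("created", 1)
shipped = Record("shipped", 2, created)
delivered = Record("delivered", 3, shipped)
print(history(delivered))
```

Starting from the latest record, the walk always terminates at the record with no predecessor, which corresponds to the initial state of the data.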

Immutability: Another technical characteristic is immutability, a concept that beginners often misunderstand. Immutability does not mean that data can never be modified; it means that unauthorized modifications are not recognized.

Looking back at the example, we can see that the keys held by homeowners correspond one-to-one with the locks. Without a key, or with the wrong key, a vehicle cannot enter the community. If an outside vehicle wants to enter, its driver may try to impersonate a homeowner, duplicate a key, or add a lock, all of which constitute tampering. However, once the homeowners' committee discovers this, the tampering is promptly corrected and the outside vehicle removed, which is how immutability is achieved.

In blockchain, achieving immutability requires two technologies for assurance. In the previous example, the function of the "lock" is achieved through cryptographic technology, while the role of the "homeowners' committee" is played by consensus algorithms. As mentioned earlier, the data in a single blockchain node is stored in chronological order, and the key to this chaining utilizes cryptographic hash algorithms. Hash algorithms can transform a piece of data into a fixed-length data fingerprint, and even slight changes in the data will result in a significantly different fingerprint. Through this method, we can integrate the data fingerprint from the previous time period with the data from the subsequent time period. This process continues, ensuring that the data in the later time period will always include the data fingerprint from the earlier time period, thus forming a chain of information linked by data fingerprints.

If a malicious actor attempts to modify the data from a certain time period, according to the principles of hash algorithms, the corresponding data fingerprint will change. Therefore, they must modify the data of every subsequent time period; otherwise, the data chain will break at the moment they make the modification, losing its traceability.
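
The fingerprint chaining described above can be illustrated with a minimal sketch. It assumes SHA-256 as the hash algorithm and an empty fingerprint before the first time period; the names `fingerprint`, `build_chain`, and `first_broken_link` are hypothetical and do not come from any particular blockchain.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return the SHA-256 digest of the text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link records so each entry stores the previous entry's fingerprint."""
    chain = []
    prev = ""  # assumption: the first entry has an empty previous fingerprint
    for data in records:
        entry = {"prev": prev, "data": data, "fingerprint": fingerprint(prev + data)}
        chain.append(entry)
        prev = entry["fingerprint"]
    return chain


def first_broken_link(chain):
    """Return the index of the first entry that fails verification, or None."""
    prev = ""
    for index, entry in enumerate(chain):
        expected = fingerprint(entry["prev"] + entry["data"])
        if entry["prev"] != prev or entry["fingerprint"] != expected:
            return index
        prev = entry["fingerprint"]
    return None


chain = build_chain(["period 1", "period 2", "period 3"])

# Tamper with the middle entry without touching anything else.
tampered = [dict(entry) for entry in chain]
tampered[1]["data"] = "period 2 (altered)"

# Tamper and also recompute that entry's own fingerprint: the break moves
# to the next entry, whose stored "prev" no longer matches.
patched = [dict(entry) for entry in tampered]
patched[1]["fingerprint"] = fingerprint(patched[1]["prev"] + patched[1]["data"])
```

Here `first_broken_link(chain)` finds no problem, `first_broken_link(tampered)` points at the altered entry, and `first_broken_link(patched)` points at the entry after it. This is why an attacker would have to rewrite every subsequent time period as well.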

We can push the difficulty further. Suppose a malicious actor really did modify all the data on their local node; what then? At this point, the consensus algorithm must step in to ensure that the data of the entire system remains unaltered. Consensus, in a distributed system, refers to maintaining data consistency. If data inconsistency occurs, most consensus algorithms follow the principle that the minority obeys the majority.

The data of each node in the blockchain network is consistent. A malicious actor alters only the data on the single node they maintain, while across the network as a whole the majority of nodes still hold the correct data, so the altered copy is not accepted.

Finally, I want to remind you that when analyzing problems, one must not look at only one side. Immutability is actually a dialectical characteristic. The rule that the minority obeys the majority also means that if a malicious actor can control the majority of node resources, tampering with the blockchain becomes possible. In a robust and sufficiently decentralized network, the cost of doing so is enormous, making success nearly impossible. And if it did succeed, the remaining minority would then be the ones regarded as disrupting the blockchain's consensus, as seen in Ethereum's hard forks.
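
The majority principle can be reduced to a counting sketch. Real consensus algorithms are far more involved than this; the example only assumes that each node reports the fingerprint of its latest data and that a value needs more than half of the nodes to win. The function name `majority_value` and the sample fingerprints are hypothetical.

```python
from collections import Counter


def majority_value(node_values):
    """Return the value held by more than half of the nodes, or None."""
    if not node_values:
        return None
    value, count = Counter(node_values).most_common(1)[0]
    return value if count * 2 > len(node_values) else None


honest_tip = "fingerprint-of-correct-chain"
forged_tip = "fingerprint-of-altered-chain"

# One malicious node out of five: the honest data still wins.
one_attacker = [honest_tip] * 4 + [forged_tip]

# Three malicious nodes out of five: the forged data wins instead.
attacker_majority = [honest_tip] * 2 + [forged_tip] * 3
```

With `one_attacker` the result is the honest fingerprint; with `attacker_majority` it is the forged one, which is the two-sided nature of immutability discussed above.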

Blockchain is an important driving force for the new generation of information technology. It utilizes the integration of foundational technologies such as storage, cryptography, peer-to-peer networks, and consensus algorithms to provide characteristics of decentralization, traceability, and immutability, which can be used to solve trust and security issues on the internet, thus promoting the transformation of the internet from information transmission to value transmission.

You may find the previous interpretation somewhat formal. To help you understand, let me share a more accessible version: storage is the bricks, cryptography is the steel bars, peer-to-peer networks are the concrete, and consensus algorithms are the blueprints. Combined, they form the intricate superstructure of blockchain. I want to emphasize that blockchain technology is not a matter of dogmatically following a textbook. Bitcoin is blockchain, and Ethereum is also blockchain; the technology itself does not dictate a single path for implementation.

Blockchain is a distributed storage technology with a high difficulty of tampering, currently mainly applied in digital currencies like Bitcoin.

Blockchain is a decentralized, traceable, and immutable information storage technology, a fusion innovation combining various technologies, primarily solving trust and security issues on the internet.

Reading "Blockchain Revolution: How the Technology Behind Bitcoin Is Changing Money, Business, and the World" is recommended; if you are not very familiar with the technology, you can read "Blockchain: Blueprint for a New Economy."

Git is a form of blockchain in disguise.

Decentralization: Each user has their own git repository locally and can pull and push between each other.
Traceability: Each git commit relies on the previous commit, allowing for tracing back to the initial commit.
Immutability: Modifying a local historical commit will change the hash values of subsequent commits.

From an intuitive perspective, I have discussed the characteristics of blockchain technology using the example of "iron chains," and I have also introduced the foundational aspects of blockchain technology. Starting today, I will take a few lectures to explain the core applications of each technology in blockchain, providing a comprehensive overview of the blockchain technology system. In this lecture, I will take you deep into a single blockchain node to help you understand how blockchain storage is designed.

When we talk about storage design, we first think about how data is stored in the blockchain and which database to use, among other conventional topics. However, in my view, these only scratch the surface of storage design. To truly grasp the key points of blockchain storage, we need to understand three foundational concepts: transactions, blocks, and states. With these foundations, analyzing blockchain storage design will become second nature.

The first concept to understand is the transaction (Transaction), which is the smallest and most core knowledge point in blockchain. Since we usually start learning about blockchain through Bitcoin, we often understand transactions as transfers. However, this understanding is somewhat one-dimensional. In fact, the concept of transactions in blockchain has already been expanded.

From the perspective of behavior, a transaction is equivalent to an operation (Operation). When we submit a transaction to the blockchain network, we are essentially initiating an operation, and the specific content of the operation depends on the specific blockchain protocol. For example, in Ethereum, an operation might involve executing a method within a smart contract. From the perspective of computer technology, a blockchain transaction is essentially the same thing as a database transaction, an atomic unit of work; the two are merely translated differently in Chinese, and both are called Transaction in English. A transaction is the smallest component of data within the blockchain network. Once a transaction is submitted, it can end in only two states: either successful or failed; there is no half-successful state.

Although different blockchains define transactions in much the same way, their attribute fields may vary. This does not prevent us from abstracting a general transaction attribute template. Note that not all blockchains follow the template depicted in the following diagram; it is only meant to help you understand the main attributes of a transaction.

We can see that a transaction typically has eight attributes (the transaction hash itself is also counted as an attribute). The From and To fields point to the initiator and recipient of the transaction, which is easy to understand. For example, a currency transfer naturally requires both a sender and a receiver. Three attributes relate to smart contracts: the smart contract field identifies the name of the smart contract to be executed for this transaction, followed by the method of that contract to execute and the parameter list that accompanies the method. Different methods may take parameter lists of varying lengths and data types, which are collectively referred to as the parameter list here.

The timestamp field indicates the time when the transaction was constructed on the client side. This time is added by the client on its own, but we need not worry about discrepancies with standard time, as the blockchain network verifies the transaction time upon receipt; transactions whose timestamps are too early or too late will not be accepted by the network, which limits the potential for fraud to some extent.

The final common transaction field is the signature, which is generally issued by the account in the From field to prove to the network that this transaction was indeed constructed by this account and not forged by someone else. It is produced by signing the transaction with the private key that only the account owner holds. It is akin to the seals we use in everyday life, except that a private key is almost impossible to forge; the risk lies in it being stolen.

One point to note is that all transactions in the blockchain are essentially initiated from outside the blockchain network; the blockchain network only receives transactions and does not produce them, nor does it make any modifications to transactions. In other words, once a transaction is constructed on the client side, it becomes fixed.

Therefore, we can use the hash value of the transaction content as the identifier for the transaction within the blockchain network, and this identifier is not part of the transaction's field content. How to understand this? You can think of it this way: an ID card can represent you as a person, but the ID card is not part of you.
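
A minimal sketch of this idea follows. It assumes the general attribute template discussed above, JSON with sorted keys as the serialization, and SHA-256 as the hash; real blockchains each define their own encoding and differ on which fields go into the hash. The class `Transaction`, the function `tx_hash`, and all sample values are hypothetical, and the signature is a placeholder string rather than a real cryptographic signature.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)  # frozen: a transaction is fixed once constructed
class Transaction:
    sender: str      # the From field
    recipient: str   # the To field
    contract: str    # name of the smart contract to execute
    method: str      # method of that contract
    params: tuple    # parameter list for the method
    timestamp: int   # set by the client when constructing the transaction
    signature: str   # placeholder; a real one comes from the sender's private key


def tx_hash(tx: Transaction) -> str:
    """Derive the identifier from the content; it is not stored in the transaction."""
    payload = json.dumps(asdict(tx), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


tx = Transaction("alice", "bob", "token", "transfer", ("bob", 1),
                 1700000000, "placeholder-signature")
tx_id = tx_hash(tx)

# Any change to the content yields a different identifier.
altered_id = tx_hash(replace(tx, params=("bob", 100)))
```

Because the identifier is computed from the content, anyone holding the transaction can recompute and check it, and no coordination between clients is needed to assign it.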

Additionally, you may have this question: if each transaction is independently constructed by the client without negotiation with other participants in the network, could two transactions end up with the same hash? Here, we rely on the properties of hash algorithms. A duplicated hash means a hash collision has occurred. In the subsequent cryptography chapters, we will discuss the probability of hash collisions and how it depends on the hash algorithm used; with the algorithms used in practice, a collision is nearly impossible.

Block: Having understood transactions, we can now discuss what "container" is used to store this transaction data. In fact, this container is the block. You can understand the relationship between transactions and blocks as follows: transactions are akin to goods, while blocks are containers that can hold multiple transactions.

In the previous discussion on traceability, we mentioned that blockchain is a sequential integration of data over time, and each time period's data is referred to as a block (Block). A block refers to a data structure formed by packaging all (valid) transactions received by a node within a certain time frame. The term "valid" is used because some blockchain designs also include invalid transactions. Directly understanding the concept may seem abstract, so we can refer to the block diagram to understand the data structure of a block.

The design of blocks may seem complex, but don't worry; we only need to clarify three key points to appreciate the essence of blocks.

Block Structure: The first point we need to clarify is the structure of the block. From the diagram, we can see that a block is divided into a block header and a block body. The block header contains the basic attributes of the block, with four important attributes: the previous block hash for linking blocks, the transaction root hash for linking blocks with transactions, the block height for marking the current block's position in the blockchain for easy location, and the timestamp for recording when the block was packaged. The block body contains only transactions, which are ordered chronologically, generally sorted by the timestamp field of the transactions.

Inter-Block Association

The second point to focus on is the relationship between blocks. We have mentioned multiple times that each block includes the previous block hash as the anchor point that logically links the two blocks. The block hash, like the transaction hash, is an external attribute of the block and can only be obtained after the block is constructed. If we trace back from the current block step by step, we will eventually reach the genesis block, which also has a previous block hash field, but its value is empty. For example, you can check the previous hash of the genesis block through an Ethereum block explorer.
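
The four header attributes and the backward walk to the genesis block can be sketched as follows. This assumes an empty string as the genesis block's previous hash and JSON plus SHA-256 for hashing the header; `BlockHeader`, `block_hash`, `trace_to_genesis`, and the sample values are hypothetical names for illustration only.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BlockHeader:
    prev_hash: str   # links this block to the previous one
    tx_root: str     # transaction root hash, links the block to its transactions
    height: int      # position of the block in the chain
    timestamp: int   # when the block was packaged


def block_hash(header: BlockHeader) -> str:
    """The block hash is external: it is computed once the block is built."""
    payload = json.dumps(asdict(header), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def trace_to_genesis(tip: BlockHeader, headers_by_hash: dict) -> list:
    """Follow prev_hash links from the tip and return the heights visited."""
    heights = [tip.height]
    current = tip
    while current.prev_hash != "":  # assumption: genesis has an empty prev_hash
        current = headers_by_hash[current.prev_hash]
        heights.append(current.height)
    return heights


genesis = BlockHeader("", "root-0", 0, 1000)
block1 = BlockHeader(block_hash(genesis), "root-1", 1, 1010)
block2 = BlockHeader(block_hash(block1), "root-2", 2, 1020)

# Index the headers by their (externally computed) hash for lookup.
index = {block_hash(h): h for h in (genesis, block1, block2)}
path = trace_to_genesis(block2, index)
```

Starting from `block2`, the walk visits heights 2, 1, and 0 and stops at the genesis block.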

Block and Transaction: The final point is the relationship between blocks and transactions. While we previously used a metaphor to liken blocks to containers and transactions to goods, we may not fully understand the underlying principles. Conceptually, this is relatively complex, primarily because it introduces an uncommon data structure: the Merkle tree. Let's first understand it.

A Merkle tree is a tree structure that generally has at least three layers: leaf nodes, intermediate nodes, and root nodes. The number of layers of intermediate nodes depends on the number of leaf nodes; the more leaf nodes there are, the deeper the Merkle tree will be.

Its construction logic is as follows: adjacent leaf nodes undergo hash operations, and the resulting hash values serve as the parent nodes for these two leaf nodes. The same logic is applied sequentially upwards, ultimately resulting in the second-to-last layer having only two remaining intermediate nodes, which undergo a hash operation to produce their parent node, which is the root node of the entire tree. Thus, the Merkle tree composed of hash values is constructed.
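
The construction logic above can be sketched in a few lines. Two details are assumptions, since the text does not specify them: when a layer has an odd number of nodes the last one is duplicated (the approach Bitcoin takes), and nodes are combined by concatenating their hex strings before applying SHA-256, which is simpler than what production chains do. The names `sha256_hex` and `merkle_root` are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def merkle_root(leaf_hashes):
    """Hash adjacent nodes pairwise, layer by layer, until one root remains."""
    if not leaf_hashes:
        raise ValueError("a Merkle tree needs at least one leaf")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node
        level = [sha256_hex(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Leaf nodes are the hashes of four illustrative transactions.
leaves = [sha256_hex(tx) for tx in ["tx-a", "tx-b", "tx-c", "tx-d"]]
root = merkle_root(leaves)
```

With four leaves, the tree has exactly the three layers described: four leaf nodes, two intermediate nodes, and one root.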

Referring back to the previous block diagram, we can see that the transaction hashes contained in the block body correspond to the leaf nodes of the Merkle tree. The hashes are then calculated upwards, ultimately yielding the root hash, which represents the transaction root hash of all transactions in the block body. This data will be recorded in the block header. At this point, you may have a question: why go through the trouble of introducing a Merkle tree? Why not simply mix all transactions together and take one hash? We know that the result of hashing data can be used as a data fingerprint, which means that hashing can serve as a data verification mechanism.

If a transaction in the block is tampered with by a malicious actor, and we designed the transaction root hash by simply taking one hash of all transactions, it would be challenging to identify the tampered transaction when the data verification fails, especially when there are many transactions.

However, if we use a Merkle tree, any changes to the leaf node hashes will propagate to their parent nodes, layer by layer, up to the root node. This means that the root node's value actually contains the hashes of all leaf nodes, but it allows potentially tampered transactions to be handled separately. Thus, if an issue arises, we can easily identify the erroneous branch. This enhances the flexibility of data verification and reduces unnecessary resource waste. Through the above analysis of the three key points of block logic, we have clarified the design context of blocks. The reason blockchain is called a blockchain is, from a literal perspective, due to the special data structure of blocks.
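
To make the "erroneous branch" idea concrete, the sketch below keeps every layer of the tree and walks down from the root, following whichever child differs from a trusted copy. It assumes both trees have the same number of leaves, duplicates the last node of an odd layer, and combines nodes by concatenating hex strings; `build_levels` and `find_changed_leaf` are hypothetical names.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_levels(leaf_hashes):
    """Return all layers of the tree: levels[0] is the leaves, levels[-1] the root."""
    levels = [list(leaf_hashes)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        if len(current) % 2 == 1:
            current.append(current[-1])  # assumption: duplicate the last node
        levels.append([sha256_hex(current[i] + current[i + 1])
                       for i in range(0, len(current), 2)])
    return levels


def find_changed_leaf(trusted, suspect):
    """Return the index of the first leaf that differs, or None if roots match."""
    if trusted[-1] == suspect[-1]:
        return None
    index = 0
    # Descend one layer at a time, checking only the two children of one node.
    for depth in range(len(trusted) - 2, -1, -1):
        left = 2 * index
        index = left if trusted[depth][left] != suspect[depth][left] else left + 1
    return index


txs = ["tx-a", "tx-b", "tx-c", "tx-d"]
trusted = build_levels([sha256_hex(tx) for tx in txs])

# Alter the third transaction (index 2) and rebuild the tree.
altered = ["tx-a", "tx-b", "tx-c (altered)", "tx-d"]
suspect = build_levels([sha256_hex(tx) for tx in altered])

changed = find_changed_leaf(trusted, suspect)
```

The search inspects one pair of nodes per layer instead of rehashing every transaction, which is the saving a single combined hash cannot offer.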

State

After discussing transactions and blocks, we now need to understand a frequently overlooked concept: state (State). You may have never heard of it before, but its role is significant.

Every transaction executed in blockchain has an output, and the state is the accumulation of these outputs after the transactions are executed. How to understand this? Let's use a simple example: 2 + 3 + (4 * 7) + (8 - 9 / 3) + 23 = 61. Each term being added on the left side of the "=" can be considered a transaction record, while the 61 on the right side is the accumulated state after executing the transactions.

Finite State Machine

We can observe that even if the result of the expression is lost, as long as we remember the expression, we can recalculate the corresponding value. This illustrates the concept of a finite state machine, which states that in a closed system, if the initial conditions of the state are consistent and the order of conditions for changing the state is consistent, a consistent result will ultimately be obtained. Blockchain records all transactions in chronological order, so even if the state is lost, we can easily replay the state by executing the transactions in order again. Therefore, from a certain perspective, blockchain can also be seen as a finite state machine.
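
The replay idea can be sketched with the arithmetic example from earlier, treating each added term as one transaction and the running total as the state. The initial state of 0 and the names `apply_tx` and `replay` are assumptions made for illustration.

```python
def apply_tx(state: int, tx: int) -> int:
    """Executing one transaction produces the next state."""
    return state + tx


def replay(initial_state: int, transactions) -> int:
    """Re-execute the recorded transactions in order to rebuild the state."""
    state = initial_state
    for tx in transactions:
        state = apply_tx(state, tx)
    return state


# The five terms of 2 + 3 + (4 * 7) + (8 - 9 / 3) + 23, recorded in order.
transactions = [2, 3, 4 * 7, 8 - 9 // 3, 23]

state = replay(0, transactions)        # the accumulated state: 61
state_again = replay(0, transactions)  # replaying gives the same result
```

Given the same initial state and the same ordered transactions, every replay arrives at the same state, so a lost state can always be rebuilt from the recorded history.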

Since we can replay the state, why should we retain the state? Let's consider a hypothetical scenario: you are currently executing a transaction that requires a certain input, and this input depends on the output of a transaction in a previous block.

If the state is not retained, you would need to re-execute the associated transactions before executing this transaction, and that transaction may be related to an even earlier transaction, necessitating continuous backtracking until the source is found. Thus, theoretically, it is possible not to retain the state, but this would require bearing the corresponding consequences of such a design.

Similarities and Differences with Databases: If you still find the concept of state difficult to understand, we can also explain it by comparing it to databases. When we interact with a database through CRUD operations, the records inserted, updated, or deleted in the database represent the state, while each statement you execute is a transaction. In other words, if you export all SQL statements from creating a database to creating tables, inserting data, updating data, and deleting data, you can replicate an identical database elsewhere.

Both blockchain and databases preserve historical operation records and state data sets. However, databases focus more on state, while blockchain primarily records historical blocks, with state as a secondary focus. One lives in the present, while the other reminisces about the past. The logic of the two is not fundamentally different; they merely emphasize different aspects.

State Models: Understanding the design concept of state raises the question of how state is represented in blockchain. Depending on the positioning of the blockchain, we can roughly categorize the design of state models into three types.

One is the UTXO model (Unspent Transaction Outputs), used by Bitcoin and focused on digital currency. In this model, each transaction has N transaction inputs and produces M transaction outputs (N and M can be unequal). The transaction inputs are unspent outputs of preceding transactions. If the current transaction succeeds, the outputs it consumes as inputs are spent and lose their qualification to serve as transaction inputs again. The UTXO model can track the flow of digital currency: transaction inputs indicate where the currency comes from, while transaction outputs indicate where the currency is going.
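
A minimal sketch of the UTXO bookkeeping follows. It is not Bitcoin's actual data format: it assumes the state is a dictionary from (transaction id, output index) to (owner, amount), and it skips signature checks entirely. The function `apply_utxo_tx` and all sample values are hypothetical.

```python
def apply_utxo_tx(utxos, tx_id, inputs, outputs):
    """Spend the referenced outputs and add the new ones; return the new state.

    utxos:   dict mapping (tx_id, index) -> (owner, amount)
    inputs:  list of (tx_id, index) references to unspent outputs
    outputs: list of (owner, amount) pairs created by this transaction
    """
    if len(set(inputs)) != len(inputs):
        raise ValueError("an output is referenced twice")
    for ref in inputs:
        if ref not in utxos:
            raise ValueError(f"{ref} is missing or already spent")
    total_in = sum(utxos[ref][1] for ref in inputs)
    total_out = sum(amount for _owner, amount in outputs)
    if total_out > total_in:
        raise ValueError("outputs exceed inputs")
    # Consumed outputs disappear from the state; new outputs are added.
    remaining = {ref: out for ref, out in utxos.items() if ref not in inputs}
    for index, output in enumerate(outputs):
        remaining[(tx_id, index)] = output
    return remaining


# Alice holds one unspent output of 10 units.
state = {("tx0", 0): ("alice", 10)}

# One input, two outputs: 7 to Bob and 3 back to Alice as change.
state_after = apply_utxo_tx(state, "tx1", [("tx0", 0)],
                            [("bob", 7), ("alice", 3)])
```

After the transaction, the original output is gone and cannot be referenced again, while the two new outputs become available as inputs for later transactions.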

Another is the account model adopted by the Ethereum blockchain, which represents changes in account balances through addition and subtraction. Each transaction execution achieves dynamic balance between different accounts. For example, if you transfer 1 unit of currency, your account balance decreases by 1, while mine increases by 1. This model aligns more closely with our daily understanding. Additionally, the account model supports the storage of custom data beyond just balance, allowing for the derivation of smart contract data storage.
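
The account model is simple enough to sketch directly. The example assumes the state is a dictionary of balances keyed by account name and ignores fees, nonces, and contract storage; `transfer` is a hypothetical helper, not Ethereum's API.

```python
def transfer(balances, sender, recipient, amount):
    """Move an amount between two accounts and return the new balances."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if balances.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    updated = dict(balances)  # leave the previous state untouched
    updated[sender] -= amount
    updated[recipient] = updated.get(recipient, 0) + amount
    return updated


balances = {"you": 5, "me": 0}

# You transfer 1 unit to me: your balance decreases by 1, mine increases by 1.
balances_after = transfer(balances, "you", "me", 1)
```

The total across all accounts is the same before and after the transfer, which is the dynamic balance between accounts mentioned above.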

The final category is a general model that further builds on the account model, lacking built-in state attributes and allowing for the storage of any custom data. It is widely used in consortium chains. The positioning of consortium chains is to support enterprise-level applications, and the types and models of enterprise businesses are unpredictable, making it impossible to preset a state model in the design. Since it is difficult to satisfy all voices, the design of the state is left to enterprise application developers, allowing for custom states while the chain itself only provides a general data interface.

From the design and application scenarios of the three state models, there is no single solution for selecting a state model; any model design that meets the application scenario is a good model.

Summary

In this lecture, we primarily delved into a single blockchain node, focusing on understanding the key points of blockchain storage, mainly discussing the data structures of transactions and blocks, as well as the blockchain state model. I did not emphasize specific design solutions because no matter how novel or innovative a blockchain platform is, its foundational design cannot escape these three points.

The design of blockchain storage is not fixed; as long as you truly understand transactions, blocks, and states, you can become the next blockchain storage architect.

For developers, state is more important. The block is like a framework, while the state is the data structure and algorithm needed for specific business designs.