Summary of the basic fields and development technologies of distributed applications

Overview of Distributed System Technology

Today's Internet applications, especially those of large Internet companies, have grown into large-scale or even ultra-large-scale distributed, clustered applications, and small and medium-scale distributed applications are now appearing in many fields as well. As cloud computing penetrates more aspects of daily life, distributed applications will only become more common. Anyone who wants to work in server-side application development should therefore have a basic understanding of distributed applications.

This article briefly introduces the relevant technologies in each basic field of distributed applications. Most distributed applications involve these technologies to some degree; even when one of them is not needed immediately, designers should still take it into account so that the system has room to grow.

1. Cluster Management

Keywords: Apache Zookeeper, Paxos algorithm, Etcd, Raft, Apache Curator

In a distributed system, some data is closely tied to the operation of the system and its important business functions, such as data about server nodes, application services, and data services. This data is vital to the normal operation of the cluster.

  • Server node related data: the address and status of the server

  • Service-related data: service IP, port, version, protocol, status, active and standby node information

  • Database-related data: routing rules, database and table sharding rules

A distributed system keeps multiple copies of this important data to ensure high availability, but that raises another problem: how to keep these copies consistent. Because the data is so important, inconsistencies can cause serious or even fatal errors. In a small-scale distributed system, one or two servers are enough for cluster management, so data consistency is easy to achieve. In a large-scale distributed system, however, one or two cluster-configuration servers cannot handle the volume of concurrent reads and writes generated by the whole cluster, so a dozen or more servers may be needed to support these requests. At that point, a dedicated scheme is required to keep the cluster configuration data consistent across those servers.

Among the many schemes, the Paxos algorithm is one of the best. Its details are not covered here; briefly, the nodes in the cluster communicate by exchanging proposals (requests to modify a piece of data). Each proposal carries an increasing ID number. A node always accepts the proposal with the largest ID it has seen and rejects the others. When more than half of the nodes accept a proposal, the proposal is chosen and adopted by the whole cluster.
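The majority-quorum idea behind this description can be sketched in a few lines of Java. This is a deliberately simplified illustration of the "highest proposal ID wins, majority decides" rule, not a real Paxos implementation: it omits the accept phase, proposal values, and failure handling, and all class and method names are made up for the sketch.

```java
import java.util.List;

class PaxosSketch {
    // One acceptor's state: the highest proposal ID it has promised so far.
    static class Acceptor {
        long promisedId = -1;

        // Promise only if this proposal's ID is higher than anything seen before.
        boolean promise(long proposalId) {
            if (proposalId > promisedId) {
                promisedId = proposalId;
                return true;
            }
            return false;
        }
    }

    // A proposal succeeds when a strict majority of acceptors promise.
    static boolean propose(List<Acceptor> acceptors, long proposalId) {
        int promises = 0;
        for (Acceptor a : acceptors) {
            if (a.promise(proposalId)) {
                promises++;
            }
        }
        return promises > acceptors.size() / 2;
    }
}
```

Note that once a majority has promised ID 10, a later proposal with ID 5 can no longer gather a majority, which is exactly the "always agree to the largest ID" rule at work.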

1.1. Apache Zookeeper

Expressing the Paxos algorithm in words does not seem hard, but implementing it involves many technical difficulties. Fortunately there are mature implementations, the most famous of which is Apache Zookeeper. Zookeeper can be used not only to store configuration data but also to implement cluster master election, distributed locks, and other coordination scenarios. Apache Curator is a client library for Zookeeper that simplifies its use and provides ready-made recipes for these scenarios.

Zookeeper is a distributed service management framework. Typical application scenarios of Zookeeper include configuration file management, cluster management, distributed locks, leader election, queue management, etc. Zookeeper can work in cluster mode, zoo.cfg records the addresses of all Zookeeper servers in the cluster, and each server has its own unique ID. At the same time, each server also has a myid file in its own dataDir directory to mark its own ID. In Zookeeper, data is stored in a tree structure, similar to an LDAP database.
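As a concrete illustration, the zoo.cfg of a three-server ensemble looks roughly like this (the hostnames and data directory below are placeholders, not values from the original article):

```
# Basic timing: one tick is 2000 ms; initLimit/syncLimit are measured in ticks
tickTime=2000
initLimit=10
syncLimit=5
# Where snapshots (and the myid file) live
dataDir=/var/lib/zookeeper
# Port clients connect to
clientPort=2181
# The ensemble: server.<myid>=<host>:<peer port>:<leader election port>
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

The server whose ID is 1 would then have a myid file in its dataDir containing just the line "1".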

There are now also projects similar to Zookeeper, such as Etcd, which is implemented in Go and based on the Raft algorithm.

1.2. References

  • Paxos algorithm

  • The number of zookeeper nodes and the performance test of watch

  • etcd: a comprehensive interpretation from application scenarios to implementation principles

  • etcd v2.1 benchmarks

  • Distributed configuration service etcd VS distributed coordination service zookeeper

2. Remote call

Keywords: NIO, Netty, epoll, Thrift, Protobuf

In a distributed system, calls between modules usually have to be remote calls, and as the microservice architecture pattern becomes popular, the proportion of remote calls keeps growing. Remote invocation itself appeared long ago, in early technologies such as CORBA, EJB, and SOAP, but those technologies were complex to use and performed poorly, which limited the popularity of remote calls. In recent years, progress in asynchronous IO and serialization technology, together with the emergence and spread of cluster management services such as Zookeeper, has gradually removed the technical barriers that held remote calls back.

HTTP + JSON can also implement remote calls between modules, but that approach is usually reserved for public APIs. Inside a system, remote calls need higher speed, lower latency, and asynchronous invocation, requirements that HTTP + JSON usually cannot meet. Two technical points matter most for remote calls: IO technology and serialization technology. Remote calls also raise two further problems: first, service registration, discovery, and routing, which can be solved with the help of a service such as Zookeeper; second, making a remote call as simple to use as a local call, which requires a combination of techniques such as AOP. The detailed resolution of these two problems is beyond the scope of this section.

2.1. IO

(Only socket IO is discussed here.) Common IO models include blocking IO, non-blocking IO, and asynchronous IO. With blocking IO, when a thread performs an IO operation (reading or writing data) on a socket connection and the operation cannot proceed (there is no data to read, or the buffer cannot be written), the thread is suspended until the operation becomes executable. The advantage of this model is that business code is very simple to write; the disadvantage is low resource utilization, because each connection needs a dedicated thread. With a large number of connections, a large number of threads are consumed, and this drawback is very serious in server-side development.

Non-blocking IO multiplexes threads: one thread handles multiple connections. Asynchronous IO goes further: the operating system performs the IO reads and writes itself and notifies the business thread once the data is ready for processing.

The above is only a general description of blocking and non-blocking IO. Concretely, Linux supports non-blocking IO through epoll, a kernel API first added in version 2.5.44. epoll stands for event poll: when an IO event occurs, the Linux kernel notifies the user program. In use, the program creates an epoll handle and then loops on it to fetch new events as they occur. Because those events come from many connections, one thread is multiplexed across all of them.

Java 1.4 introduced NIO support (java.nio.*). In the Java NIO API, a program registers a connection with a Selector via SelectableChannel.register(Selector sel, int ops) (one Selector can hold many registered connections). After registration, the program obtains and processes the events on these connections by repeatedly calling Selector.select() and Selector.selectedKeys(), usually handing the events to other threads for processing (the Reactor model).
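The register-then-select pattern can be demonstrated without any network at all, because the channels of java.nio.channels.Pipe are also selectable. A minimal sketch (the class and method names are ours):

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

class SelectorDemo {
    // Registers a pipe's read end with a Selector, writes data into the pipe,
    // and reads it back when the Selector reports a readable event.
    static String roundTrip(String msg) {
        try {
            Selector selector = Selector.open();
            Pipe pipe = Pipe.open();
            // Channels must be non-blocking before they can be registered.
            pipe.source().configureBlocking(false);
            pipe.source().register(selector, SelectionKey.OP_READ);

            pipe.sink().write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));

            selector.select(); // blocks until at least one event is ready
            StringBuilder out = new StringBuilder();
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    pipe.source().read(buf);
                    buf.flip();
                    out.append(StandardCharsets.UTF_8.decode(buf));
                }
            }
            selector.selectedKeys().clear(); // selected keys must be cleared manually
            selector.close();
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

In a real server the registered channels would be SocketChannels accepted from a ServerSocketChannel, and the select loop would run forever in its own thread.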

Although Java provides good API support for NIO development (and for AIO since 1.7), network IO programming is still highly complex, and the Java NIO classes have historically been one of the buggier parts of the JDK. Ordinary developers are therefore advised not to build network IO functions directly on the JDK but to use Netty instead. Netty itself is not introduced here.

2.2. Serialization technology

Serialization is an important part of a remote call protocol: it defines how the data structures of the programming language and the data structures of the transport protocol are converted into each other. The performance of the serialization technology directly affects the performance of remote calls, and it has two aspects: the resources (CPU, memory, time) consumed during serialization, and the size of the serialized data. SOAP and REST web services usually serialize data to XML or JSON. Both are text formats with good readability, but their payloads are relatively large for remote calls that are used frequently. Hence the need for better-performing serialization schemes, the best known being Protocol Buffers and Apache Avro. Apache Thrift also serializes very efficiently, but Thrift cannot be used as a standalone serialization technology: it is a complete remote call solution whose serialization layer is not easy to strip out and which exposes no complete public API for that purpose.
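The size difference between text and binary formats is easy to see in miniature. The sketch below is not Protobuf; it simply contrasts a hand-written JSON string with a fixed-width binary layout of the same three fields (the class, method, and field names are ours):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class SerializationSize {
    // Text encoding: field names and digits are spelled out in full on the wire.
    static byte[] asJson(long userId, int age, boolean active) {
        String json = "{\"userId\":" + userId + ",\"age\":" + age
                + ",\"active\":" + active + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Binary encoding: fixed-width fields, no field names on the wire.
    // The schema is shared out of band, which is what Protobuf, Avro, and
    // Thrift IDL files accomplish.
    static byte[] asBinary(long userId, int age, boolean active) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(userId);   // 8 bytes
            out.writeInt(age);       // 4 bytes
            out.writeBoolean(active); // 1 byte
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

For these three fields the binary form is 13 bytes, while the JSON form is over three times larger; real schemes such as Protobuf shrink things further with variable-length integer encoding.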

2.3. Apache Thrift

Thrift was contributed by Facebook. It is a high-performance, cross-language RPC framework well suited to RPC calls between internal services. Thrift uses an IDL (interface description language) to define and generate service interfaces, which are combined with the server and client call stacks it provides to implement RPC.

service Calculator extends shared.SharedService {

   /**
    * A method definition looks like C code. It has a return type, arguments,
    * and optionally a list of exceptions that it may throw. Note that argument
    * lists and exception lists are specified using the exact same syntax as
    * field lists in struct or exception definitions.
    */
   void ping(),

   i32 add(1:i32 num1, 2:i32 num2),

   i32 calculate(1:i32 logid, 2:Work w) throws (1:InvalidOperation ouch),

   /**
    * This method has a oneway modifier. That means the client only makes
    * a request and does not listen for any response at all. Oneway methods
    * must be void.
    */
   oneway void zip()
}

Thrift provides tools to generate corresponding interfaces and data structures for various programming languages (C++, Java, Python, PHP, Ruby, etc.) based on IDL files. Thrift not only provides traditional Request/Response interface calls, but also a one-way calling method (modified with the keyword oneway).

The serialization part of Thrift is closely integrated with the entire framework, and does not directly provide serialization and deserialization interfaces, so it is not easy to use with other transport protocols.

Examples and explanations

Here is an example of simple Thrift usage. Apart from the three classes ThriftClient, ThriftServer, and CalculatorHandler, the remaining classes are all generated from the *.thrift file, i.e. Thrift's IDL file. Thrift IDL supports namespaces (package spaces), inheritance, and other syntax.

Taking Java as an example, a service in Thrift IDL generates an interface, a server-side stub, and a client-side stub, each in a synchronous and an asynchronous variant, i.e. six inner classes per service. With this call stack, a client can invoke the server once the transport protocol, address information, and serialization protocol are configured.

The server-side implementation is also not complicated: the developer implements the corresponding business class, which must implement at least one of the interfaces generated from the IDL, the synchronous interface, the asynchronous interface, or both.

Development based on IDL is one way to use an RPC framework such as Thrift. It is well suited to newly developed services that need remote access from clients in multiple languages. For existing business methods that should be made remotely accessible, however, it is inconvenient. Facebook therefore provides another project, Swift (not Apple's Swift), which turns ordinary Java method calls into Thrift remote calls simply by adding annotations to the Java code, similar to what JAX-RS and many other REST frameworks do. This approach suits projects written mainly in Java or other JVM languages such as Scala and Groovy: they get Thrift remote calls without the added complexity and reduced readability that introducing IDL brings.

2.4. Introduction of other technologies

Protocol Buffers

Protobuf is a high-performance serialization solution with comprehensive documentation and can be used with protocols such as HTTP.

Apache Avro

Apache Avro is a sub-project of Apache Hadoop. It provides two serialization formats: JSON and binary format. The JSON format has good readability, while the binary format is comparable to Protobuf in performance.

2.5. References

  • Serialization and deserialization

  • Netty series of Netty high performance

  • Netty thread model of Netty series

  • Netty Concurrent Programming Analysis of Netty Series

3. Message middleware

Keywords: Kafka, RabbitMQ

In distributed systems, the importance of message middleware is increasingly obvious. Message middleware decouples modules and provides asynchronous calls, message persistence, and traffic peak shaving. Established products such as Apache ActiveMQ cannot meet these newer needs, which is why newer messaging middleware products such as RabbitMQ and Apache Kafka have emerged.

Apache Kafka

Apache Kafka takes full advantage of the fast sequential read and write speed of mechanical disks: received messages are appended to an on-disk log, which provides data reliability while remaining very fast. A Kafka cluster holds multiple topics. A topic is a category, and consumers can subscribe to one or more topics. Each topic consists of multiple partitions, and messages are appended to a partition sequentially. Within its partition, each message has a unique, ordered ID called the offset. Consumers are responsible for tracking the offset of the messages they have consumed.

Apache Kafka also differs from traditional messaging middleware in using a "pull" message model rather than the traditional "push" model: the client actively fetches messages from the middleware. The advantage is that the client can better control the volume of its requests.

Queue mode and Topic mode

Traditional message queue services have two modes: queue mode and publish-subscribe mode. In the former, a message is consumed by exactly one consumer; in the latter, a message is delivered to every consumer subscribed to the topic. Kafka implements both modes through a single mechanism: consumer groups. Consumers in the same group never receive the same message; to get publish-subscribe behavior, consumers must join different groups.
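The consumer-group rule can be sketched as a partition assignment. The round-robin assignment below is a simplified stand-in for Kafka's real assignment strategies, and the names are ours: within one group each partition has exactly one owning consumer (so a message is processed once per group), while a second group simply computes its own independent assignment, which reproduces publish-subscribe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GroupAssignment {
    // Within one consumer group, each partition is assigned to exactly one
    // consumer, so each message is processed once per group. A second group
    // gets its own full, independent assignment.
    static Map<Integer, Integer> assign(int partitions, int consumersInGroup) {
        Map<Integer, Integer> owner = new LinkedHashMap<>();
        for (int p = 0; p < partitions; p++) {
            owner.put(p, p % consumersInGroup); // round-robin style assignment
        }
        return owner;
    }
}
```

With 4 partitions and a group of 2 consumers, each consumer owns 2 partitions; a separate group of 1 consumer owns all 4 and thus sees every message as well.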

Kafka cluster


RabbitMQ

RabbitMQ is an implementation of AMQP (Advanced Message Queuing Protocol) written in Erlang, now developed by SpringSource under VMware. AMQP is a language-independent message queuing protocol. RabbitMQ has three core concepts: the exchange, the queue, and the routing key. Producers publish to exchanges, consumers read from queues, and routing keys associate the two. This model gives RabbitMQ a very flexible application topology.
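The exchange/queue/routing-key relationship can be sketched in plain Java. The sketch models only a "direct" exchange (real RabbitMQ also offers fanout, topic, and headers exchanges), and the class and method names are ours:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DirectExchange {
    private final Map<String, List<String>> queues = new LinkedHashMap<>();
    // routing key -> names of the queues bound with that key
    private final Map<String, List<String>> bindings = new LinkedHashMap<>();

    // Declare a queue and bind it to the exchange with a routing key.
    void bind(String queueName, String routingKey) {
        queues.computeIfAbsent(queueName, q -> new ArrayList<>());
        bindings.computeIfAbsent(routingKey, k -> new ArrayList<>()).add(queueName);
    }

    // The producer publishes to the exchange with a routing key; the exchange
    // copies the message into every queue whose binding matches that key.
    // Messages with no matching binding are simply dropped here.
    void publish(String routingKey, String message) {
        for (String queueName : bindings.getOrDefault(routingKey, Collections.emptyList())) {
            queues.get(queueName).add(message);
        }
    }

    List<String> queue(String queueName) {
        return queues.getOrDefault(queueName, Collections.emptyList());
    }
}
```

Binding two queues with the same routing key gives fanout-like behavior; binding each queue with its own key gives point-to-point routing, which is the flexibility the text refers to.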

4. Distributed file system

Block storage and object storage

Block storage presents a raw disk to the client, but that raw disk may be backed by one physical disk, by several, or by disks spread across different servers. Object storage provides a higher-level interface through which files and their metadata can be read and written. The metadata records where each block of a file is stored, so with it a file can be operated on in parallel.

High availability of distributed file system

To keep data safe, distributed file systems usually store three replicas of each file, placed on different servers. For more demanding systems, such as public cloud storage, one replica is placed in a different data center, so that even if an entire data center fails, the file system remains available.


Ceph

Ceph provides block storage and object storage functions and is commonly used as a storage backend in OpenStack deployments.


GridFS

GridFS is part of MongoDB, used to store files that exceed the BSON document size limit (16 MB). GridFS divides a file into chunks and stores them separately.


FastDFS

FastDFS is a lightweight distributed file system suited to storing small and medium files (object storage). Its tracker server does not record file metadata; information such as a file's storage location is encoded in the file ID returned to the user.

5. Distributed memory

Memory is the new hard disk, and the hard disk is the new tape - Jim Gray


Hazelcast

Hazelcast is a distributed in-memory solution for Java with a rich feature set: distributed implementations of the Java collection classes, distributed locks, a distributed ExecutorService implementation, and more. Reality is often cruel, though, and Hazelcast has shown many flaws in practical use. See "Hazelcast's Daddy" for details.


Memcached

Memcached is a long-established "distributed" caching solution. The quotation marks are there because the Memcached server itself does not support distribution: server nodes do not communicate with each other, and distribution must be implemented by the client. Early in-memory distribution was achieved through replication between nodes, but that approach limits scalability, which is also why distributed memory solutions such as Terracotta never became mainstream.



6. Distributed database

Relational Database

In large-scale distributed applications, a single database or simple read-write separation can no longer meet the requirements, so databases and tables must be split both horizontally and vertically. Once a database is sharded, accessing it is no longer a simple matter: for every database operation, the application must compute which database instance and which table to use. For example, User rows with IDs 1 to 1,000,000 live in table User_1 of database 1, User rows with IDs 1,000,001 to 2,000,000 live in table User_2 of database 1, and other User rows live in other tables in other databases. On top of that, read-write separation between master and slave databases must also be considered. Using a database this way makes data operations extremely complicated and also increases the difficulty of data migration and capacity expansion.
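The range rule in the example above can be written as a small routing function. The sketch assumes, purely for illustration, 1,000,000 rows per table and 2 tables per database; the class name, constants, and naming scheme are ours, not from any real middleware:

```java
class ShardRouter {
    private static final long ROWS_PER_TABLE = 1_000_000L;
    private static final int TABLES_PER_DB = 2; // assumption for this sketch

    // Maps a user ID to a database and table name using the range rule:
    // IDs 1..1,000,000 -> User_1, IDs 1,000,001..2,000,000 -> User_2, and so on,
    // with every two consecutive tables grouped into one database.
    static String route(long userId) {
        long tableIndex = (userId - 1) / ROWS_PER_TABLE + 1;  // User_1, User_2, ...
        long dbIndex = (tableIndex - 1) / TABLES_PER_DB + 1;  // db_1, db_2, ...
        return "db_" + dbIndex + ".User_" + tableIndex;
    }
}
```

In a real system this logic lives in a proxy or framework layer rather than in application code, which is exactly the motivation for the middleware discussed next.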

Such a complex problem clearly should not be solved inside each application. Internet vendors, the heaviest users of distributed applications, have therefore built their own solutions, which fall into two categories: the proxy (middleware) approach and the framework approach. The former proxies database access, making the distributed database transparent to applications; the latter is embedded in the application as a framework and plays a similar role. Each approach has its own advantages and disadvantages and suits different situations.

Examples include Sogou Compass, Alibaba TDDL, and Cobar.


NoSQL

Although most NoSQL databases are friendly to distribution, that does not mean a cluster can be built effortlessly with them. Take the famous key/value database Redis: it had no official cluster solution before version 3.0, so every large-scale Redis deployment had to implement its own distribution layer, such as Twitter's Twemproxy or Wandoujia's Codis.

When implementing a distributed data solution, one algorithm comes up most often: consistent hashing. It is only mentioned briefly here, without further introduction.
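For completeness, here is a minimal consistent-hash ring using a sorted map and virtual nodes. This is the common textbook construction, not the implementation of any particular library, and all names are ours. The key property: removing one node only remaps the keys that node owned, while every other key stays put.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

class ConsistentHashRing {
    // Positions on the ring -> physical node name.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Each physical node occupies many positions ("virtual nodes") so that
    // keys spread evenly and rebalance smoothly when membership changes.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    // A key is served by the first node clockwise from its hash position.
    String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // Take the first 8 bytes of an MD5 digest as a 64-bit ring position.
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
            return h;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

This is the scheme Memcached clients and proxies such as Twemproxy use to decide which server holds which key.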

7. Virtualization technology

Keywords: OpenStack, Docker, container technology

Virtualization is an important means of improving hardware utilization and a key technology for realizing cloud computing. At the bottom layer sits the virtualization of individual hardware resources: CPU, memory, storage, network, and so on. Virtual machine technologies are built on top of these, and on top of virtual machines and other virtualization technologies, vendors build IaaS, PaaS, and SaaS platforms and software products.


OpenStack

OpenStack is an open-source project containing a series of components for building IaaS platforms, including Neutron for network virtualization, Ceph and Swift for storage virtualization, and many components for image management, dashboards, and other functions. OpenStack does not itself provide virtualization technology; it supports many existing hypervisors, such as KVM, and provides a series of technologies for building IaaS solutions on top of them. Its components can be combined flexibly, and because it is open source, users can customize or extend them. For the same reason, any vendor that wants to use OpenStack successfully must have a strong technical team behind it, which is the biggest difficulty OpenStack adoption faces.


Docker

Strictly speaking, Docker is not a virtualization technology, but because it gives users a lightweight, virtual-machine-like experience, it is listed here. Docker is a container technology: with support from the Linux kernel, processes can be isolated from one another and have their resources limited, satisfying part of what virtual machines provide. Compared with virtual machines, Docker is lightweight, and its resource utilization is close to that of a physical machine. As demand and technology evolve, server-side applications are becoming lighter and more distributed, and heavyweight technologies such as virtual machines no longer suit the pattern of many small applications. This is an important reason container technologies such as Docker have grown so quickly and become so hot in recent years.

Docker and the aforementioned OpenStack operate at two different levels and do not conflict; the OpenStack and Docker communities are now working closely together (see "Containers will not replace OpenStack, but how can the two be deeply integrated?").

8. Load balancing of distributed systems


HAProxy

HAProxy is a high-performance TCP and HTTP reverse proxy and load-balancing server. Nginx also provides reverse proxying and load balancing, but it leans toward the HTTP protocol. In addition, Varnish and Squid can serve as front-end proxies, though their focus is caching.

9. One more level

Service orchestration: registration, discovery, and routing

Combined technology: configuration management, remote call, etc.

This is similar to JNDI in the early years: when an application calls another application remotely, it only needs to know the target's name and version for the call to succeed, without having to know the target's concrete address.

Cloud operating system

Combined technology: virtual machine, container technology, network virtualization, configuration management, message queue

Apache Mesos, Google Borg, Tencent Gaia, Baidu Matrix


As described above, these technologies are relevant to all of us, and many share similar design ideas. Mastering them, even if you rarely use some of them directly, will still greatly benefit your design and development abilities.

This article is reproduced from the 51CTO blog; if you need to reprint it, please contact the original author.