With 80% of its code once submitted by a single person, how did Apache ShardingSphere graduate from the ASF incubator and become a TLP?

On April 16, the Apache Software Foundation (ASF) announced that Apache ShardingSphere had graduated from the incubator and become an Apache Top-Level Project (TLP). It is also the ASF's first distributed database middleware project.

The ASF is the world's largest open source software foundation. It is a non-profit organization set up specifically to support open source software projects, providing organizational, legal, and financial support. The software produced by the Apache projects and sub-projects it supports is released under the Apache License. According to its August 2019 financial report, the ASF had 332 top-level projects and 47 incubating projects. A project that graduates from incubation has the chance to become a top-level project.

This time, Apache ShardingSphere took 17 months to go from incubator project to top-level project. During that period, the product was upgraded from database and table sharding middleware to a distributed database ecosystem. On the community side, the project has grown from just over 30 contributors before incubation to more than 120 today.

Before entering the incubator, however, Apache ShardingSphere went through a stage in which the community was not active enough and almost all of the code was submitted by one person. The team therefore put a great deal of effort into becoming an ASF incubator project and, with the help of its mentors, kept improving the way the community developed.

As the first project from a Chinese team to become an ASF top-level project in 2020, Apache ShardingSphere's development and incubation experience may offer lessons for other projects. Open Source China therefore invited Zhang Liang, VP of the Apache ShardingSphere project, to share the story.

Guest profile

Zhang Liang, JD Digital Technology Center Architecture Expert, VP & Founder of Apache ShardingSphere Project.

He loves open source, specializes in Java-based distributed architecture, and advocates elegant code. At present, he devotes most of his energy to building the distributed database middleware Apache ShardingSphere into an industry-leading, financial-grade data solution.

Personal GitHub: github.com/terrymanu

Apache ShardingSphere:

Crossing the river by feeling the stones

"Looking back now, 2015 was the early stage of an era in which a hundred schools of thought contended in the database field."

Zhang Liang recalls that Hadoop-based big data solutions and NoSQL alternatives to relational databases, after a long run of praise, were gradually beginning to cool down at that time.

At the same time, containers and microservices were the hottest topics. Docker was gaining ever wider recognition, and Kubernetes, which now dominates the field, released version 1.0 that year. Papers on distributed databases such as Spanner and F1, published by Google a few years earlier, were attracting more and more attention after a period of settling in. Many people had begun to explore what comes after service orientation, that is, how data should be stored in a distributed, cloud-native way. "In that era of innovation, more people were willing to put their energy into service orientation and cloud native."

The distributed database category NewSQL emerged in response, but the new solutions did not mature quickly. It was not until June 2016, when Andrew Pavlo and Matthew Aslett published the paper "What's Really New with NewSQL?" and divided NewSQL into three categories (New Architectures, Transparent Sharding Middleware, and Database-as-a-Service cloud databases), that the development path of NewSQL began to become clearer.

In 2015, Zhang Liang and his team decided to work at Java's JDBC layer in order to minimize development costs, and they built an open source distributed database middleware project: Sharding-JDBC, the predecessor of Apache ShardingSphere. At the time, they did not realize that sharding middleware is in fact one form of NewSQL.

After its release, Sharding-JDBC was continuously updated in response to market demand, becoming first ShardingSphere and then Apache ShardingSphere. The product itself has evolved from a Java development framework for database and table sharding into a distributed database ecosystem. (The project is referred to below as ShardingSphere.)

Taking the open source route from the very beginning was also a choice based on a lot of research. Zhang Liang's team found that although the projects existing at the time could solve problems in particular areas, none was yet a sufficiently mature product. What worried them more was whether those projects could develop sustainably. Without exception, the projects they surveyed had stopped updating after a period of time or had slowed their pace of updates considerably. None had built a community active enough to resonate with developers and drive the project's long-term development.

Zhang Liang believes that business at Internet companies tends to be busy and that requirements emerge endlessly. The development plan of the ShardingSphere project is therefore adjusted continually according to demand and advances in a spiral. An open source project receives requirements from all sides, far more than a company's internal project can. These requirements from outside the company keep improving ShardingSphere. The upcoming 5.0 version, for example, is designed for diversified requirements.

Zhang Liang:

Diversified requirements have led ShardingSphere to gradually establish the project's core concept: a pluggable platform, which will officially debut in the upcoming 5.0 version.

The launch of the pluggable platform means that ShardingSphere has evolved from a solution provider into a platform-level application that can be freely extended. In its core design, pluggable components are divided into technical components and functional components.

Technical components include SQL dialects, transaction types, registry types, and so on. Compared with pluggable technical components, which are easy to understand, pluggable functional components are even more exciting. They mean that all of ShardingSphere's features, such as sharding, data masking, distributed transactions, read-write splitting, distributed governance, and elastic scheduling, can be added to or removed from the platform as plug-ins. The features are completely independent of one another, and when several are used at the same time, the pluggable platform is responsible for stacking their capabilities.

The pluggable platform can even run with a blank skeleton without adding any functions, providing engineers with the infrastructure to develop new plug-ins.

I am very optimistic that the pluggable platform will make it easier for contributors to integrate into the community and will give ShardingSphere more diverse capabilities, such as multiple replicas, SQL auditing, multi-model heterogeneous data, and HTAP.
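
As a rough illustration of the pluggable idea described above, the following Python sketch shows a platform that runs as an empty skeleton, accepts features as plug-ins, and stacks them when several are enabled. It is only a sketch under stated assumptions: ShardingSphere itself is written in Java, and every name here (`PluggablePlatform`, `Request`, `sharding`, `read_write_splitting`, the `t_order` table) is hypothetical and not part of the project's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Request:
    """A SQL statement plus notes left by each feature that handled it."""
    sql: str
    notes: List[str] = field(default_factory=list)


# A feature is any function that takes a request and returns a request.
Feature = Callable[[Request], Request]


class PluggablePlatform:
    """Hypothetical skeleton: it has no behavior of its own and only
    chains whichever features are currently plugged in."""

    def __init__(self) -> None:
        # Dicts keep insertion order, so features run in the order added.
        self._features: Dict[str, Feature] = {}

    def add(self, name: str, feature: Feature) -> None:
        self._features[name] = feature

    def remove(self, name: str) -> None:
        self._features.pop(name, None)

    def handle(self, sql: str) -> Request:
        request = Request(sql)
        for feature in self._features.values():
            request = feature(request)
        return request


def sharding(request: Request) -> Request:
    # Toy rule (assumption): always rewrite the logical table to shard 0.
    request.sql = request.sql.replace("t_order", "t_order_0")
    request.notes.append("sharding")
    return request


def read_write_splitting(request: Request) -> Request:
    # Toy rule (assumption): reads go to a replica, everything else to primary.
    is_read = request.sql.lstrip().upper().startswith("SELECT")
    request.notes.append("route:replica" if is_read else "route:primary")
    return request


platform = PluggablePlatform()
untouched = platform.handle("SELECT * FROM t_order")   # empty skeleton

platform.add("sharding", sharding)
platform.add("read_write_splitting", read_write_splitting)
stacked = platform.handle("SELECT * FROM t_order")     # both features applied

platform.remove("sharding")
single = platform.handle("INSERT INTO t_order VALUES (1)")
```

Because each feature only sees and returns a `Request`, features stay independent of one another, and the platform alone decides how they are combined, which mirrors the division of responsibility described in the text.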

Pluggable platform design:

Of course, diverse requirements also make the work harder. Zhang Liang revealed that deciding how to prioritize requirements from outside the company is a difficult question for the colleagues who develop ShardingSphere at JD Digital. For example: whether to use XA or flexible transactions as ShardingSphere's transaction processing engine; whether to provide interfaces that go beyond the company's own needs; how to handle community requirements, such as shadow-table stress testing, that the company does not use for the time being...

"Staking out" foreign ASF mentors

Once on the open source road, beyond the choices and problems facing the internal team, how to mobilize the community and keep the project developing over the long term was another urgent problem.

ShardingSphere was first open sourced on January 17, 2016. Over the next two years, Zhang Liang alone submitted more than 80% of the project's code. This sustained the project's early development, but it also "limited" the growth of the community around the open source project.

"If ShardingSphere is always my personal open source project, or an open source project affiliated with a company, then contributing engineers will inevitably have concerns. They have reason to worry that their contributions may come to nothing because of a change in my personal preferences or in the company's business plans." In Zhang Liang's view, China has never lacked excellent, idealistic engineers; what it has lacked is a target for them to aim at.

If the project could enter the ASF, the change in the project's nature would matter more than any change in the code itself. All contributions by project participants are permanently recorded in the ASF's contribution lists and publicly displayed to every programmer in the world, and there is no need to worry that the ASF will suddenly declare the project closed source. "I hope that after joining the ASF, ShardingSphere can break down users' psychological defenses and resonate with contributors. The existence of the foundation is the strongest guarantee of the project's openness and sustainable development."

On November 10, 2018, ShardingSphere entered the ASF incubator. Zhang Liang's team had begun preparing for this as early as May 2018.

Beyond the basics, such as providing English documentation, switching issues from Chinese to English, and making tasks issue-driven wherever possible, Zhang Liang's team faced two challenges. The first was to strengthen the project's community-driven character so that it better fit the Apache Way. The second was to "stake out" foreign ASF mentors and seek opportunities to talk face to face, in order to win their help in entering the incubator.

According to ASF rules, entering the incubator requires a recommendation from at least one mentor. In general, projects that enter the incubator have the help of at least three mentors. The community itself must also be willing to develop in the direction of the Apache Way.

The Apache Way is not considered to have a "single model"; it can be seen as the set of basic principles that a project management committee needs to follow. Apache projects and their communities focus on the activities required at each stage of the project life cycle, including growing the community, developing excellent code, and building awareness. The ASF insists on "community over code" and firmly believes that a strong community can always fix problems in the code, whereas an unhealthy community may struggle to maintain a code base over time.

At that time, however, participation in the ShardingSphere community was not high. In Zhang Liang's words, the project itself was not driven by the community, and a project controlled by a lone ranger obviously does not conform to the Apache Way. So ShardingSphere began encouraging community contributors to get involved. Once the project had the basic elements of the Apache Way, the search for mentors gradually became smoother.

Zhang Liang believes that, when it comes to mentors, face-to-face communication builds trust more easily than email. "In 2018, very few Chinese people had ASF mentor status, so finding foreign mentors was both very important and relatively difficult. For a long time we followed a strategy of collecting and tracking foreign mentors' trips to China, and strove to create more opportunities to talk with them face to face."

When ShardingSphere entered the Apache incubator, it had three mentors and one Champion. One of the mentors and the Champion, both from outside China, were people Zhang Liang met while attending a conference.

Zhang Liang:

While attending the HDC conference, I met Craig L Russell, who was then the Secretary of the ASF, and he was willing to serve as a mentor for the ShardingSphere project. I also met Roman Shaposhnik, a highly respected ASF veteran, who was willing to become the project's Champion.

The other two mentors are from China: Jiang Ning, a senior Apache member from the Apache ServiceComb community, and Feng Jia, a senior Apache member from the Apache RocketMQ community. In this way, with the help of three mentors and a Champion, we successfully entered the Apache incubator.

In December 2018, mentor Craig, then chairman of the Apache Foundation, visited the team at JD Digital:

The mentor is the key person in getting a project into the incubator. The mentor's main task is to help the project's PPMC understand Apache's rules, while the PPMC's main responsibilities are releasing versions and electing committers. Zhang Liang explained the model further: "The mentor's guidance is limited to helping the incubating project practice the Apache Way; it does not extend to the project's own technology or product direction. The project's roadmap is decided jointly by the PPMC and the committers."

Zhang Liang:

Apache's open source philosophy holds that every project that follows the Apache Way will eventually develop well. The mentors, the PPMC, and the committers can therefore each focus on their own duties.

Contributions to the project are not limited to code. They are diverse and include mailing list discussions, decisions, documentation, answering questions, talks, operations, and more; every kind of contribution is recognized. Beyond building the community, legal compliance is also crucial. The mentors guide the PPMC through license, brand, and intellectual property compliance for the project.

The mentor is an expert on the Apache Way, not a technical expert on the project.

During incubation, the mentors gave the ShardingSphere community a great deal of help and guidance, including guidance on the Apache Way, strategies for growing the community, the use of mailing lists, incubation reports, Whimsy, JIRA, and other infrastructure, the Apache release process and license review, and the election of committers and PPMC members.

As the community has developed, the idea of "community over code" has been more and more genuinely accepted by everyone.

Communicating with mentors is also a key part of running the community. "The activity of the mailing list is an important measure of a community," Zhang Liang said. As a PPMC member, he communicates with foreign mentors by email; with domestic mentors, simple questions are mostly handled over WeChat. Discussions and decisions, regardless of country, are carried out on the dev@shardingsphere.apache.org mailing list.

Screenshot of the ShardingSphere mailing list:

Don't even think about graduating if the community is not well run

Building an active and sticky community was regarded by Zhang Liang as the first task and greatest expectation in the Apache incubator.

In terms of results, ShardingSphere had just over 30 contributors before entering the incubator; it now has more than 120.

In addition, as the number of contributors rose rapidly, the numbers of issues, PRs, code commits, and even stars grew explosively. Zhang Liang said that as more and more high-quality contributors joined, the ShardingSphere community followed the philosophy of the Apache community and voted in more than a dozen official committers with Apache accounts. "Creating an active and diverse community is ShardingSphere's most important achievement in the Apache incubation stage."

In Zhang Liang's view, the Apache incubator has two major advantages: it attracts more contributors, and it gives users greater assurance.

Attracting contributors. When ShardingSphere entered the Apache incubator, it was developed almost entirely by JD Digital engineers, and it was hard for outsiders to truly take part. After it entered the incubator, China Telecom Yipay, DaoCloud, and a number of individual contributors joined the project one after another and contributed ShardingSphere's logo, its UI, the shadow-table stress testing feature, and an elastic migration prototype.

"A vendor-neutral foundation can break down the commercial boundaries between technology enthusiasts so that they can work closely together," Zhang Liang commented.

For users, since a project of the Apache Foundation must fully comply with the Apache License, the project itself and all of its dependencies must be compatible with that license. The Apache License is business-friendly, so companies that use and depend on Apache Foundation projects need not worry about the risks that the GPL and other copyleft licenses pose to their commercial activities. And because the project belongs to the foundation, users need not worry that it will stop being updated because one person controls the community, which reduces the risk in technology selection. At the same time, Zhang Liang said that full compliance with the Apache License is also what makes an Apache release difficult.

Zhang Liang singled out what he considers the most important point in the incubation process: how to encourage contributors to keep participating in the community.

Zhang Liang:

ShardingSphere is a project with a high barrier to entry. Its core modules, such as data sharding, SQL parsing, and the NIO-based database protocols, are hard for general developers to approach directly, and the path for a new contributor to become a committer is relatively steep.

We have therefore deliberately opened up tasks for newcomers to attract more contributors, and we have shifted ShardingSphere's product positioning from database and table sharding middleware to a distributed database platform, so that it can absorb more community contributions and combine them organically.

In addition, in the early stage of community development we lowered the bar for committer elections. When voting, the PPMC paid more attention to contributors with potential and enthusiasm. As the community matures, the bar for committer elections will rise.

How well the community is run also directly affects whether a project can graduate smoothly. Apache has a fixed project maturity model for evaluating whether a project meets the graduation requirements. The model consists of seven categories (code, licenses and copyright, releases, quality, community, consensus building, and independence), each with 2 to 7 evaluation items.

The evaluation report can be found at: github.com/apache/shar...

After graduating from the ASF incubator, a project either becomes an independent top-level project or becomes a sub-project of another top-level project. Last year, Apache SkyWalking, led by a Chinese team, graduated from the ASF incubator and became a top-level project. Counting ShardingSphere and SkyWalking, ten projects from Chinese teams have now graduated from the ASF incubator; the others are Apache CarbonData, Apache Dubbo, Apache Eagle, Apache Griffin, Apache HAWQ, Apache Kylin, Apache RocketMQ, and Apache ServiceComb.

The keys to ShardingSphere's successful graduation and promotion to a top-level project were building an active community, passing the project maturity evaluation, and achieving brand and legal compliance. In addition, becoming proficient at project releases and committer elections were the two most important tasks of the incubation stage.

As for how long incubation takes before graduation, Zhang Liang said that it depends on the state of the project and its community, and that projects cannot be compared with one another. Some projects graduate within 8 months, while others have not graduated after several years. During incubation, a project can use the infrastructure provided by the Apache Foundation, but it is not yet fully endorsed by the foundation and can withdraw from it at any time through negotiation.

As for ShardingSphere's own incubation cycle, Zhang Liang said that graduating in 17 months exceeded his expectations, and that the mentors and community members played a key role. "It is worth mentioning that during incubation, the status of some mentors and PPMC members changed. Craig L Russell was elected chairman of the Apache Foundation during this period; PPMC member Wu Sheng qualified as a mentor thanks to his active involvement and to having graduated another project, Apache SkyWalking. This made ShardingSphere's incubation process smoother."

"From my personal point of view, the product is less than 50% complete"

"The project's graduation from the Apache incubator only means that the project's community operation has been recognized by ASF." Zhang Liang said that for the product itself, Apache ShardingSphere is still far from the ultimate goal. "From my personal point of view, the completion of the product should be less than 50%."

During its year and a half in the incubator, Apache ShardingSphere iterated on the product, transforming it from database and table sharding middleware into a distributed database ecosystem, and released five versions: 4.0.0-RC1, 4.0.0-RC2, 4.0.0-RC3, 4.0.0, and 4.0.1.

Zhang Liang:

The first three versions were all release candidates leading up to the stable 4.0.0 version. The main effort during the incubator stage went into releasing the stable 4.0.0 version. Its main features include the SQL92 dialect, a PostgreSQL proxy, XA and flexible transaction support, integration with transaction frameworks such as Seata, data masking, a SkyWalking plug-in, and a UI.

Apache ShardingSphere core function diagram:

However, in the planning of Zhang Liang and his team, Apache ShardingSphere still has many things to do:

  • The project's short-term goal is to build a pluggable platform for managing the supported components and features. The pluggable platform is still under development, and its full version will arrive in the 5.x versions of Apache ShardingSphere.
  • In the near future, Apache ShardingSphere will move into exploring the database kernel and build a comprehensive distributed database system. The JDTX design announced last year is the project's exploration of distributed transactions. While continuing to improve JDTX, the project will also explore a query optimizer and plans to offer more diverse options on the storage side, such as KV.
  • In addition, the project team hopes to build a governance and scheduling system for distributed databases. In the Apache ShardingSphere system, the database is responsible for the underlying storage, while the pluggable platform fully takes over sharding, computing, transactions, governance, and scheduling in distributed scenarios.
  • The ultimate goal is for Apache ShardingSphere to manage all the data stores in the system in a cloud-native way, much as Kubernetes manages containers.
  • At the company level, the broader goal for Apache ShardingSphere is to carry all the database capabilities of the JD Digital T1 platform. At present, Apache ShardingSphere is the core distributed database middleware product in the T1 financial digital technology solution. Based on this product and team, JD Digital is building a pluggable, independently controllable, financial-grade distributed relational database.

For this reason, the Apache ShardingSphere project team has long been openly recruiting data R&D engineers. It hopes that more technical talent with ideals and an open source vision will join, combining work with interest to build a better open source community.

Finally, Zhang Liang described the current adoption of Apache ShardingSphere. More than 130 companies have officially registered as users. "We have no statistics on how many companies actually use it, but judging from the questions answered in the WeChat groups, the number of unregistered adopters is very considerable." See: shardingsphere.apache.org/community/c...

With the project's graduation from the Apache incubator, Zhang Liang also hopes to do more promotion in overseas markets. "Although the project has not yet been adopted by foreign companies, many foreign contributors have appeared in GitHub issues and PRs." In Zhang Liang's view, in addition to the necessary publicity and promotion, domestic projects must integrate into today's mainstream open source ecosystem. Large, all-encompassing projects do not easily gain recognition in overseas markets. Focusing on the project's own field and cooperating with other open source projects to solve problems is the key to being recognized overseas.

Now that ShardingSphere has graduated from the incubator, Apache is no longer just a provider of help to ShardingSphere. Instead, ShardingSphere is part of Apache: it is fully owned by the Apache Foundation and has become one of its official projects. "The change of identity will attract more outstanding open source contributors to participate," Zhang Liang said.