MySQL from Entry to Enthusiasm (6): Query Performance Optimization



This article is included in my GitHub repository; a Star is welcome. The best
time to plant a tree was ten years ago; the second best time is now.
I know many people don't use QQ anymore, but for nostalgia's sake you're welcome to join the Six Pulse Magic Sword Java rookie study group, group number 549684836, where we encourage each other to keep blogging along the technical road.


We continue exploring MySQL. Earlier we covered the basics of MySQL indexes; today we take a look at query performance optimization.

The previous chapter covered the basics, including single-table access methods. If you haven't read it, please go back to it first.

Slow query basics: optimizing data access

The main reason for poor query performance is simply querying too much data (some queries, such as reports, are inherently heavy). For an inefficient query, we can check the following two things:

  • Determine whether the application is retrieving more data than it needs.
  • Determine whether the MySQL server layer is analyzing more rows than it needs.

Is the application requesting data it doesn't need from the database?

A lot of SQL requests more data than is actually needed, which puts extra load on the server and adds network overhead. The following are typical bad cases:

  • You only need to return 2 columns, but you use SELECT *. This should be avoided.
  • You only need 10 rows, but you don't use LIMIT to cap the number of rows returned.
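As a sketch of both fixes (table and column names are hypothetical):

```sql
-- Bad: fetches every column of every row, even though the
-- application only needs two columns of the ten newest users.
SELECT * FROM users;

-- Better: name only the columns you need and cap the row count.
SELECT id, name
FROM users
ORDER BY created_at DESC
LIMIT 10;
```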

Is MySQL scanning extra records?

Once we've confirmed that the query returns only the rows we need, the next thing to check is whether it scans too much data to produce them. For MySQL, the three simplest metrics of query cost are:

  • Response time
  • Rows examined
  • Rows returned

Rows examined vs. rows returned

If we find that many rows are scanned but only a few are returned, there are several ways to optimize:

  • Use a covering index: put all the required columns into the index so the storage engine can return results without going back to the table for the full rows.
  • Change the schema, for example by maintaining a separate summary table.
  • Rewrite the complex SQL, splitting it up and moving part of the logic into application code.
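A covering index, for instance, can be sketched like this (the schema is assumed for illustration):

```sql
-- Query: look up a user's email by name.
-- A composite index on (name, email) contains every column the
-- query touches, so the storage engine can answer it from the
-- index alone, without going back to the table for the full row.
ALTER TABLE users ADD INDEX idx_name_email (name, email);

SELECT name, email
FROM users
WHERE name = 'alice';
```

When the index covers the query, EXPLAIN shows "Using index" in its Extra column.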

Restructuring queries

When optimizing a query, the goal is to find an alternative way to get exactly the same result at a lower cost. In the past, people emphasized doing as much work as possible at the database level, on the belief that network overhead was high, but that reasoning doesn't always apply to MySQL: MySQL connections are very lightweight to open and close, so small, simple queries are often king.

One complex query, or multiple simple queries?

I recently took over a project in which a former colleague's SQL was stuffed with logic, and system performance was dreadful. After taking over, we did a round of optimization and split everything into multiple simple queries. Although the number of round trips increased, the query efficiency, code reusability, and scalability all improved enormously. So it depends on the situation: when a query can be broken into multiple simple queries, it's usually worth doing.

Split query (divide and conquer)

Take batch updates as an example. Say I modify a chapter of a topic and need to update that chapter for every user subscribed to the topic: that's a batch update. Updating that much data in a single statement holds locks for a long time and creates one huge transaction. Splitting it into multiple updates of, say, 1,000 rows each works much better.
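A chunked update can be sketched like this (table and column names are assumed); the loop driving it would live in application code or a script, repeating the statement until it affects zero rows:

```sql
-- Instead of one giant transaction:
--   UPDATE subscriptions SET chapter_id = 123 WHERE topic_id = 42;
-- update at most 1000 rows per statement, and repeat the
-- statement until ROW_COUNT() reports 0 rows changed.
UPDATE subscriptions
SET chapter_id = 123
WHERE topic_id = 42
  AND chapter_id <> 123
LIMIT 1000;
```

Each iteration is a small, short-lived transaction, so locks are held briefly and other sessions interleave between chunks.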

Decomposing join queries

Many high-performance applications decompose join queries: run a single-table query against each table, then correlate the results at the application layer. In other words, one multi-table join can be broken into several single-table SELECTs.
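As a hypothetical illustration (tables and columns are assumed), a tag lookup decomposed this way might look like:

```sql
-- One join query:
SELECT p.*
FROM post p
JOIN post_tag pt ON pt.post_id = p.id
JOIN tag t       ON t.id = pt.tag_id
WHERE t.name = 'mysql';

-- Decomposed into single-table queries; the application takes
-- the ids from each step and feeds them into the next one:
SELECT id FROM tag WHERE name = 'mysql';
SELECT post_id FROM post_tag WHERE tag_id = 1234;
SELECT * FROM post WHERE id IN (9098, 8904, 9006);
```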

Why do this? What are the benefits?

  • After splitting, each individual query can reduce lock contention.
  • Joining at the application layer makes it easier to split the database and scale out.
  • The individual queries themselves can become more efficient.
  • Split queries can reduce redundant work, such as repeatedly fetching the same rows.

Optimizing correlated subqueries

Many correlated subqueries perform very badly; the worst case is a subquery behind WHERE ... IN. We generally recommend rewriting such subqueries as joins (for example with LEFT JOIN).
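For example (schema assumed), an IN subquery and its join rewrite:

```sql
-- Often slow: older MySQL versions may re-execute the
-- subquery for every row of the outer table.
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE city = 'Beijing');

-- Rewritten as a join:
SELECT o.*
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.city = 'Beijing';
```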

Optimizing MIN() and MAX()

We often use the MIN()/MAX() functions to find the extreme value of a column, but that isn't always the best approach. In some scenarios, if the queried column is indexed, the index itself is already sorted, so reading the first (or last) index entry with ORDER BY ... LIMIT 1 can do much better than calling the function.
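Assuming an index on the column, the trick looks like this:

```sql
-- With an index on created_at, the index is already sorted,
-- so reading its first entry is enough:
SELECT created_at
FROM orders
ORDER BY created_at ASC
LIMIT 1;   -- minimum; use ORDER BY created_at DESC for the maximum
```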

Optimizing COUNT

Many bloggers say you should never use COUNT(*). They're wrong: COUNT(*) is actually the best way to count the rows of a result, because MySQL has optimized it under the hood.

Sorted by efficiency: COUNT(column) < COUNT(primary key id) < COUNT(1) ≈ COUNT(*), so I suggest using COUNT(*) whenever possible.

How joins work

One concept you can't avoid when working with databases is the join. Many beginners find joins a little confusing at first: even after grasping the semantics, they may not understand how the records of each table get matched up, so they often fall into one of two traps:

  • Trap 1: business above all. No matter how complicated the query, cram it into a single join statement.
  • Trap 2: avoid joins entirely. The slow query the DBA flagged last time was caused by a join, so never use one again.

Inner joins and outer joins

  • In an inner join, if a record in the driving table has no matching record in the driven table, it is not added to the final result set. The joins discussed above are all inner joins.
  • In an outer join, a record in the driving table is added to the result set even if it has no matching record in the driven table.
    • In MySQL, depending on which table is the driving table, outer joins are subdivided into 2 types: LEFT JOIN and RIGHT JOIN.

But there is still a problem: even with an outer join, we sometimes don't want every record of the driving table in the final result set. Sometimes an unmatched record should go into the result set, and sometimes it shouldn't. The way out is to divide filter conditions into two kinds; where a filter condition is placed determines its semantics:

  • Filter conditions in the WHERE clause

    • Filter conditions in the WHERE clause are the usual kind: whether the join is inner or outer, any record that fails a WHERE condition is excluded from the final result set.
  • Filter conditions in the ON clause

    • For a record of the driving table in an outer join, if no record matching the ON conditions can be found in the driven table, the record is still added to the result set, with every column of the driven table filled in with NULL.
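A small sketch of the difference (tables t1 and t2 are assumed):

```sql
-- ON condition: unmatched t1 rows are kept, with t2 columns NULL.
SELECT t1.id, t2.id
FROM t1
LEFT JOIN t2 ON t2.t1_id = t1.id AND t2.status = 'active';

-- WHERE condition: rows failing the filter are removed from the
-- final result, so unmatched t1 rows (where t2.status IS NULL)
-- disappear and the LEFT JOIN behaves like an inner join.
SELECT t1.id, t2.id
FROM t1
LEFT JOIN t2 ON t2.t1_id = t1.id
WHERE t2.status = 'active';
```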

How joins are executed

Nested-Loop Join

As we said before, in a two-table join the driving table is accessed only once, while the driven table may be accessed many times; how many times depends on the number of records in the result set of the single-table query on the driving table. For an inner join, it doesn't matter which table is chosen as the driving table, while the driving table of an outer join is fixed: the driving table of a LEFT (outer) JOIN is the table on the left, and the driving table of a RIGHT (outer) JOIN is the table on the right. We introduced the general process of an inner join on tables t1 and t2 above; let's review:

  • Step 1: choose the driving table, apply the filter conditions that involve it, and execute the single-table query against it using the lowest-cost access method.
  • Step 2: for each record in the result set from step 1, look for matching records in the driven table.

If 3 tables are joined, the result set from step 2 acts like a new driving table, and the third table becomes the driven table; the process repeats: for each record of the step-2 result set, we look for matching records in table t3. In pseudocode, the process looks like this:

for each row in t1 {                  # scan the driving table t1
    for each row in t2 {              # probe t2 for each t1 row
        for each row in t3 {          # probe t3 for each (t1, t2) pair
            if row satisfies join conditions, send to client
        }
    }
}

Using indexes to speed up joins

For example, create an index on the join column of the driven table so that its access type becomes const, eq_ref, or ref; this speeds up each lookup into the driven table.
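A sketch of checking this with EXPLAIN (table and column names assumed):

```sql
-- Index the join column of the driven table:
ALTER TABLE t2 ADD INDEX idx_m (m);

-- The driven-table row in the EXPLAIN output should now show
-- type = ref (or eq_ref/const for unique lookups) instead of ALL:
EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.m = t2.m;
```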

Block Nested-Loop Join

Allocate a block of memory (the join buffer) in advance and fill it with records from the driving table's result set, then scan the driven table: each record of the driven table is matched against all the buffered driving-table records in one pass. This significantly reduces the number of scans of the driven table, and therefore its I/O cost.
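In the same pseudocode style as above, block nested-loop join works roughly like this:

```
while rows remain in t1 {
    fill the join buffer with as many t1 rows as fit
    for each row in t2 {             # one scan of t2 per buffer-full of t1
        for each t1 row in the join buffer {
            if the pair satisfies the join conditions, send to client
        }
    }
}
```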


We will continue to fight in the next chapter.

Daily likes

Alright folks, that's all for this article. Anyone who made it this far is a true fan.

Creating content isn't easy; your support and recognition are my biggest motivation. See you in the next article!

Six Pulse Magic Sword | Original article. If there are any mistakes in this post, please point them out; I'd be very grateful!