MySQL optimization specification recommendations

Database naming specification

  • All database object names must use lowercase letters and be separated by underscores
  • MySQL reserved keywords must not be used in database object names (if a table name does contain a keyword, it must be enclosed in backticks in queries)
  • Database object names should be self-explanatory and should not exceed 32 characters
  • Temporary tables must be prefixed with tmp_ and suffixed with the date, and backup tables must be prefixed with bak_ and suffixed with the date (timestamp)
  • Columns that store the same data must have the same name and type in every table (these are usually join columns; if their types differ, MySQL implicitly converts the values during the query, which prevents the index on the column from being used and slows the query down)

Basic database design specifications

  1. All tables must use the InnoDB storage engine

Unless there is a special requirement that InnoDB cannot meet (such as columnar storage or spatial data storage), all tables must use the InnoDB storage engine (MyISAM was the default before MySQL 5.5; InnoDB has been the default since 5.5).

InnoDB supports transactions and row-level locks, recovers better from failures, and performs better under high concurrency.

  2. Use UTF8 uniformly as the character set of databases and tables

A unified character set gives better compatibility and avoids the garbled characters caused by character set conversion. Comparing columns with different character sets requires a conversion first, which prevents indexes from being used. MySQL's utf8 stores at most 3 bytes per character, so if emoji need to be stored, the utf8mb4 character set must be used.

  3. All tables and fields need comments

Use the COMMENT clause to add remarks to tables and columns, and maintain the data dictionary from the beginning.
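As a minimal sketch of the three rules above, the following definition combines the InnoDB engine, a unified character set, and COMMENT clauses. The table and column names are hypothetical and only illustrate the conventions; utf8mb4 is chosen here on the assumption that emoji may need to be stored.

```sql
-- Hypothetical table; all names are illustrative only.
CREATE TABLE user_order (
  id          BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  user_id     BIGINT UNSIGNED  NOT NULL COMMENT 'id of the user who placed the order',
  order_state TINYINT UNSIGNED NOT NULL DEFAULT 0 COMMENT '0 = new, 1 = paid, 2 = shipped',
  create_time DATETIME         NOT NULL COMMENT 'time the order was created',
  PRIMARY KEY (id),
  KEY idx_user_id (user_id)
) ENGINE = InnoDB
  DEFAULT CHARSET = utf8mb4
  COMMENT = 'orders placed by users';

-- The comments can be read back later as a simple data dictionary.
SHOW FULL COLUMNS FROM user_order;
```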

  4. Try to control the amount of data in a single table; it is recommended to keep it within 5 million rows

5 million rows is not a hard limit of MySQL, but a table that grows too large makes changing the table structure, backup, and recovery very difficult.

You can use historical data archiving (for log data) and splitting across databases and tables (for business data) to control the amount of data.

  5. Use MySQL partitioned tables with caution

A partitioned table is physically stored as multiple files but is logically a single table;

Choose the partition key carefully, because queries that span partitions may be less efficient;

It is recommended to manage large data sets by physically splitting them into separate tables.

  6. Try to separate hot and cold data and reduce the width of the table

MySQL limits each table to at most 4096 columns, and each row cannot exceed 65535 bytes.

Narrow tables reduce disk IO and keep the memory cache hit rate of hot data high (the wider the table, the more memory it occupies when loaded into the buffer pool, and the more IO is consumed);

They use the cache more effectively and avoid reading useless cold data;

Keep columns that are often used together in the same table (to avoid extra join operations).

  7. Do not create reserved fields in tables

A reserved field cannot be given a name that describes its meaning.

The type of data a reserved field will store is unknown, so an appropriate type cannot be chosen.

Changing the type of a reserved field later locks the table.

  8. Do not store large binary data such as pictures and files in the database

Large files usually make the amount of data grow rapidly in a short time. Reading them from the database usually involves many random IO operations, and when the files are large, the IO is very time-consuming.

Such files are usually stored on a file server, and the database stores only the file address.

  9. Do not run stress tests against the online database

  10. Do not connect directly to the production database from the development or test environment

Database field design specification

  1. Preferentially select the smallest data type that meets storage needs

Reason:

The larger the column, the more space its index needs, the fewer index nodes fit in one page, the more IO is needed to traverse the index, and the worse the index performs.

Method:

1. Store strings as numbers where possible, for example convert an IP address to an integer

MySQL provides two functions for handling IP addresses

  • inet_aton converts an IP address to an unsigned integer (an IPv4 address fits in a 4-byte INT UNSIGNED)

  • inet_ntoa converts an integer IP back to an address

Before inserting data, use inet_aton to convert the IP address to an integer, which saves space. When displaying data, use inet_ntoa to convert the integer back to an address.
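The round trip described above can be sketched as follows. The login_log table and its columns are hypothetical; only INET_ATON and INET_NTOA come from the text.

```sql
-- Hypothetical table that stores IPv4 addresses as 4-byte unsigned integers.
CREATE TABLE login_log (
  id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  login_ip INT UNSIGNED    NOT NULL COMMENT 'IPv4 address converted with INET_ATON',
  PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Convert the dotted address to an integer on the way in
-- (192.168.0.1 is stored as 3232235521).
INSERT INTO login_log (login_ip) VALUES (INET_ATON('192.168.0.1'));

-- Convert the integer back to a dotted address for display.
SELECT id, INET_NTOA(login_ip) AS login_ip
FROM login_log;
```

A VARCHAR(15) column would need up to 15 bytes plus a length byte for the same value, so the integer form is also cheaper to index.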

2. For non-negative data (such as auto-increment IDs and integer IPs), prefer unsigned integers

Reason:

For the same storage size, an unsigned type can hold positive values about twice as large as the signed type

SIGNED INT -2147483648~2147483647
UNSIGNED INT 0~4294967295
 

The N in VARCHAR(N) is the number of characters, not the number of bytes. With UTF8, a VARCHAR(255) column holding 255 Chinese characters takes 765 bytes (3 bytes per character). An excessive length consumes more memory.

  2. Avoid the TEXT and BLOB data types; the most common TEXT type can store 64 KB of data

1. It is recommended to move BLOB or TEXT columns into a separate extension table

MySQL in-memory temporary tables do not support large data types such as TEXT and BLOB. If a query involves such columns, operations such as sorting cannot use an in-memory temporary table and must use an on-disk temporary table instead. MySQL also has to perform a second lookup to fetch this kind of data, which makes SQL performance very poor. This does not mean that these data types must never be used.

If you must use them, move the BLOB or TEXT column into a separate extension table. Do not use select * when querying; retrieve only the columns you need, and do not query the TEXT column when its data is not needed.

2. TEXT and BLOB columns can only use prefix indexes

Because MySQL limits the length of index fields, a TEXT column can only have a prefix index, and a TEXT column cannot have a default value.
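A minimal sketch of both recommendations, assuming a hypothetical article table whose large body text is moved into its own extension table. The prefix length of 20 is an arbitrary illustration, not a recommended value.

```sql
-- Hypothetical main table: only small, frequently read columns.
CREATE TABLE article (
  id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(100)    NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Hypothetical extension table holding the large column separately.
CREATE TABLE article_content (
  article_id BIGINT UNSIGNED NOT NULL COMMENT 'same value as article.id',
  body       TEXT            NOT NULL COMMENT 'full article text',
  PRIMARY KEY (article_id),
  KEY idx_body_prefix (body(20))  -- prefix index on the first 20 characters
) ENGINE = InnoDB;

-- List pages never touch the TEXT column.
SELECT id, title FROM article ORDER BY id DESC LIMIT 20;

-- The body is fetched only when a single article is displayed.
SELECT body FROM article_content WHERE article_id = 1;
```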

  3. Avoid the ENUM type

Changing the list of ENUM values requires an ALTER statement

ORDER BY on an ENUM column is inefficient and requires additional operations

Do not use numeric values as ENUM enumeration values

  4. Define columns as NOT NULL wherever possible

Reason:

Indexing a nullable column requires extra space, so the index takes up more room

NULL values need special handling in comparisons and calculations

  5. Use TIMESTAMP (4 bytes) or DATETIME (8 bytes) to store time

TIMESTAMP can store times from 1970-01-01 00:00:01 to 2038-01-19 03:14:07

TIMESTAMP takes 4 bytes, the same as INT, but is more readable than INT

Use DATETIME for values outside the TIMESTAMP range

People often store dates as strings, which is incorrect practice

  • Disadvantage 1: Date functions cannot be used for calculation and comparison

  • Disadvantage 2: Storing dates as strings takes more space

  6. Financial amounts must use the DECIMAL type

  • Approximate floating point: float, double

  • Exact fixed point: decimal

DECIMAL is an exact numeric type that does not lose precision in calculations

The space it occupies is determined by the declared width: every 4 bytes store 9 digits, and the integer and fractional parts are stored separately

It can also be used to store integers larger than BIGINT
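The difference between approximate and exact arithmetic can be seen directly in MySQL, where a literal written with an exponent is treated as a DOUBLE and a plain decimal literal as a DECIMAL. The account_balance table below is a hypothetical example of a money column.

```sql
-- Approximate (DOUBLE) arithmetic versus exact (DECIMAL) arithmetic.
SELECT 0.1E0 + 0.2E0 = 0.3E0 AS double_equal,   -- 0: the rounding error is visible
       0.1 + 0.2 = 0.3       AS decimal_equal;  -- 1: exact

-- Hypothetical table: amounts with 2 decimal places and up to 12 digits in total.
CREATE TABLE account_balance (
  account_id BIGINT UNSIGNED NOT NULL,
  balance    DECIMAL(12, 2)  NOT NULL DEFAULT 0.00 COMMENT 'current balance',
  PRIMARY KEY (account_id)
) ENGINE = InnoDB;
```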

Index design specification

  1. Limit the number of indexes on each table, it is recommended that the index of a single table does not exceed 5

More indexes are not always better! Indexes can improve efficiency, but they can also reduce it.

Indexes speed up queries, but they slow down inserts and updates, and in some cases they even slow down queries.

When the MySQL optimizer chooses how to execute a query, it evaluates every usable index against the statistics to generate the best execution plan. If many indexes can be used for the same query, the optimizer needs more time to generate the plan, which also reduces query performance.

  2. Do not create a separate index for every column in the table

Before version 5.6, one SQL statement could use only one index per table. Since 5.6 there is an optimization that merges several indexes, but it is still far less efficient than querying through a joint index.

  3. Every InnoDB table must have a primary key

InnoDB tables are index-organized: the logical order in which data is stored is the same as the order of the index. A table can have multiple indexes, but only one storage order.

InnoDB organizes the table in primary key order

  • Do not use frequently updated columns as the primary key, and do not use multi-column primary keys (they are equivalent to a joint index)

  • Do not use UUID, MD5, HASH, or string columns as the primary key (they cannot guarantee that data grows in order)

  • An auto-increment ID is recommended as the primary key

Common index column suggestions

  • Columns appearing in the WHERE clause of SELECT, UPDATE, and DELETE statements

  • Fields included in ORDER BY, GROUP BY, DISTINCT

  • Do not create a separate index for every column that matches the two items above. It is usually better to create a joint index on those columns

  • Associated columns of multi-table join

How to choose the order of index columns

The purpose of an index is to find data through the index, reduce random IO, and improve query performance. The more data an index can filter out, the less data has to be read from disk.

  • Put the column with the highest selectivity on the leftmost side of the joint index (selectivity = number of distinct values in the column / total number of rows)

  • Try to put columns with a small field length on the leftmost side of the joint index (the smaller the field, the more entries fit in one page and the better the IO performance)

  • Put the most frequently used columns on the left side of the joint index (so that fewer indexes are needed)

Avoid creating redundant and duplicate indexes (they increase the time the query optimizer needs to generate an execution plan)

  • Examples of duplicate indexes: primary key(id), index(id), unique index(id)

  • Examples of redundant indexes: index(a,b,c), index(a,b), index(a)

For frequent queries, give priority to using a covering index

Covering index: an index that contains all the fields a query needs (the fields in where, select, order by, and group by)

The benefits of covering indexes:

Avoid the secondary lookup on the InnoDB table

InnoDB stores rows in clustered-index order. In InnoDB, the leaf nodes of a secondary index store the primary key of the row, so when data is queried through a secondary index, after the matching key is found a second lookup through the primary key is needed to fetch the actual data.

With a covering index, all the data can be obtained from the secondary index key itself, which avoids the second lookup on the primary key, reduces IO, and improves query efficiency.

Turn random IO into sequential IO to speed up queries

Because a covering index is stored in key order, an IO-intensive range scan needs far less IO than reading each row from disk at random. A covering index can therefore convert random disk reads into sequential reads of the index.
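A rough sketch of a covering index, using a hypothetical order_item table. When an index covers a query, EXPLAIN is expected to show "Using index" in its Extra column; the exact plan depends on the MySQL version and the data.

```sql
-- Hypothetical table; all names are illustrative only.
CREATE TABLE order_item (
  id          BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT,
  user_id     BIGINT UNSIGNED  NOT NULL,
  item_state  TINYINT UNSIGNED NOT NULL,
  create_time DATETIME         NOT NULL,
  remark      VARCHAR(255)     NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY idx_user_state_time (user_id, item_state, create_time)
) ENGINE = InnoDB;

-- Every column this query touches is part of idx_user_state_time,
-- so it can be answered from the index alone.
EXPLAIN
SELECT item_state, create_time
FROM order_item
WHERE user_id = 42
ORDER BY item_state, create_time;

-- Adding remark means the index no longer covers the query, so each
-- matching row needs a second lookup through the primary key.
EXPLAIN
SELECT item_state, create_time, remark
FROM order_item
WHERE user_id = 42;
```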

Index SET specification

Try to avoid using foreign key constraints

  • Foreign key constraints are not recommended, but the columns used to join tables must still be indexed

  • Foreign keys can be used to ensure the referential integrity of data, but it is recommended to implement it on the business side

  • Foreign keys will affect the write operations of the parent table and the child table, thereby reducing performance

Database SQL development specification

  1. It is recommended to use prepared statements for database operations

Prepared statements can reuse execution plans, which reduces the time spent compiling SQL, and they also prevent the SQL injection problems caused by dynamic SQL.

Passing only parameters is more efficient than passing whole SQL statements.

The same statement is parsed once and used many times, which improves processing efficiency.
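Applications normally use prepared statements through their database driver, but the idea can be sketched with MySQL's own PREPARE and EXECUTE statements. The user_account table and the variable names are hypothetical.

```sql
-- Parse the statement once; the ? placeholder is filled in at execution time.
PREPARE find_user FROM
  'SELECT name, phone FROM user_account WHERE id = ?';

-- Execute it repeatedly, passing only the parameter each time.
SET @user_id = 111;
EXECUTE find_user USING @user_id;

SET @user_id = 112;
EXECUTE find_user USING @user_id;

-- Release the prepared statement when it is no longer needed.
DEALLOCATE PREPARE find_user;
```

Because the parameter is never concatenated into the SQL text, a value such as `1 OR 1=1` is treated as data rather than as part of the statement.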

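  2. Avoid implicit conversion of data types

Implicit conversion can make an index unusable. For example, if id is a string column and is compared with a number, MySQL has to convert every id value before comparing and cannot use the index on id:

select name,phone from customer where id = 111;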
 
  3. Make full use of the existing indexes on the table

Avoid search conditions with % on both sides, for example a like '%123%' (if there is no leading % and only a trailing %, the index on the column can still be used).

A SQL statement can use only one column of a composite index for a range query. For example, with a joint index on columns a, b, and c, if the query condition contains a range on column a, the index on columns b and c will not be used.

So when defining a joint index, if column a will be used for range searches, put column a on the right side of the joint index.

Use left join or not exists to optimize not in operations, because not in usually makes the index unusable.
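The not in rewrite mentioned above might look like the following. The tables user_account(id, name) and user_order(user_id) are hypothetical, and the three queries return the same rows only on the assumption that user_order.user_id is declared NOT NULL.

```sql
-- Users that have no orders, written with NOT IN.
SELECT u.id, u.name
FROM user_account u
WHERE u.id NOT IN (SELECT o.user_id FROM user_order o);

-- The same result with LEFT JOIN: keep only rows with no match.
SELECT u.id, u.name
FROM user_account u
LEFT JOIN user_order o ON o.user_id = u.id
WHERE o.user_id IS NULL;

-- The same result with NOT EXISTS.
SELECT u.id, u.name
FROM user_account u
WHERE NOT EXISTS (
  SELECT 1 FROM user_order o WHERE o.user_id = u.id
);
```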

  4. When designing a database, consider future expansion

  5. Programs connecting to different databases must use different accounts, and cross-database queries are forbidden

  • This leaves room for database migration and for splitting databases and tables

  • It reduces coupling between businesses

  • It avoids the security risks caused by excessive permissions

  6. SELECT * is forbidden; queries must use SELECT <field list>

Reason:

  • SELECT * consumes more CPU, IO, and network bandwidth

  • SELECT * cannot use a covering index

  • An explicit field list reduces the impact of table structure changes

  7. INSERT statements without a field list are prohibited

Such as:

insert into t values ('a','b','c');
 

Should use:

insert into t(c1,c2,c3) values ('a','b','c');
 
  8. Avoid subqueries; they can often be optimized into join operations

If the subquery is in an in clause and is a simple SQL statement (without union, group by, order by, or limit clauses), it can usually be converted into a join.

Reasons for poor subquery performance:

The result set of a subquery cannot use indexes. It is usually stored in a temporary table, and neither in-memory nor on-disk temporary tables have indexes, so query performance suffers. The larger the result set returned by the subquery, the greater the impact.

Because subqueries generate many temporary tables without indexes, they consume a lot of CPU and IO and produce many slow queries.
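As an illustration, a simple in subquery and one possible join rewrite are shown below. The tables and columns are hypothetical; DISTINCT is added on the assumption that a user may have several matching orders, so that the join returns each user only once, as the in form does.

```sql
-- Subquery form: users with at least one paid order.
SELECT u.id, u.name
FROM user_account u
WHERE u.id IN (
  SELECT o.user_id FROM user_order o WHERE o.order_state = 1
);

-- Join form: the optimizer can use the index on user_order.user_id
-- (assumed to exist) instead of materializing the subquery result.
SELECT DISTINCT u.id, u.name
FROM user_account u
JOIN user_order o ON o.user_id = u.id
WHERE o.order_state = 1;
```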

  9. Avoid using JOIN to associate too many tables

MySQL has a join buffer whose size is set by the join_buffer_size parameter.

In MySQL, each additional table joined in the same SQL statement allocates one more join buffer, so the more tables a statement joins, the more memory it uses.

If a program uses many multi-table joins and join_buffer_size is set unreasonably, the server can easily run out of memory, which affects the stability of the database server.

Joins may also create temporary tables, which affects query efficiency. MySQL allows at most 61 tables in a join, and it is recommended to use no more than 5.

  10. Reduce the number of interactions with the database

The database is better suited to batch operations. Combining several identical operations into one improves processing efficiency.

  11. When several or conditions apply to the same column, use in instead of or

The number of values in an in list should not exceed 500. An in condition can use the index more effectively, whereas or rarely uses the index in most cases.
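For example, with a hypothetical user_order table, the two forms below select the same rows; the second is the form recommended above.

```sql
-- Several OR conditions on the same column.
SELECT id, user_id
FROM user_order
WHERE order_state = 1 OR order_state = 2 OR order_state = 3;

-- The equivalent IN list.
SELECT id, user_id
FROM user_order
WHERE order_state IN (1, 2, 3);
```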

  12. Do not use order by rand() for random sorting

order by rand() loads all qualifying rows of the table into memory, generates a random value for each row, and then sorts all the rows by that value. If the qualifying data set is very large, this consumes a lot of CPU, IO, and memory.

It is recommended to generate the random value in the program and then fetch the matching data from the database.
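One common way to do this is to pick a random id first and then read a single row by primary key. In the sketch below the user variables stand in for values computed by the application, the user_account table is hypothetical, and the result is only approximately uniform when the id sequence has gaps.

```sql
-- Find the id range once (the application could cache these values).
SELECT MIN(id), MAX(id) INTO @min_id, @max_id FROM user_account;

-- Pick a random id inside the range; in practice the program does this.
SET @random_id = FLOOR(@min_id + RAND() * (@max_id - @min_id + 1));

-- Read one row through the primary key instead of sorting the whole table.
SELECT id, name, phone
FROM user_account
WHERE id >= @random_id
ORDER BY id
LIMIT 1;
```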

  13. Do not apply functions or calculations to columns in the WHERE clause

When a function or calculation is applied to a column, the index on that column cannot be used

Not recommended:

where date(create_time)='20190101'
 

Recommended:

where create_time >= '20190101' and create_time < '20190102'
 
  14. Use UNION ALL instead of UNION when it is clear that there will be no duplicate values

  • UNION puts all the rows of both result sets into a temporary table and then removes duplicates

  • UNION ALL does not deduplicate the result set

  15. Split a complex big SQL statement into several small ones

  • A big SQL statement is one that is logically complex and needs a lot of CPU to compute

  • In MySQL, one SQL statement can use only one CPU

  • After splitting, the smaller statements can run in parallel to improve processing efficiency

Code of Conduct for Database Operation

Batch write operations (UPDATE, DELETE, INSERT) over 1 million rows must be split into multiple smaller batches

1. Large batch operations may cause severe master-slave delay

In a master-slave environment, a large write operation takes a long time to execute, and it is executed on the slaves only after it has finished on the master, so it causes a long delay between the master and the slaves.

2. A large amount of log is generated when the binlog is in row format

Large batch writes generate a lot of log data, especially with the row binlog format. Because row format records the change to every row, the more rows are modified at once, the more log is generated and the longer log transfer and replay take, which is another cause of master-slave delay.

3. Avoid large transactions

A large batch modification runs in a single transaction, which locks a large amount of data in the table and blocks many other statements, with a very large impact on MySQL performance.

In particular, long-lasting blocking can use up all available connections to the database, so that other applications in the production environment cannot connect. Be very careful with batch write operations.
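One way to split a large delete into small transactions is to remove a limited number of rows per statement and repeat until nothing is left. The operation_log table, the cutoff date, and the batch size of 5000 are hypothetical.

```sql
-- Delete at most 5000 old rows per statement; with autocommit enabled,
-- each statement is its own short transaction.
DELETE FROM operation_log
WHERE create_time < '2019-01-01'
ORDER BY id
LIMIT 5000;

-- Number of rows removed by the previous statement. The application
-- repeats the DELETE (ideally with a short pause so that the slaves
-- can catch up) until this returns 0.
SELECT ROW_COUNT();
```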

For large tables, use pt-online-schema-change to modify the table structure

  • It avoids the master-slave delay caused by modifying a large table

  • It avoids locking the table while table fields are modified

  • Be cautious when changing the structure of a large table, because it can cause serious table locking, which cannot be tolerated in a production environment.

pt-online-schema-change first creates a new table with the same structure as the original table and applies the structure change to the new table. It then copies the data from the original table to the new table in small batches and adds triggers on the original table so that data written during the copy is also applied to the new table. After all rows have been copied, the new table is renamed to the original name and the original table is dropped. In this way the original DDL operation is broken down into many small batches.

Do not grant the super privilege to accounts used by programs

  • When the maximum number of connections is reached, MySQL still allows one connection from a user with the super privilege

  • The super privilege should be reserved for the account the DBA uses to handle problems

  • Accounts that programs use to connect to the database must follow the principle of least privilege

  • The database account used by a program may only be used within one DB; cross-database access is not allowed

  • In principle, accounts used by programs are not allowed to have the drop privilege
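A least-privilege application account along these lines could be created as follows. The account name, host pattern, password, and the order_db database are all hypothetical placeholders.

```sql
-- Application account limited to one host range.
CREATE USER 'app_order'@'10.0.0.%' IDENTIFIED BY 'change_me';

-- Only the data privileges the program needs, on a single database:
-- no SUPER, no DROP, and no access to other databases.
GRANT SELECT, INSERT, UPDATE, DELETE
ON order_db.*
TO 'app_order'@'10.0.0.%';

-- Check what the account is actually allowed to do.
SHOW GRANTS FOR 'app_order'@'10.0.0.%';
```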
