Practical Tips for Optimizing a Single-Machine MySQL Database

There is a lot to say about database optimization. By the data volume that must be supported, it falls into two stages: the single-machine database, and sharding (splitting databases and tables). A single machine can generally support tables of up to around 5 million or 10 million rows. In interviews, large companies tend to start with the single-machine database and work step by step toward sharding, with many optimization questions woven in along the way. This article describes some practical experience with single-machine database optimization, based on MySQL. Corrections are welcome if anything here is unreasonable.

1、Table Structure Optimization

When starting an application, the design of the database table structure often affects the performance of the application later on, especially the performance after the number of users increases. Therefore, table structure optimization is an important step.

1.1、Character Set

Generally speaking, prefer UTF-8. GBK does use less storage than UTF-8 for Chinese text, but UTF-8 is compatible with all languages, and there is no need to sacrifice extensibility for a little storage space. In MySQL, full UTF-8 means the utf8mb4 character set, because the legacy utf8 character set stores at most 3 bytes per character. Converting from GBK to UTF-8 later is very costly because it requires a data migration, whereas a storage shortage can be solved simply by paying for more disk.
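The storage trade-off between GBK and UTF-8 can be checked with a few lines of Python. This is only an illustrative sketch: it measures encoded byte lengths in Python, not MySQL's on-disk row size, and the sample strings and helper names are made up for the example.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` under GBK and under UTF-8."""
    return {enc: len(text.encode(enc)) for enc in ("gbk", "utf-8")}


def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Chinese text: 2 bytes per character in GBK, 3 in UTF-8.
print(encoded_sizes("数据库优化"))
# ASCII text: 1 byte per character in both encodings.
print(encoded_sizes("database"))
# Korean text cannot be represented in GBK at all.
print(can_encode("한국어", "gbk"), can_encode("한국어", "utf-8"))
```

GBK saves one byte per Chinese character, but it cannot store text outside its repertoire, which is the extensibility cost described above.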

1.2、Primary Key

With MySQL's InnoDB engine, the underlying storage model is a B+ tree. InnoDB uses the primary key as the clustered index and stores the row data in the leaf nodes, so a row can be located quickly through its primary key. Every table should therefore have a primary key, preferably an auto-incrementing one. With an auto-incrementing key, new rows are added to the leaf nodes of the B+ tree in key order, so existing data does not need to move and inserts are very efficient. If the primary key is not auto-incrementing, each new key value is effectively random, and an insert may have to move a large amount of data to preserve the ordering of the B+ tree, which adds unnecessary overhead.
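The cost difference between ordered and random keys can be illustrated with a toy model. The sketch below uses a flat sorted Python list as a stand-in for the ordered leaf level of a B+ tree and counts how many existing entries sit after each insertion point. Real InnoDB splits pages rather than shifting a whole array, so the numbers only show the trend; the function name is hypothetical.

```python
import bisect
import random


def count_shifted(keys):
    """Insert keys one by one into a sorted list and count how many existing
    entries lie after each insertion point (entries that would have to move)."""
    ordered = []
    shifted = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        shifted += len(ordered) - pos
        ordered.insert(pos, key)
    return shifted


n = 1000
sequential = list(range(1, n + 1))   # auto-increment style keys
shuffled = sequential[:]
random.Random(42).shuffle(shuffled)  # random-looking keys, fixed seed

print("sequential keys, entries moved:", count_shifted(sequential))
print("random keys, entries moved:    ", count_shifted(shuffled))
```

Sequential keys always land at the end, so nothing moves; random keys keep landing in the middle of existing data.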

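1.3、Fields

1.3.1、Fields that are indexed must have a not null constraint and a default value.

1.3.2、Do not use float or double to store decimal values, because they can lose precision; use decimal instead.

1.3.3、Do not use Text/blob to store large amounts of data. Reading and writing large text causes heavy I/O and also occupies MySQL's cache, which greatly reduces database throughput under high concurrency. Store large text in a dedicated file storage system and keep only the file's access path in MySQL. For example, blog posts can be stored as files, with MySQL holding only each file's relative path.

1.3.4、Keep varchar lengths under 8K.

1.3.5、For time values, use Datetime rather than timestamp. Datetime takes 8 bytes (5 bytes plus fractional-second storage since MySQL 5.6.4) while timestamp takes only 4, but a timestamp column has to be kept non-null and is time zone sensitive.

1.3.6、Add two fields, gmt_create and gmt_modified, to each table to record when a row was created and when it was last modified. These fields exist to make troubleshooting easier.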
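The precision problem behind item 1.3.2 can be reproduced outside the database. The sketch below uses Python's float (an IEEE 754 double, the same kind of binary floating point as MySQL's double) and decimal.Decimal as a stand-in for MySQL's decimal type; it illustrates the behavior and is not a MySQL test.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum is not exactly 0.3.
float_total = 0.1 + 0.2

# Decimal arithmetic keeps exact base-10 digits.
decimal_total = Decimal("0.1") + Decimal("0.2")

print(float_total == 0.3)               # False
print(decimal_total == Decimal("0.3"))  # True
```

For amounts of money this difference accumulates across many rows, which is why an exact decimal type is the safer choice.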

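1.4、Index Creation

1.4.1、At this stage the business is not yet well understood, so do not add indexes blindly. Add ordinary indexes only to fields that will definitely be queried through an index.

1.4.2、Keep an InnoDB single-column index within 767 bytes. If the column is longer, only a prefix is indexed (the first 255 characters under the 3-byte utf8 character set).

1.4.3、For an InnoDB composite index, keep each column within 767 bytes and the total within 3072 bytes.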
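The byte limits above translate into character limits that depend on the character set. The following sketch does that arithmetic. The bytes-per-character table is an assumption covering a few common MySQL character sets, the helper names are hypothetical, and per-column overhead such as length bytes is ignored.

```python
SINGLE_COLUMN_LIMIT = 767   # bytes per indexed column (item 1.4.2)
COMPOSITE_LIMIT = 3072      # bytes for the whole composite index (item 1.4.3)

# Assumed maximum bytes per character for some common character sets.
BYTES_PER_CHAR = {"latin1": 1, "gbk": 2, "utf8": 3, "utf8mb4": 4}


def max_indexable_chars(charset: str, limit: int = SINGLE_COLUMN_LIMIT) -> int:
    """Longest column, in characters, that fits fully in a single-column index."""
    return limit // BYTES_PER_CHAR[charset]


def fits_composite(columns) -> bool:
    """columns: list of (length_in_chars, charset) pairs for a composite index."""
    sizes = [chars * BYTES_PER_CHAR[charset] for chars, charset in columns]
    return all(s <= SINGLE_COLUMN_LIMIT for s in sizes) and sum(sizes) <= COMPOSITE_LIMIT


print(max_indexable_chars("utf8"))             # 255
print(max_indexable_chars("utf8mb4"))          # 191
print(fits_composite([(191, "utf8mb4")] * 4))  # True: 4 * 764 = 3056 bytes
print(fits_composite([(191, "utf8mb4")] * 5))  # False: 3820 bytes in total
```

This is where the familiar varchar(255) and varchar(191) column lengths come from.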

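2、SQL Optimization

Generally speaking, SQL comes in only a few kinds: basic insert, delete, update, and select; paginated queries; range queries; fuzzy search; and multi-table joins.

2.1、Basic Queries

Queries should generally use an index. If a query has no usable index, try to change it so that it includes an indexed field. If the business scenario makes that impossible, look at how often the query is called. If the volume is high, for example 100,000+ calls a day, add a new index; if it is low, for example 100+ calls a day, consider leaving it as it is. In addition, use select * as rarely as possible. Name only the fields you need in the SQL statement and do not fetch unnecessary ones, which wastes I/O and memory.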

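2.2、Efficient Pagination

limit m,n essentially executes limit m+n and then takes n rows starting from row m, so the further back you page, the larger m becomes and the worse the performance. For example,

select * from A limit 100000,10 performs very poorly. It is better rewritten as follows, where the subquery walks only the primary key index to find the starting id (the order by id makes the page boundaries explicit):

select id,name,age from A where id >= (select id from A order by id limit 100000,1) order by id limit 10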
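The rewrite can be checked for equivalence on a small data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it demonstrates that the two forms return the same page, not how MySQL executes them. The table contents and function names are made up. The third function, page_after, is a further variation not in the original text: if the application remembers the last id of the previous page, no offset is needed at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO A (id, name, age) VALUES (?, ?, ?)",
    [(i, f"user{i}", 20 + i % 50) for i in range(1, 2001)],
)


def page_by_offset(offset, size):
    """Plain offset pagination: skips `offset` full rows first."""
    return conn.execute(
        "SELECT id, name, age FROM A ORDER BY id LIMIT ? OFFSET ?",
        (size, offset),
    ).fetchall()


def page_by_id_seek(offset, size):
    """Find the starting id through the primary key, then read from there."""
    return conn.execute(
        "SELECT id, name, age FROM A "
        "WHERE id >= (SELECT id FROM A ORDER BY id LIMIT 1 OFFSET ?) "
        "ORDER BY id LIMIT ?",
        (offset, size),
    ).fetchall()


def page_after(last_id, size):
    """Variation: continue after the last id seen on the previous page."""
    return conn.execute(
        "SELECT id, name, age FROM A WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()


print(page_by_offset(1000, 10) == page_by_id_seek(1000, 10))
print(page_after(1000, 10)[0])
```

All three return the rows with ids 1001 to 1010 here; they differ only in how much work the database does to reach the start of the page.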

2.3、Range Queries

Range queries include between, greater than, less than, and in. With a small list of values, an in query can use the index; with a very large list, it may degrade into a full table scan. Conditions such as between, greater than, and less than can use an index as a range scan, but in a composite index the columns after the first range condition can no longer be used for the lookup. Range conditions are therefore best placed after the conditions that do exact index lookups.

2.4、Fuzzy Search with like

A condition such as like %name% cannot use an index because of the leading wildcard, so it amounts to a full table scan. This is not a big problem with small data volumes, but performance drops sharply as the data grows. For large data volumes, use a search engine in place of this kind of fuzzy search. If that is not possible, add a condition that can use an index ahead of the fuzzy match, so the match runs over fewer rows.

2.5、Multi-table Joins

Both subqueries and joins can fetch data across several tables, but subqueries tend to perform worse, so it is recommended to rewrite subqueries as joins. MySQL executes joins with the Nested Loop Join algorithm, which means it takes the result set of the first table and looks up each of its rows in the second table. For example, if the result set of the first table has 100 rows and the second table has 100,000 rows, the final result has to be filtered out of 100 * 100,000 candidate combinations. So try to join a small result set to a large table, and create an index on the join fields. If an index cannot be created, set join_buffer_size large enough.

If none of these techniques fixes the performance loss caused by the join, drop the join altogether and split the single join query into two simple queries. Also, a join should not involve more than three tables, because beyond that performance is generally very poor. In that case, split the SQL.
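The effect of an index on the join field can be shown with a toy simulation of a nested loop join. This is a simplified model, not MySQL's implementation: a Python dict stands in for the index on the inner table, the function names are hypothetical, and the inner table is scaled down to 10,000 rows so the example runs quickly.

```python
def nested_loop_join(outer, inner):
    """Join on equal keys by scanning all of `inner` for every outer row."""
    comparisons = 0
    matches = []
    for o_key, o_val in outer:
        for i_key, i_val in inner:
            comparisons += 1
            if o_key == i_key:
                matches.append((o_key, o_val, i_val))
    return matches, comparisons


def indexed_join(outer, inner_index):
    """Join by probing a prebuilt lookup structure on the inner join field."""
    lookups = 0
    matches = []
    for o_key, o_val in outer:
        lookups += 1
        for i_val in inner_index.get(o_key, []):
            matches.append((o_key, o_val, i_val))
    return matches, lookups


outer = [(k, f"o{k}") for k in range(100)]     # small result set
inner = [(k, f"i{k}") for k in range(10_000)]  # large table

# Build the stand-in "index" on the inner table's join field.
inner_index = {}
for key, value in inner:
    inner_index.setdefault(key, []).append(value)

rows_scan, comparisons = nested_loop_join(outer, inner)
rows_indexed, lookups = indexed_join(outer, inner_index)

print(comparisons)  # 100 * 10,000 row comparisons without an index
print(lookups)      # 100 index probes with one
```

Both versions return the same rows. The unindexed join does work proportional to the product of the two sizes, while the indexed join does work proportional to the size of the small side.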

3、Database Connection Pool Optimization

A database connection pool is essentially a cache and a way of coping with high concurrency. Optimizing it mainly means tuning its parameters. We usually use the DBCP connection pool, whose parameters are as follows:

3.1、initialSize

The number of initial connections. "Initial" here refers to the first call to getConnection, not to application startup. The value can be set to the historical average concurrency.

3.2、minIdle

The minimum number of idle connections to keep. DBCP starts a background thread that reclaims idle connections; when it does so, it keeps minIdle connections. This is usually set to 5; if concurrency really is very low, it can be set to 1.

3.3、maxIdle

The maximum number of idle connections to keep, set according to the peak concurrency of the business. For example, if peak concurrency is 20, these connections are not released immediately after the peak has passed. If another peak arrives shortly afterward, the pool can reuse the idle connections instead of frequently creating and closing connections.

3.4、maxActive

The maximum number of active connections, set according to the maximum concurrency you can accept. For example, if a single machine can accept a maximum concurrency of 100, set maxActive to 100. Then at most 100 requests can be served at the same time, and extra requests are discarded once the maximum wait time has passed. This value must be set; it guards against malicious concurrency attacks and protects the database.

3.5、maxWait

The maximum time to wait for a connection. Set it short, for example 3s, so that requests fail fast. While a request is waiting for a connection its thread cannot be released, and the number of threads on a single machine is limited. If the value is too long, such as the 60s often suggested online, the thread is held for those 60s. Once such requests pile up, the application runs short of available threads and the service becomes unavailable.
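The interaction between a cap on active connections and a short wait time can be sketched with a toy pool. This is not DBCP and not its API: BoundedPool, borrow, and give_back are hypothetical names, the "connection" is a placeholder object, and the timeout is shortened to 0.05 seconds so the example finishes quickly.

```python
import threading
import time


class BoundedPool:
    """Toy pool: at most `max_active` connections borrowed at once,
    and a borrower waits at most `max_wait` seconds before failing."""

    def __init__(self, max_active, max_wait):
        self._slots = threading.BoundedSemaphore(max_active)
        self._max_wait = max_wait

    def borrow(self):
        # Fail fast instead of holding the calling thread indefinitely.
        if not self._slots.acquire(timeout=self._max_wait):
            raise TimeoutError("no connection available within max_wait")
        return object()  # placeholder for a real connection

    def give_back(self, connection):
        self._slots.release()


pool = BoundedPool(max_active=2, max_wait=0.05)
held = [pool.borrow(), pool.borrow()]  # the pool is now exhausted

start = time.monotonic()
try:
    pool.borrow()
    failed_fast = False
except TimeoutError:
    failed_fast = True
waited = time.monotonic() - start

pool.give_back(held.pop())
connection = pool.borrow()  # succeeds again once a slot is free

print(failed_fast, round(waited, 2))
```

The third borrower gives up after the short wait, so its thread is free to return an error to the caller instead of being tied up while the pool is saturated.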

3.6、minEvictableIdleTimeMillis

How long a connection may stay idle before it becomes eligible for reclamation. The default is 30 minutes.

3.7、validationQuery

The SQL statement used to check whether a connection is still valid, generally something simple such as select 1. Setting it is recommended.

3.8、testOnBorrow

Validates a connection when it is borrowed. Enabling it is not recommended, because it severely affects performance.

3.9、testOnReturn

Validates a connection when it is returned. Enabling it is not recommended, because it severely affects performance.

3.10、testWhileIdle

When enabled, the background thread that cleans up connections periodically runs validateObject on idle connections and removes any that fail. This does not affect request performance, so enabling it is recommended.

3.11、numTestsPerEvictionRun

The number of connections checked in each cleanup run. Set it to the same value as maxActive, so that every connection is checked on each run.

3.12、Preheat Connection Pool

It is recommended to preheat the connection pool when the application starts: run a simple SQL query before serving external traffic, so that the pool is filled with the necessary number of connections.

4、Index Optimization

Once the data volume reaches a certain level, SQL optimization alone can no longer improve performance, and it is time for the big move: indexes. Indexing can be approached in three levels, and mastering these three is generally enough. The selectivity of the indexed fields should also be considered.

4.1、Level-One Index

Build indexes on the fields used in where conditions: an ordinary index for a single column, a composite index for multiple columns. Pay attention to the leftmost-prefix principle for composite indexes.

4.2、Level-Two Index

If a field is used in order by or group by, consider indexing it. Because an index is naturally ordered, it can avoid the sorting that order by and group by would otherwise require, which improves performance.
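The leftmost-prefix principle can be expressed as a small rule. The sketch below is a deliberately simplified model with hypothetical names: it considers only equality conditions and ignores range conditions and any optimizer tricks, so it shows the principle rather than MySQL's actual planning.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading columns of a composite index that a set of
    equality conditions can use, stopping at the first missing column."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break
        used.append(column)
    return used


# Hypothetical composite index on (city, age, name).
index = ("city", "age", "name")

print(usable_prefix(index, {"city", "age"}))   # ['city', 'age']
print(usable_prefix(index, {"city", "name"}))  # ['city']: age is missing
print(usable_prefix(index, {"age", "name"}))   # []: city is missing
```

A query that does not filter on the first column of the composite index cannot use it for lookup under this rule, which is why column order in a composite index matters.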

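4.3、Level-Three Index

If the two moves above are still not enough, add the queried fields to the index as well. This forms a so-called covering index, which saves an I/O step. When MySQL queries through an ordinary index, it first searches that index to find the primary key value, then uses the primary key to look up the corresponding row in the primary key index. If everything the query needs is already in the ordinary index, that last lookup is unnecessary. Of course, this way of building indexes is rather extreme and does not suit ordinary scenarios.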
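Whether a query is covered by an index can be seen in the query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in, because SQLite labels such plans "COVERING INDEX"; in MySQL the corresponding sign is "Using index" in the Extra column of EXPLAIN. The table, index, and helper names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, bio TEXT)"
)
conn.execute("CREATE INDEX idx_name_age ON person (name, age)")


def plan(sql):
    """Return the textual detail of SQLite's query plan for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)


# Every selected field is in the index, so the table row is never read.
covered = plan("SELECT name, age FROM person WHERE name = 'alice'")

# bio is not in the index, so each match needs an extra lookup of the row.
not_covered = plan("SELECT name, age, bio FROM person WHERE name = 'alice'")

print(covered)
print(not_covered)
```

The two queries use the same index; only the first can be answered from the index alone.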

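4.4、Index Selectivity

When building an index, try to build it on a field with high selectivity. What does high selectivity mean? It means that a lookup on the field returns few rows. For example, looking up a person by name usually returns very few rows, while looking up by gender may return half of the database. So name is a high-selectivity field and gender is a low-selectivity field.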
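One common way to put a number on selectivity is the ratio of distinct values to total rows: the closer to 1, the fewer rows each value returns. The sketch below computes it with Python's built-in sqlite3 module standing in for MySQL; the table, the sample data, and the function name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO person (name, gender) VALUES (?, ?)",
    [(f"name{i}", "F" if i % 2 else "M") for i in range(1000)],
)

ALLOWED_COLUMNS = {"name", "gender"}


def selectivity(column):
    """Distinct values divided by total rows for one column of `person`."""
    if column not in ALLOWED_COLUMNS:  # never interpolate untrusted input
        raise ValueError(f"unknown column: {column}")
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM person"
    ).fetchone()
    return distinct / total


print(selectivity("name"))    # 1.0: every row has its own name
print(selectivity("gender"))  # 0.002: two values shared by 1000 rows
```

The same COUNT(DISTINCT ...) / COUNT(*) query can be run against a real table to compare candidate fields before deciding which one to index.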

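5、Historical Data Archiving

When data grows by 5 million rows a year, indexes alone can no longer help, and the usual idea at that point is sharding. If the business is not growing explosively but the data is still slowly increasing, you can skip a complex technique like sharding and archive historical data instead. We archive historical data whose life cycle has ended, for example data older than 6 months. A quartz scheduled job can run in the early morning, select the data older than 6 months, and store it on a remote hbase server. Of course, we also need to provide a query interface for the historical data, in case it is ever needed.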
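The core of such an archiving job is moving rows older than a cutoff in small batches. The sketch below shows that step only, under loud assumptions: Python's built-in sqlite3 module stands in for MySQL, a local orders_archive table stands in for the remote hbase server, scheduling (quartz in the text above) is left out, and all table and function names are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, gmt_create TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, gmt_create TEXT)")

now = datetime(2024, 7, 1)
# Twelve sample rows, created 30, 60, ..., 360 days before `now`.
conn.executemany(
    "INSERT INTO orders (id, gmt_create) VALUES (?, ?)",
    [(i, (now - timedelta(days=30 * i)).isoformat(sep=" ")) for i in range(1, 13)],
)


def archive_older_than(cutoff, batch_size=500):
    """Move rows created before `cutoff` to the archive; return rows moved."""
    cutoff_text = cutoff.isoformat(sep=" ")
    moved = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM orders WHERE gmt_create < ? ORDER BY id LIMIT ?",
            (cutoff_text, batch_size),
        )]
        if not ids:
            return moved
        marks = ",".join("?" * len(ids))
        with conn:  # copy and delete each batch in one transaction
            conn.execute(
                "INSERT INTO orders_archive "
                f"SELECT id, gmt_create FROM orders WHERE id IN ({marks})",
                ids,
            )
            conn.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
        moved += len(ids)


moved = archive_older_than(now - timedelta(days=180), batch_size=4)
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]
print(moved, remaining, archived)
```

Working in small batches keeps each transaction short, so the job does not block normal traffic for long even when it runs against a busy table.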

