A system is highly available if it can still be accessed despite the failure of one or more components and can continue to be used without immediate manual intervention. To achieve this, the following conditions must be met:
Single Point of Failure
A Single Point of Failure is a single component whose failure or malfunction renders the entire system inoperable. Any potential Single Point of Failure must always be avoided.
Redundancy means that identical or comparable systems exist at least twice. In technical terms, this ensures that the services they provide remain available even if a fault occurs in one area.
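The effect of running a system at least twice can be estimated with a little arithmetic. The following is a minimal sketch, assuming that the redundant copies fail independently of each other (which shared power, network or software faults can violate in practice). The function name and the 99% figure are illustrative assumptions, not measurements.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies of one component.

    Assumes independent failures: the group is unavailable only when
    every copy is unavailable at the same time.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Probability that all copies are down at once is (1 - single) ** replicas.
    return 1.0 - (1.0 - single) ** replicas
```

Under these assumptions, two servers that are each available 99% of the time reach a combined availability of about 99.99%.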
SOLUTIONS FOR INDIVIDUAL SERVICES
To keep your web pages available even if a web server fails, more than one web server must be in operation. Apache or NGINX web servers can run behind a load balancer, which checks the availability and function of each web server.
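The availability check performed by a load balancer can be pictured roughly as follows. This is a simplified sketch and not how any particular load balancer is implemented. The names `http_probe` and `healthy_backends` and the two-second timeout are assumptions chosen for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status or malformed URL.
        return False


def healthy_backends(
    backends: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> List[str]:
    """Keep only the backends whose health probe succeeds."""
    return [backend for backend in backends if probe(backend)]
```

Requests would then be forwarded only to the servers returned by `healthy_backends`. Passing the probe as a parameter keeps the selection logic testable without a network.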
A database cluster is a group of several database servers that all maintain the same data. Thanks to the redundancy between all servers in the cluster, the remaining members can take over the tasks of a database server that fails. Suitable database software includes MariaDB Galera Cluster and Percona XtraDB Cluster. The web servers can be connected to this cluster using ProxySQL or HAProxy.
A file system cluster is a file system that allows access to shared directories within a server group. Because files are kept on several servers, they remain available even if one server in the cluster fails. A file system cluster is accessed by all servers that need files from these shares. Examples of file system clusters are GlusterFS and CephFS.
With a fast website you can only win! Your visitors will feel comfortable, and search engines such as Google may reward you with a better ranking. There are a few points to consider in order to squeeze maximum performance out of the available hardware.
With load balancing, for example, large volumes of requests are distributed across several systems working in parallel. Requests can thus be processed simultaneously, and the utilization of the individual components behind the load balancer can be controlled. Load balancers include, for example, HAProxy, Google Seesaw or F5 load balancers.
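One simple way to distribute requests and control utilization is weighted round robin. The sketch below is a deliberately naive illustration. The server names and weights are hypothetical, and real load balancers typically interleave requests more evenly and offer further strategies.

```python
from itertools import cycle
from typing import Dict, Iterator


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield backend names forever, each in proportion to its weight."""
    # Repeat every backend name as often as its weight says.
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    if not schedule:
        raise ValueError("at least one backend with a positive weight is required")
    return cycle(schedule)
```

With the weights `{"web1": 2, "web2": 1}`, the hypothetical server `web1` receives two out of every three requests, which is how a stronger machine could be given a larger share of the load.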
Profilers are programming tools that help developers analyze and compare running programs. In this way, performance and resource requirements can be checked and problem areas detected. Profiling should always be the first step towards high performance. Without this analysis and optimization, better performance can only be achieved by adding hardware, which is very expensive in the long run. One such profiling tool is Blackfire.
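To give an impression of what a profiler reports, here is a small sketch using `cProfile` from the Python standard library. It serves only as a stand-in for illustration and is not Blackfire. The function `slow_sum` is a made-up workload.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Made-up workload: sum of squares computed in a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, **kwargs):
    """Run func under the profiler and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()
```

Calling `profile_call(slow_sum, 100000)` returns the result together with a report that lists where the time was spent, which is the starting point for targeted optimization.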
With caching, you store the results of recurring tasks and requests. Future requests can then be served from the data already available, so retrieved or already calculated data is delivered and processed much faster. Examples of such caches are Memcached, query caches, Redis or Varnish. For TYPO3, the extension staticfilecache is recommended. Caching is a basic requirement for speed!
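The principle behind all of these caches can be shown in a few lines. The following is a minimal in-memory sketch with an expiry time (TTL), assuming a single process. The class name `TTLCache` and its methods are illustrative and do not reflect the API of Memcached, Redis or Varnish.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keep computed values for a limited time and reuse them."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry timestamp, cached value).
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value, or compute and store it if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # Cache hit: no recomputation needed.
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value
```

An expensive page rendering passed as `compute` runs only once per TTL period, and every further request for the same key within that period is answered from memory.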
One of the most important aspects of server maintenance is security. You want to protect yourself around the clock against hacks, undetected cryptominers or the loss of personal data. This requires certain procedures and tools.
So-called "snowflake servers" are a big problem. The term describes servers that are supposed to be identical but turn out to differ on closer inspection. Such a state can lead to problems in maintenance or further development. The solution is provisioning: the configuration of a server is stored as code in text form, and a program such as Ansible, Puppet, Chef or SaltStack then rolls it out to the desired servers. This procedure ensures that services and tools are configured identically on all servers. The code also serves as complete technical documentation of the servers. With existing provisioning, a new installation, e.g. on new hardware, takes only a fraction of the time otherwise required.
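How a configuration stored as code exposes snowflake servers can be illustrated with a simple comparison of desired and actual state. This sketch is not how Ansible, Puppet, Chef or SaltStack work internally. The function name, the keys and the version numbers in the usage note are hypothetical.

```python
from typing import Dict, List


def config_drift(desired: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """List every setting where a server deviates from the desired state."""
    drift = []
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key)  # None means: should not be present.
        have = actual.get(key)   # None means: not present on the server.
        if want != have:
            drift.append(f"{key}: expected {want!r}, found {have!r}")
    return drift
```

For a desired state of `{"nginx": "1.24", "php": "8.2"}` and a server reporting `{"nginx": "1.24", "php": "8.1"}`, the function returns `["php: expected '8.2', found '8.1'"]`. A provisioning tool goes one step further and corrects such deviations automatically.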
To look after servers with foresight, monitoring with customized checks is a good solution. In the best case, meaningful test scenarios identify potential problems in advance. Furthermore, we aim to identify problems before our customers do! Tools such as Nagios, Icinga or Zabbix are suitable for this.
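A check that warns before a problem becomes critical might look like the sketch below. It follows the exit code convention of Nagios plugins (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The thresholds of 80% and 90% and the function names are assumptions for illustration.

```python
import shutil
from typing import Tuple

# Status codes following the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def disk_used_percent(path: str = "/") -> float:
    """Return the used share of the file system containing `path` in percent."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def evaluate_disk(used_percent: float, warn: float = 80.0,
                  crit: float = 90.0) -> Tuple[int, str]:
    """Map a disk usage value to a status code and a status message."""
    if warn > crit:
        return UNKNOWN, "UNKNOWN - warning threshold is above critical threshold"
    if used_percent >= crit:
        return CRITICAL, f"CRITICAL - disk usage {used_percent:.1f}%"
    if used_percent >= warn:
        return WARNING, f"WARNING - disk usage {used_percent:.1f}%"
    return OK, f"OK - disk usage {used_percent:.1f}%"
```

The warning threshold is what makes the check forward-looking: `evaluate_disk(disk_used_percent("/"))` reports a WARNING while there is still time to react, before the disk is actually full.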
A backup is a copy of all required data, kept in case of a system failure or data loss. This data can be copied back or restored at any time, so restoring data from an older point in time is also possible. There are different backup methods: a full backup contains all files, while an incremental or differential backup contains only the data that has changed since a previous backup. It is also very important to test a backup, so that you are not surprised in an emergency. In the worst case, an untested backup is simply data garbage.
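The difference between full and partial backups, and the idea of testing a restore, can be sketched with checksums. This is a simplified illustration that ignores deleted files, permissions and very large data sets. All function names are assumptions and do not belong to any particular backup tool.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> Dict[str, str]:
    """Map every file below root (relative path) to its checksum."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_since(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Files that are new or modified compared with an earlier snapshot."""
    return sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )


def verify_restore(expected: Dict[str, str], restored_root: Path) -> bool:
    """A restore passes only if every file matches the recorded checksums."""
    return snapshot(restored_root) == expected
```

Comparing against the snapshot of the last full backup yields the file list for a differential backup, while comparing against the snapshot of the most recent backup of any kind yields the list for an incremental one. `verify_restore` is the kind of test that separates a usable backup from data garbage.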
Disaster recovery means restoring operation after a disaster, for example a complete hardware failure. The critical point here is to rebuild infrastructure or hardware that is no longer usable in the shortest possible time. The ReaR (Relax-and-Recover) tool is used for this purpose.
Do you have any further questions on this topic or a concrete need for an infrastructure analysis? Martin Huber will be happy to assist you personally.