Sometimes you may notice that your server is loading slowly or is down, or that its resource usage is higher at particular times than you expect. This is often due to heavy usage on your server: while pages are loading slowly, bots and unwanted scripts may be hitting and executing on your web server. For dynamic websites, many plugins and modules provide additional functionality (caching plugins, for example), but they can also impact web server performance, so remove any unwanted plugins and modules for better performance.
In this article I am going to explain how you can find the causes of heavy usage on your web server.
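Before digging into the access logs, a quick look at the overall load and the busiest processes can confirm that the web server is actually the culprit. This is a minimal sketch; the exact process names you will see (httpd, php-fpm, etc.) depend on your setup:
# uptime
# top -b -n 1 | head -15
# ps aux --sort=-%cpu | head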
CHECKING WEB SERVER ACCESS LOG LOCATION
You can check your web server access log to confirm exactly what is hitting your websites. First, find out where the access log is located on your web server. In my case, the access log is located in the “/var/log/httpd/vhost/” directory.
# ls -l /var/log/httpd/vhost/
total 240
-rw-r--r-- 1 root root 236290 Dec 5 10:53 access.log
-rw-r--r-- 1 root root   2523 Dec 5 04:16 error.log
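If you are not sure where the access log lives, the Apache configuration usually declares it with a CustomLog directive. A minimal sketch, assuming a Red Hat style layout under /etc/httpd/ (Debian/Ubuntu typically uses /etc/apache2/):
# grep -Ri "CustomLog" /etc/httpd/ 2>/dev/null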
EXAMINE ACCESS LOG
Now we can examine the access log to find the culprit IPs and bots that are hitting your server the most. Log in to your server, navigate to the access log file, and then run the commands below.
# cd /var/log/httpd/vhost/
# ls -l
total 240
-rw-r--r-- 1 root root 236290 Dec 5 10:53 access.log
-rw-r--r-- 1 root root   2523 Dec 5 04:16 error.log
Top 10 IPs that are hitting your web server the most:
# awk '{print $1}' access.log | sort | uniq -c | sort -nr | head
    290 66.249.66.166
     97 223.176.160.29
     59 93.77.134.151
     52 94.25.134.52
     44 94.247.174.83
     44 85.93.93.124
     44 76.164.194.74
     44 174.34.156.130
     44 109.123.101.103
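Once you have the top offenders, you can drill into what a specific IP is actually requesting. A minimal sketch, reusing the same log format and taking the top IP from the output above as an example:
# grep '66.249.66.166' access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head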
Use the host command to check which hosting company a specific IP belongs to.
If the host command is not found on your system, install it using the command below.
# yum install bind-utils
==========================================================================
 Package       Arch      Version                      Repository    Size
==========================================================================
Installing:
 bind-utils    x86_64    32:9.8.2-0.47.rc1.el6_8.3    updates      187 k
Updating for dependencies:
 bind          x86_64    32:9.8.2-0.47.rc1.el6_8.3    updates      4.0 M
 bind-libs     x86_64    32:9.8.2-0.47.rc1.el6_8.3    updates      890 k

Transaction Summary
==========================================================================
Install       1 Package(s)
# host 66.249.66.166
166.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-166.googlebot.com.
Above you can see that the IP “66.249.66.166” belongs to Google. You can block these kinds of IPs if they are hitting your server unnecessarily.
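If the reverse DNS lookup returns nothing useful, whois can also show who owns the address block. A minimal sketch, assuming the whois utility is installed (the package is typically whois or jwhois, depending on your distribution release):
# whois 66.249.66.166 | grep -iE 'orgname|netname'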
BLOCK IPs USING .HTACCESS FILE
You can block culprit IPs using a .htaccess file in Apache. Navigate to your web server's document root and create a .htaccess file. In my case the document root is “/var/www/html/vhost/”.
# cd /var/www/html/vhost/
# vim .htaccess

order allow,deny
deny from 66.249.66.166
allow from all
Save and exit.
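Note that order/allow/deny is Apache 2.2 syntax. If your server runs Apache 2.4, the equivalent .htaccess block uses Require directives instead; the sketch below assumes mod_authz_core is loaded and that AllowOverride permits authorization directives for this directory:
<RequireAll>
    Require all granted
    Require not ip 66.249.66.166
</RequireAll>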
TOP TEN REQUESTED FILES AND DIRECTORIES ON THE WEB SERVER
We can also check which files and directories on the web server are being requested the most.
List the top 10 files and directories that are being requested the most on the web server:
# awk '{print $7}' access.log | sort | uniq -c | sort -nk1 | tail -n10
      8 /wp-content/themes/spacious/genericons/genericons.css?ver=3.3.1
      8 /wp-content/uploads/2016/11/looklinux-bg.jpg
      8 /wp-includes/js/jquery/jquery.js?ver=1.12.4
      9 /wp-admin/
      9 /wp-content/plugins/wp-pagenavi/pagenavi-css.css?ver=2.70
     13 /questions/ask/
     14 /wp-admin/nav-menus.php
    126 /wp-admin/admin-ajax.php
    220 /index.php
    304 /
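You can combine the two reports to see which IPs are responsible for the busiest URL. A minimal sketch, taking /wp-admin/admin-ajax.php from the output above as an example:
# grep 'admin-ajax.php' access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head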
You can see above that the “/” directory was requested 304 times.
EXAMINE SPIDERS, BOTS AND CRAWLERS ON WEB SERVER
You can examine your web server access log for bots, spiders, and other crawlers that are hitting your server the most and consuming server resources (memory and CPU). These crawlers can slow down your websites. For better web server performance you will need to create a “robots.txt” file in your web server's document root directory. It tells search engines what content should be indexed and what content should not be indexed.
FINDING THE TOP USER-AGENTS HITTING THE WEB SERVER
Run the command below to find out which user-agents are hitting your web server the most.
# cat example.com_access.log | awk -F'"' '/GET/ {print $6}' | cut -d' ' -f1 | sort | uniq -c | sort -rn
    687 Mozilla/5.0
    476 Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
     26 facebookexternalhit/1.1
      6 MetaInspector/4.7.2
      3 Baiduspider-image+(+http://www.baidu.com/search/spider.htm)
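Note that cut -d' ' -f1 keeps only the first token of each user-agent string, which is why most browsers collapse into “Mozilla/5.0”. To list the full user-agent strings instead, a minimal sketch:
# awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head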
Above you can see the robots that are hitting your web server.
STOP ROBOTS FROM INDEXING WEB SERVER CONTENT
Most of the time, robots such as Yahoo, Google, msnbot, and facebookexternalhit crawl your sites, causing load spikes and consuming server resources. For better web server performance you may need to block or throttle these robots.
BLOCKING GOOGLEBOT:
Above we found, using the host command on an IP from the access log, that “66.249.66.166” belongs to Google.
# host 66.249.66.166
166.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-166.googlebot.com.
Now I am going to block Googlebot using a “robots.txt” file.
First, navigate to your web server's document root directory and create a “robots.txt” file.
In my case the web server document root directory is “/var/www/html/vhost/”.
# cd /var/www/html/vhost/
# vim robots.txt

#Block Googlebot
User-agent: Googlebot
Disallow: /
Field explanations:
#Block Googlebot: This line is only a comment.
User-agent: The name of the bot (here, Googlebot).
Disallow: The path the bot should not crawl; “/” blocks indexing of everything on the server.
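You can confirm the file is reachable the same way a crawler would fetch it. A minimal sketch, with example.com standing in for your own domain:
# curl -i http://example.com/robots.txt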
THROTTLING THE YAHOO CRAWLER:
You can also rate-limit a crawler in robots.txt by setting limits on its fetching activity. To slow down Yahoo's crawler, you can tell it not to fetch a page more than once every 20 seconds. Add the lines below to your robots.txt file to delay the crawler.
# cd /var/www/html/vhost/
# vim robots.txt

#Delay yahoo crawler for 20 seconds
User-agent: Slurp
Crawl-delay: 20
Field explanations:
#Delay yahoo crawler for 20 seconds: This line is only a comment.
User-agent: Slurp is the name of Yahoo's user agent.
Crawl-delay: The user agent will wait 20 seconds between fetch requests.
CONFIGURE SLOWER FETCHING FOR GOOD BOTS
Sometimes you will want good bots to keep crawling your site for traffic purposes, just more slowly. Configure your robots.txt file as shown below.
# cd /var/www/html/vhost/
# vim robots.txt

# Slow crawling 3400 seconds for all bots
User-agent: *
Crawl-delay: 3400
Field explanations:
# Slow crawling 3400 seconds for all bots: This line is only a comment.
User-agent: “*” matches all user agents.
Crawl-delay: The user agent will wait 3400 seconds between fetch requests.
DISALLOW ALL BOTS FROM CRAWLING YOUR WEBSITES
Add the lines below to your robots.txt file to disallow all bots.
# cd /var/www/html/vhost/
# vim robots.txt

# Disallow all bots
User-agent: *
Disallow: /
DISALLOW CRAWLING OF A SPECIFIC DIRECTORY ONLY
Add the lines below to your robots.txt file to disallow only a specific folder/directory.
# cd /var/www/html/vhost/
# vim robots.txt

# Disallow specific directory
User-agent: *
Disallow: /Your_Directory_Name/
Field explanations:
# Disallow specific directory: This line is only a comment.
User-agent: “*” matches all user agents.
Disallow: /Your_Directory_Name/ disallows crawling of only the mentioned directory.
ALLOW EVERYTHING TO CRAWL YOUR WEBSITES
If you want to allow everything to crawl your site, add the lines below to your robots.txt file.
# cd /var/www/html/vhost/
# vim robots.txt

# Allow Everything for crawling
User-agent: *
Disallow:
If you are seeing 404 Not Found requests for robots.txt in your web log, create a robots.txt file with the lines above.
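To see how often crawlers have been asking for a missing robots.txt, you can count those requests by status code in the access log. A minimal sketch, assuming the combined log format where the status code is field 9:
# grep 'robots.txt' access.log | awk '{print $9}' | sort | uniq -c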
I hope this article helps you find the causes of heavy usage on your web server. If you have any queries or problems, please let me know in the comment section.
Thanks:)