Log file analysis is underrated.
I believe learning how to use log files should be part of every SEO’s routine website health checks.
When I was first learning how to carry out log file analysis, I struggled to understand where to begin. Every other blog was telling me how to set up the software and import the files, but not what to look for and why it’s important.
How do you decide what to look for and what actions to take? What will make the most impact?
This article is written as though you’ve already imported some log files into your tool of choice (in my case – Screaming Frog Log File Analyser). You’ve got your data, and you need to know what to do with it.
What is log file analysis?
Log file analysis enables you to observe the precise interactions between your website and Googlebot (as well as other web crawlers such as BingBot). By examining log files, you gain valuable insights that can shape your SEO strategy and address issues related to the crawling and indexing of your web pages.
Why is log file analysis important?
Log file analysis allows you to see exactly what all bots are doing, across all of your content. First-party tools such as Google Search Console or Bing Webmaster Tools provide limited insight into which content is being discovered and crawled by their bots, offering only a tiny fraction of the whole story.
Since crawl budget is limited, it’s important that search engine spiders spend as little time as possible on URLs that have no organic value. We want them to concentrate on the pages you want crawled, indexed and served to potential customers.
It also means changes to your site, such as new product additions or timely blog posts, stand a better chance of being picked up and indexed more quickly than those of your competitors.
Keep this in mind when doing log file analysis. The goal is to make it as easy as possible for bots to access the most important pages on your site.
Log file analysis shouldn’t be a one-off task. It’s especially helpful during site migrations and when uploading new content, to ensure changes are being picked up quickly. Making it part of a regular SEO strategy will ensure you know exactly which bots are accessing your site, and when.
Cross-checking with a crawler such as Screaming Frog SEO Spider
You can use a crawler tool such as Screaming Frog SEO Spider to build a clean list of the URLs you want indexed. Feeding that list into the project alongside your log files gives you the context to judge which URLs bots should and shouldn’t be spending time on.
Importantly, it helps you to see (a rough comparison sketch follows this list):
- Which URLs are being crawled that should be crawled
- Which URLs are not being crawled that should be
- Which URLs are being crawled that definitely should not be
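As a minimal sketch of that comparison, assuming you’ve exported two plain-text URL lists (one from the SEO Spider crawl, one from the log files), you could bucket the URLs like this; the file names are placeholders:

```python
# A rough sketch: compare the URLs bots actually requested (from the logs)
# against the URLs you want indexed (from a crawl). File names are placeholders
# for plain-text exports with one URL per line.

def load_urls(path):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

crawled_by_bots = load_urls("log_file_urls.txt")      # URLs seen in the log files
should_be_crawled = load_urls("crawl_list_urls.txt")  # clean list from the SEO Spider

# The three buckets from the list above
crawled_and_wanted = should_be_crawled & crawled_by_bots
wanted_not_crawled = should_be_crawled - crawled_by_bots
crawled_not_wanted = crawled_by_bots - should_be_crawled

print(f"Crawled and wanted:     {len(crawled_and_wanted)}")
print(f"Wanted but not crawled: {len(wanted_not_crawled)}")
print(f"Crawled but not wanted: {len(crawled_not_wanted)}")
```

The third bucket is usually where the quickest crawl budget wins are hiding.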
So, what shouldn’t be crawled?
This really varies from site to site. Here are a few examples:
Indexable search functions and catalogue filters
Search functions should generally be blocked in robots.txt. There’s rarely any search volume for the random values users enter, and you don’t want those search results pages indexed.
Catalogue filters, like ?colour=red&size=small, need evaluating based on search volume and product availability.
If keyword research shows demand for “denim dog hats”, that parameter might be worth keeping open to crawlers. But “small red dog hats” with no search volume? Block it.
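As a rough illustration only (the /search/ path and the wildcard pattern below are assumptions based on this example, not directives to copy verbatim), blocking the internal search and the zero-demand filter in robots.txt might look something like this:

```text
User-agent: *
# Block internal search results pages (assuming search lives under /search/)
Disallow: /search/
# Block the filter with no search demand; the colour parameter is left
# crawlable because keyword research shows demand for it
Disallow: /*size=
```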
Broken pages
Log file analysis can show which 404s are being hit most often. That tells you which missing pages matter — and where to prioritise 301 redirects.
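If you want to sanity-check this outside the GUI, a rough sketch along these lines works on a raw access log; the combined log format and file name are assumptions, so adjust the regex to your server’s layout:

```python
# Rough sketch: tally which URLs are returning 404 most often in a raw
# access log. Assumes an Apache/Nginx combined-format log; adjust the
# regex and file name to match your server.
import re
from collections import Counter

# Matches e.g. "GET /some/path HTTP/1.1" 404
LINE_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3})')

not_found = Counter()
with open("access.log", encoding="utf-8", errors="replace") as f:
    for line in f:
        match = LINE_RE.search(line)
        if match and match.group("status") == "404":
            not_found[match.group("path")] += 1

# The most-hit missing URLs are the first candidates for 301 redirects
for path, hits in not_found.most_common(20):
    print(f"{hits:6d}  {path}")
```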
Also important to note…
Verifying bots
Always verify that the user agents claiming to be Googlebot really are Google. In Screaming Frog Log File Analyser, this is done via Project > Verify Bots.
This helps filter out fake user agents and keeps your analysis accurate.
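For context, the check Google documents is a reverse DNS lookup followed by a forward-confirming lookup. The Log File Analyser does this for you, but a sketch of the same logic looks roughly like this (the sample IP is illustrative):

```python
# Rough sketch of the two-step check Google documents for verifying Googlebot:
# reverse-DNS the requesting IP, check the hostname ends in googlebot.com or
# google.com, then resolve that hostname forwards and confirm it maps back
# to the same IP.
import socket

def is_verified_googlebot(ip: str) -> bool:
    try:
        host, _, _ = socket.gethostbyaddr(ip)               # reverse DNS
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        _, _, forward_ips = socket.gethostbyname_ex(host)   # forward DNS
        return ip in forward_ips
    except (socket.herror, socket.gaierror):
        return False

# Illustrative only: an IP from Google's published Googlebot ranges
print(is_verified_googlebot("66.249.66.1"))
```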
Utilising Search Console data
Cross-check log file data with Google Search Console. You can even integrate GSC with Screaming Frog to pull in click and impression data to help you prioritise what to allow or block.
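If you’d rather join the two datasets yourself, a rough sketch using exported CSVs might look like the below; every file name, column name and threshold is an assumption to adjust to your own exports:

```python
# Rough sketch: join Googlebot hit counts from the logs with clicks and
# impressions from a Search Console export. File names, column names and
# thresholds are all assumptions to adjust to your own exports.
import pandas as pd

log_hits = pd.read_csv("log_url_counts.csv")   # assumed columns: url, googlebot_hits
gsc = pd.read_csv("gsc_performance.csv")       # assumed columns: url, clicks, impressions

merged = log_hits.merge(gsc, on="url", how="outer").fillna(0)

# Heavily crawled URLs with no impressions are candidates for blocking;
# URLs with impressions but no bot hits may have a crawling problem.
wasted_crawl = merged[(merged["googlebot_hits"] > 100) & (merged["impressions"] == 0)]
never_crawled = merged[(merged["googlebot_hits"] == 0) & (merged["impressions"] > 0)]

print(wasted_crawl.sort_values("googlebot_hits", ascending=False).head(20))
print(never_crawled.sort_values("impressions", ascending=False).head(20))
```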
Recovering corrupted log file analysis projects
Big log files can lead to corrupted project files. If this happens, re-import your saved logs into a fresh project. Backing up log files on an encrypted external hard drive is a good habit.
Interesting issues I’ve encountered
Googlebot 403
A JavaScript site migration resulted in Googlebot being served a 403. The site looked fine to users, but it was invisible to Google. The cause was a misconfigured full-page cache. Once it was fixed, we saw 200 OK responses again and pages began indexing properly.
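A quick way to spot this class of problem is to request a page with and without a Googlebot user-agent string and compare the responses. Note this only reproduces rules keyed on the user agent, as this cache was; anything keyed on Google’s IP ranges won’t show up this way, and the URL is a placeholder:

```python
# Rough sketch: fetch a page as a normal browser and with a Googlebot
# user-agent string, then compare status codes. The URL is a placeholder.
import requests

URL = "https://www.example.com/"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

as_browser = requests.get(URL, timeout=10)
as_googlebot = requests.get(URL, headers={"User-Agent": GOOGLEBOT_UA}, timeout=10)

print(f"Browser UA:   {as_browser.status_code}")
print(f"Googlebot UA: {as_googlebot.status_code}")
```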
Proxy server masking IPs
On another site, bot traffic was showing up in the logs as unverified. A web application firewall (WAF) was replacing Googlebot’s IP with an internal IP, so the log data couldn’t prove which requests were genuine. We fixed it by whitelisting Google’s IPs and bypassing the WAF for bot traffic.
Overall
Log file analysis adds a critical layer to any technical SEO audit. While log files can be tricky to obtain and manage, the insight they provide into bot behaviour, crawl issues, and indexing status is invaluable — especially for large or complex websites.