Part 1 | Part 2 (/blog/2011/oct/12/load-testing-jmeter-part-2-headless-testing-and-je/) | Part 3
A while ago, I wrote a couple of blog entries about load testing with JMeter. I promised a third entry covering how to use JMeter to replay Apache logs and roughly recreate production load, but I never followed through with it. Today, I intend to rectify this grievous error.
Parsing your Apache Logs
There is more than one way to do this, but my preferred method is a simple Python script that filters the Apache log file you want to use and writes the desired URLs to a tidy CSV file. I am using the ‘apachelog’ module for this (also available as a gist):
This script takes the name of the log file to parse as its one required argument, and provides a few options as well. It parses each line of your Apache log file and checks whether the line meets a few criteria before including it in your CSV file. First, it checks that the method was GET and the status code was 200. Then it checks the request against the regular expressions in MEDIA_RE and SPECIAL_RE, and if it matches either of them the record is discarded. This is so that you can filter out media requests or special-case URLs such as the Django admin. If you specified a grep filter, it will only include lines where that plain text value is present. If your log format differs from the default, make sure to pass the format along with the -f option, or modify the script to make the change permanent.
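To make the filtering order concrete, here is a minimal sketch of the same checks. It is not the script from this post: it uses the standard library's re module instead of apachelog, assumes the Apache combined log format, and the two patterns shown for MEDIA_RE and SPECIAL_RE are placeholders of my own choosing.

```python
import re

# Assumption: Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Placeholder patterns; the real script's MEDIA_RE and SPECIAL_RE may differ.
MEDIA_RE = re.compile(r'\.(?:css|js|png|jpe?g|gif|ico)(?:\?|$)', re.IGNORECASE)
SPECIAL_RE = re.compile(r'^/admin/')


def extract(line, grep=None):
    """Return (url, user_agent) if the line passes every filter, else None."""
    match = LOG_RE.match(line)
    if match is None:
        return None  # line is not in the expected format
    parts = match.group('request').split()
    if len(parts) != 3:
        return None  # malformed request line
    method, url, _protocol = parts
    # 1. Only successful GET requests are kept.
    if method != 'GET' or match.group('status') != '200':
        return None
    # 2. Media files and special-case URLs are discarded.
    if MEDIA_RE.search(url) or SPECIAL_RE.search(url):
        return None
    # 3. An optional plain-text filter must appear somewhere in the line.
    if grep is not None and grep not in line:
        return None
    return url, match.group('user_agent')
```

Calling extract() on each line of the log and writing out the tuples that are not None gives the two-column CSV described here.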
The result should be a urls.csv file with a url and a user agent on each line. This file will be used to recreate the requests in JMeter.
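One detail worth checking in that file: real user agent strings often contain commas, so the field needs to be quoted. The sketch below uses Python's csv module with made-up sample rows to show what a quoted line looks like. If your file has quoted fields, you will probably want the "Allow quoted data?" option turned on in JMeter's CSV Data Set Config so each row still splits into exactly two values.

```python
import csv
import io

# Made-up sample rows: (url, user_agent). The first user agent contains a comma.
rows = [
    ('/blog/', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)'),
    ('/about/', 'curl/7.21.0'),
]

# Write to an in-memory buffer; a real script would open urls.csv instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerows(rows)
output = buffer.getvalue()
print(output)

# Reading the data back confirms that each row still has exactly two fields.
parsed = [tuple(row) for row in csv.reader(io.StringIO(output))]
```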
Replaying in JMeter
Setting this up in JMeter is rather easy. I use a separate test plan for replaying logs:
Within the plan, I’ve got a Thread Group created called “Replay Log”:
In that Thread Group, I have a CSV reader that loads the urls and populates two variables – url and user_agent:
I use a Header Manager to provide the User Agent:
Finally, I use the url variable as the path in the HTTP Request:
With all of that configured, I can now replay the log and take a measurement of some real-world urls under load!
Tweaking the Parser
There are a couple of ways you can customize the parser script to your liking. I’m only allowing requests with a status code of 200 through. You can customize this on line 86 of the script and allow 404s or any other code you’d like to include in your urls.
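The script itself isn't reproduced here, so the following is only a sketch of one way to make that change: replace a hard-coded comparison against 200 with a membership test on a set of allowed codes. The names ALLOWED_STATUSES and status_allowed are hypothetical and do not come from the script.

```python
# Hypothetical replacement for a hard-coded "status == 200" check.
# Add or remove codes here to control which requests reach the CSV.
ALLOWED_STATUSES = frozenset({'200', '404'})


def status_allowed(status):
    """Return True if the status code (str or int) should be kept."""
    return str(status).strip() in ALLOWED_STATUSES
```

Keep in mind that replaying 404s or redirects will change what your JMeter results mean, since those responses are usually much cheaper to serve than a full page.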
If you want to add more media types, you can extend the MEDIA_RE variable at the top of the script. You can also exclude special urls by adding to the SPECIAL_RE variable. In both cases, just use a pipe (|) to separate your entries.
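As an illustration of the pipe-separated approach, here is a sketch with patterns I made up for the example; the actual contents of MEDIA_RE and SPECIAL_RE in the script may look different. MEDIA_EXTENSIONS is a hypothetical helper list.

```python
import re

# Hypothetical starting list of media extensions, extended with two more.
MEDIA_EXTENSIONS = ['css', 'js', 'png', 'jpg', 'gif', 'ico']
MEDIA_EXTENSIONS += ['svg', 'woff']

# Joining with "|" builds an alternation: \.(?:css|js|...|svg|woff)
# The trailing group makes it match only at the end of the path or
# just before a query string.
MEDIA_RE = re.compile(
    r'\.(?:' + '|'.join(MEDIA_EXTENSIONS) + r')(?:\?|$)',
    re.IGNORECASE,
)

# Special-case URL prefixes, also separated by "|".
SPECIAL_RE = re.compile(r'^/(?:admin|accounts)/')
```

Anchoring the patterns (the leading ^ and the trailing end-of-path check) keeps them from discarding ordinary pages that merely contain one of the words somewhere in the URL.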
You can add more data to the CSV file so that you can use it in JMeter by customizing the writer call on line 94 of the script, adding in more details that apachelog recognizes from each log line. Make sure to modify your CSV Data Set Config element in JMeter to match this new CSV format.
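As one possible shape for that change, the sketch below writes a third column, the referer, after the url and user agent. The record dictionaries and their keys are assumptions for the example, not the structure apachelog returns.

```python
import csv
import io


def write_rows(records, out):
    """Write url, user_agent and referer columns for each record."""
    writer = csv.writer(out, lineterminator='\n')
    for record in records:
        writer.writerow([
            record['url'],
            record['user_agent'],
            record['referer'],  # the new, third column
        ])


# Made-up sample data, written to an in-memory buffer for demonstration.
sample = [
    {'url': '/blog/', 'user_agent': 'curl/7.21.0', 'referer': 'http://example.com/'},
]
buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

With a file like this, the Variable Names field in the CSV Data Set Config would become url,user_agent,referer, and the new variable could be sent as a Referer header from the Header Manager.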
I apologize for the delay in getting this post out, but I hope it’s helpful to you in your load testing endeavors!