Part 1 | Part 2 (/blog/2011/oct/12/load-testing-jmeter-part-2-headless-testing-and-je/) | Part 3
A while ago, I wrote a couple of blog entries about load testing with JMeter. I promised a third entry covering how to use JMeter to replay Apache logs and roughly recreate production load, but I never followed through with it. Today, I intend to rectify this grievous error.
Parsing your Apache Logs
There is more than one way to do this, but my preferred method is to use a simple Python script to do some filtering of the Apache log file you want to use and to output the desired urls as a tidy CSV file. I am using the ‘apachelog’ module for this (also available as a gist):
<div class="highlight"><pre>
#!/usr/bin/env python
"""
Requires apachelog. `pip install apachelog`
"""
from __future__ import with_statement
import apachelog
import csv
import re
import sys
from optparse import OptionParser

# Keys that apachelog uses for the parsed fields we care about
STATUS_CODE = '%>s'
REQUEST = '%r'
USER_AGENT = '%{User-Agent}i'

# URLs matching either pattern are left out of the CSV
MEDIA_RE = re.compile(r'\.png|\.jpg|\.jpeg|\.gif|\.tif|\.tiff|\.bmp|\.js|\.css|\.ico|\.swf|\.xml')
SPECIAL_RE = re.compile(r'xd_receiver|\.htj|\.htc|/admin')


def main():
    usage = "usage: %prog [options] LOGFILE"
    parser = OptionParser(usage=usage)
    parser.add_option(
        "-o", "--outfile",
        dest="outfile",
        action="store",
        default="urls.csv",
        help="The output file to write urls to",
        metavar="OUTFILE"
    )
    parser.add_option(
        "-f", "--format",
        dest="logformat",
        action="store",
        default=r'%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"',
        help="The Apache log format, copied and pasted from the Apache conf",
        metavar="FORMAT"
    )
    parser.add_option(
        "-g", "--grep",
        dest="grep",
        action="store",
        help="Simple, plain text filtering of the log lines. No regexes. This "
             "is useful for things like date filtering - DD/Mmm/YYYY.",
        metavar="TEXT"
    )
    options, args = parser.parse_args()

    if not args:
        sys.stderr.write('Please provide an Apache log to read from.\n')
        sys.exit(1)

    create_urls(args[0], options.outfile, options.logformat, options.grep)


def create_urls(logfile, outfile, logformat, grep=None):
    parser = apachelog.parser(logformat)

    # 'wb' is the mode the csv module expects on Python 2
    with open(logfile) as f, open(outfile, 'wb') as o:
        writer = csv.writer(o)

        # Status spinner
        spinner = "|/-\\"
        pos = 0

        for i, line in enumerate(f):
            # Spin the spinner
            if i % 10000 == 0:
                sys.stdout.write("\r" + spinner[pos])
                sys.stdout.flush()
                pos += 1
                pos %= len(spinner)

            # If a filter was specified, filter by it
            if grep and grep not in line:
                continue

            try:
                data = parser.parse(line)
            except apachelog.ApacheLogParserError:
                continue

            try:
                method, url, protocol = data[REQUEST].split()
            except ValueError:
                # Malformed request line (for example "-"); skip it
                continue

            # Check for GET requests with a status of 200
            if method != 'GET' or data[STATUS_CODE] != '200':
                continue

            # Exclude media requests and special urls
            if MEDIA_RE.search(url) or SPECIAL_RE.search(url):
                continue

            # This is a good record that we want to write
            writer.writerow([url, data[USER_AGENT]])

    sys.stdout.write(' done!\n')


if __name__ == '__main__':
    main()
</pre></div>
This script takes the name of the logfile to parse as the one required argument, and provides a few options as well.
<code>
Usage: createurls.py [options] LOGFILE
Options:
  -h, --help
      show this help message and exit
  -o OUTFILE, --outfile=OUTFILE
      The output file to write urls to
  -f FORMAT, --format=FORMAT
      The Apache log format, copied and pasted from the Apache conf
  -g TEXT, --grep=TEXT
      Simple, plain text filtering of the log lines. No regexes. This is useful for things like date filtering - DD/Mmm/YYYY.
</code>
The script will parse each line of your Apache log file, and check to see if it meets a few criteria before including it in your CSV file. First, it checks to see if the method was GET and the status code was 200. Then, it checks the url against the regular expressions in MEDIA_RE and SPECIAL_RE, and if it matches either of them the record is discarded. This is so that you can filter out media requests or special case urls such as the Django admin. If you specified a grep filter, it will only include lines where that plain text value is present. If your format differs from the default, make sure to pass the format along with the -f option, or modify the script to make the change permanent.
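The filtering rules described above can be pulled out into a small predicate, which makes them easy to try against a handful of sample requests. This is only a sketch: keep_request is a hypothetical helper that is not part of the script, and it takes the request line and status code as plain strings so it can run without apachelog installed.

```python
import re

# Same patterns as the script
MEDIA_RE = re.compile(r'\.png|\.jpg|\.jpeg|\.gif|\.tif|\.tiff|\.bmp|\.js|\.css|\.ico|\.swf|\.xml')
SPECIAL_RE = re.compile(r'xd_receiver|\.htj|\.htc|/admin')


def keep_request(request, status):
    """Return the url if the request should be replayed, otherwise None.

    `request` is the raw %r value, e.g. 'GET /blog/ HTTP/1.1'.
    `status` is the %>s value as a string, e.g. '200'.
    (keep_request is a hypothetical helper, not part of the original script.)
    """
    parts = request.split()
    if len(parts) != 3:
        # Malformed request line such as "-"
        return None
    method, url, protocol = parts

    # Only GET requests that returned a 200
    if method != 'GET' or status != '200':
        return None

    # Drop media requests and special urls
    if MEDIA_RE.search(url) or SPECIAL_RE.search(url):
        return None

    return url
```

A call such as keep_request('GET /blog/ HTTP/1.1', '200') returns the url, while a POST, a non-200 status, a stylesheet or an admin url all return None.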
The result should be a urls.csv file with a url and a user agent on each line. This file will be used to recreate the requests in JMeter.
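Before pointing JMeter at the file, it can be worth a quick look at what ended up in it. The sketch below is not part of the original workflow: summarize_urls is a hypothetical helper that counts the rows and lists the most requested urls, assuming the two-column layout described above.

```python
import csv
from collections import Counter


def summarize_urls(lines, top=5):
    """Return (total_rows, most_common_urls) for an iterable of CSV lines.

    `lines` can be an open urls.csv file or any iterable of strings.
    (summarize_urls is a hypothetical helper, not part of the script.)
    """
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) != 2:
            # Skip blank or malformed rows
            continue
        url, user_agent = row
        counts[url] += 1
    return sum(counts.values()), counts.most_common(top)
```

Passing an open urls.csv file to summarize_urls gives a rough idea of how many requests will be replayed and which pages dominate the load.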
Replaying in JMeter
Setting this up in JMeter is rather easy. I use a separate test plan for replaying logs:

Within the plan, I’ve got a Thread Group created called “Replay Log”:

In that Thread Group, I have a CSV reader that loads the urls and populates two variables – url and user_agent:

I use a Header Manager to provide the User Agent:

Finally, I use the url variable as the path in the HTTP Request:

With all of that configured, I can now replay the log and take a measurement of some real-world urls under load!
Tweaking the Parser
There are a couple of ways you can customize the parser script to your liking. I’m only allowing requests with a status code of 200 through. You can change this in the status check inside create_urls (the comparison against '200') and allow 404s or any other code you’d like to include in your urls.
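One way to allow several status codes is to replace the single comparison with a membership test. This is a sketch under a few assumptions: ALLOWED_STATUSES and is_replayable are hypothetical names, and the codes are kept as strings because the script compares the parsed status against the string '200'.

```python
# Hypothetical names; the original script hard-codes the '200' comparison.
ALLOWED_STATUSES = frozenset(['200', '404'])


def is_replayable(method, status, allowed=ALLOWED_STATUSES):
    """Return True for GET requests whose status code is in `allowed`.

    `status` is a string, matching how the script compares the parsed
    %>s value against '200'.
    """
    return method == 'GET' and status in allowed
```

In the script, the existing check would then become something like `if not is_replayable(method, data[STATUS_CODE]): continue`.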
If you want to add more media types, you can extend the MEDIA_RE variable at the top of the script. You can also exclude special urls by adding to the SPECIAL_RE variable. In both cases, just use a pipe (|) to separate your entries.
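As an illustration, here is what the two patterns might look like with a few extra entries appended. The additions (\.svg, \.woff and /healthcheck) are made-up examples, not something the original script filters.

```python
import re

# Original media extensions plus two example additions: .svg and .woff
MEDIA_RE = re.compile(
    r'\.png|\.jpg|\.jpeg|\.gif|\.tif|\.tiff|\.bmp|\.js|\.css|\.ico|\.swf|\.xml'
    r'|\.svg|\.woff'
)

# Original special urls plus an example /healthcheck endpoint
SPECIAL_RE = re.compile(r'xd_receiver|\.htj|\.htc|/admin|/healthcheck')
```

Keep in mind that the patterns are used with search and are not anchored, so an entry like \.js also matches a url such as /api/data.json. If that matters for your site, tighten the entry rather than just appending to the list.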
You can add more data to the CSV file so that you can use it in JMeter by customizing the writer.writerow call at the end of the loop in create_urls, adding in more details that apachelog recognizes from each log line. Make sure to modify your CSV Data Set Config element in JMeter to match this new CSV format.
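For example, the referer could be written as a third column. The sketch below makes a few assumptions: build_row and the REFERER constant are hypothetical additions, the log format includes %{Referer}i (the script's default format does), and a plain dict stands in for the parsed line, keyed by format directive in the same way the script's constants are.

```python
REQUEST = '%r'
USER_AGENT = '%{User-Agent}i'
REFERER = '%{Referer}i'  # new constant, not in the original script


def build_row(data):
    """Build a CSV row of url, user agent and referer from a parsed line.

    `data` is a dict keyed by log format directive, standing in for the
    result of the apachelog parser.
    (build_row is a hypothetical helper, not part of the script.)
    """
    method, url, protocol = data[REQUEST].split()
    return [url, data[USER_AGENT], data[REFERER]]
```

The script's writer.writerow call would then receive this three-item list, and the variable list in JMeter would need a third name (referer, say) to match.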
I apologize for the delay in getting this post out, but I hope it’s helpful to you in your load testing endeavors!