
elaps'd

application run-time measurement and analysis with ease


What does elaps’d do for me?

Suppose the following C++ code:

#include <omp.h>
#define N 1024

// Helper functions (defined elsewhere)
void initialize(double *x, double *y, double *z, int n);
void printResult(double *z);

int main(void) {

    double x[N], y[N], z[N];
    int i;

    // Do initializations
    initialize(x, y, z, N);

    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        z[i] = x[i] + y[i];
    }

    printResult(z);

    return 0;
}

Alright, this works and nicely adds two vectors of size N. So far, so good. But performance is everything, right? For instance, assume you want to know how long the loop runs, and possibly also how long each thread computes. Your usual options are profiling with gprof or manual timing with omp_get_wtime.

Another point is that you might want some graphical analysis, which you won’t get with gprof or omp_get_wtime.

You: Isn’t there an easy-to-use tool for this purpose?

Yes, there is. And this tool is elaps’d. To make a measurement, we slightly extend the code:

#include <omp.h>
#define N 1024

#include <elapsd/elapsd.h> //< Include this

// Helper functions (defined elsewhere)
void initialize(double *x, double *y, double *z, int n);
void printResult(double *z);

int main(void) {

    double x[N], y[N], z[N];
    int i;

    // Do initializations
    initialize(x, y, z, N);

    /* Initialize elaps'd */
    ENHANCE::elapsd e("vector.db", "Vector Experiment");
    e.addKernel(0, "Vector (Multithreaded)");
    e.addDevice(0, "CPU");
    /* fin */

    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        e.startTimer(0, 0); //< Magic happens here
        z[i] = x[i] + y[i];
        e.stopTimer(0, 0); //< And here
    }

    printResult(z);

    e.commitToDB(); //< Store for analysis

    return 0;
}

That’s all. Now compile, link against libelapsd.so, and run it. After the run you get an SQLite3 database vector.db that you can analyze. The analysis looks like this:

(Screenshot: graphical analysis of the vector.db measurement)

Cool, isn’t it? elaps’d automatically recognizes threads when it runs inside a multi-threaded environment (e.g. in the for loop). Furthermore, the separation between kernels and devices lets you clearly define which function (kernel) runs on which device. Thus, measuring functions on accelerators, e.g. GPGPUs, also becomes possible – but currently only at a coarse level, i.e. you won’t see GPGPU threads in the analysis.

In addition, multiple experiments are supported, which means you can compare the same program run with different arguments. To tell them apart during analysis, elaps’d lets you choose arbitrary names for experiments, kernels, and devices.