Tuesday, November 23, 2010

XenGatherer Update

Today the XenGatherer tool got a bit more functionality. An update on this progress is given first, followed by a description of the issues that still need attention. Next week we'll have a look at the overhead caused by running this tool (in particular in terms of CPU usage).

The following things were added to or modified in the XenGatherer tool today:
  • First of all, some small issues that were introduced while coding at home (without being able to run everything) were fixed (e.g. the disk columns were not formatted properly).
  • The overview of the CPU usage was made more readable (see the snapshot at the bottom of this blog post).
  • Dom0 stats were added to the disk section of the resource usage overview, and an MB/s read and write column was added to this overview (iostat provides a -m option to facilitate this).
  • An option to provide a file containing the VM names to monitor was added, namely "--vmfile". If this optional argument is not provided, the application now parses the 'xm list' output and uses all the VMs that are currently running (a sketch of this fallback is given after this list).
  • After copying the 'vmlinux' kernel file to the node (and rebooting the VMs), the OProfile functionality to monitor the occurrence of cache miss and hit events worked fine. But since the generated output was rather extensive (e.g. for one domain there was a separate sample counter for the kernel, Xen, the modules, ...), we now take the sum of these counters for each domain and present these values in the overview. Note that there is still an option that can be used when the full output is preferred: "--nosumcache".
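For illustration, here is a minimal Python sketch (not the actual XenGatherer code) of what the 'xm list' fallback mentioned above could look like; the column layout and the decision to skip Domain-0 are assumptions:

    import subprocess

    def running_vm_names(skip=("Domain-0",)):
        # Fallback used when no --vmfile is given: run 'xm list' and return
        # the names of all running domains (assuming the name is the first column).
        output = subprocess.check_output(["xm", "list"]).decode()
        names = []
        for line in output.splitlines()[1:]:   # skip the header row
            fields = line.split()
            if fields and fields[0] not in skip:
                names.append(fields[0])
        return names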
The topics that still need further attention are listed below:
  • First of all, dom0 stats should still be added to the cache hits/misses section.
  • The CPU stats cause problems when the domain id crosses the 32 boundary; the XenBaked code should be adapted so that this no longer causes problems.
  • Also, we should check whether the LLC_REF events indicate cache hits on their own, or whether the cache miss events should be subtracted from this number to get the number of cache hits.
  • And last but not least, the source code/design should be reviewed and comments should be written.

Friday, November 19, 2010

Xen Monitoring Tool (cache)

The monitoring tool now gathers information about the cache hits and misses as well, using OProfile. Through program arguments it is now possible to choose what will be monitored (e.g. --nodisk causes the disk information to no longer be included). The output is also formatted now, such that the columns are aligned and everything is more readable. An example of this output can be downloaded here.
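As a rough illustration of how such a flag could be wired up, here is a sketch using Python's optparse; only --nodisk is taken from this post, the default value and help text are assumptions and this is not the actual XenGatherer argument handling:

    from optparse import OptionParser

    parser = OptionParser(usage="xengatherer [options]")
    parser.add_option("--nodisk", action="store_true", default=False,
                      help="do not gather disk (iostat) information")
    options, args = parser.parse_args()

    gather_disk = not options.nodisk   # e.g. skip the IOStat controller when --nodisk is given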

Friday, November 12, 2010

Xen Monitoring Tool (disk and cpu) & Xen 4.0

On Tuesday I continued working on the monitoring app: disk and CPU metrics are now gathered and written to file. To make sure the tap:aio interface worked correctly when starting VMs, I also had to install a newer Xen version (4.0).

The way the application works now is as follows: first the start operation is called on the XenBaked, IOStat and OProfile controllers, then a given command is executed on the specified VMs, after which the application sleeps for the given duration. Afterwards the stop method of the controllers is called, followed by a report call (write the tool output to file) and a parse call (get the desired information from the output files) for the different tools. Finally this disk, CPU and cache information is written to file. Working this way ensures that I/O load is only generated after the experiment is finished, since the output is only written to file when report is called after the monitoring has stopped. The tool is also lightweight because the parsing of the output and the creation of the actual output file can be performed on a separate machine. What's next? Finishing the prototype of the monitoring tool by completing the cache hits and misses information, and making sure the results are correct by running the app under certain VM workloads.
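To make that flow more concrete, here is a minimal sketch of the start/run/sleep/stop/report/parse sequence; the controller interface is taken from the description above, but the function and attribute names are assumptions, not the actual source:

    import time

    def run_experiment(controllers, run_command_on_vms, duration):
        # controllers: e.g. the XenBaked, IOStat and OProfile controller objects
        for c in controllers:
            c.start()                        # start the underlying tools
        run_command_on_vms()                 # execute the given command on the specified VMs
        time.sleep(duration)                 # monitor for the given duration
        for c in controllers:
            c.stop()                         # stop the underlying tools
        results = {}
        for c in controllers:
            c.report()                       # dump the raw tool output to file
            results[c.name] = c.parse()      # extract the desired metrics from that output
        return results                       # disk, CPU and cache info, written to file by the caller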

Sunday, November 7, 2010

Patch, IOStat & Parsing

On Friday the 5th of November I first of all got the patch working at last. It wasn't working before because I was creating the patch by diffing the patched kernel tree, which had been compiled, against the original tree, which had not; this diff resulted in an extremely large patch file.

Afterwards both Sam and I worked on the XenGatherer tool: Sam added the IOStat functionality, so it is now possible to monitor the disk usage of the virtual machines and dom0. I started parsing the output files, so that our tool generates an output file containing only the required information.
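For illustration, a sketch (not the actual parsing code) of how an iostat device report could be reduced to per-device numbers; the column names are taken from the header line of the report itself, so the exact metrics depend on how iostat was invoked:

    def parse_iostat(path):
        # Parse an 'iostat -d' style report into {device: {column: value}}.
        stats = {}
        with open(path) as report:
            header = None
            for line in report:
                fields = line.split()
                if not fields:
                    continue
                if fields[0].startswith("Device"):
                    header = fields[1:]                  # e.g. tps, Blk_read/s, Blk_wrtn/s, ...
                elif header and len(fields) == len(header) + 1:
                    stats[fields[0]] = dict(zip(header, map(float, fields[1:])))
        return stats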

Tuesday, November 2, 2010

OProfile => XenGatherer

I extended the XenGatherer prototype such that, when it is started, opcontrol starts gathering information about cache misses (event LLC_MISSES):
  • opcontrol --reset
  • opcontrol --start-daemon --event=LLC_MISSES:10000 --xen=... --vmlinux=...
  • opcontrol --start
When the XenGatherer tool is stopped, opcontrol is stopped as well:
  • opcontrol --stop
  • opcontrol --shutdown
And when you choose to make a report (through the XenGatherer CLI), the retrieved information is written to file. The information gathered by OProfile is requested with the following command:
  • opreport event:LLC_MISSES
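A rough Python sketch of how these calls could be wrapped in a controller class; the class name, method names and the way the paths are passed in are assumptions, the actual implementation is in the linked source files:

    import subprocess

    class OProfileController(object):
        def __init__(self, xen_image, vmlinux):
            # Paths to the Xen image and the uncompressed dom0 kernel (vmlinux).
            self.setup_args = ["--event=LLC_MISSES:10000",
                               "--xen=%s" % xen_image,
                               "--vmlinux=%s" % vmlinux]

        def start(self):
            subprocess.check_call(["opcontrol", "--reset"])
            subprocess.check_call(["opcontrol", "--start-daemon"] + self.setup_args)
            subprocess.check_call(["opcontrol", "--start"])

        def stop(self):
            subprocess.check_call(["opcontrol", "--stop"])
            subprocess.check_call(["opcontrol", "--shutdown"])

        def report(self, path):
            # Write the opreport output for the LLC_MISSES event to a file.
            with open(path, "w") as out:
                subprocess.check_call(["opreport", "event:LLC_MISSES"], stdout=out)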
An updated version of the Python source files can be found here.