The first test is the longest...

Technical chat for the techies and development testers
dave
Site Admin
Posts: 260
Joined: Fri May 30, 2008 9:09 pm
Location: UK

Post by dave » Sat Jun 14, 2008 5:57 pm

Following feedback and testing, it seems that the first test a node runs takes longer than later tests in the sequence, even with exactly the same options.

This was initially believed to be down to DNS lookup times, so the system was updated to pre-resolve DNS (and to use the IP address directly in the test where possible). Although this does seem to have slightly reduced some delays, the issue still seems to be there. This has been raised as a bug (ID 148).
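
To illustrate the pre-resolve idea, here is a minimal Python sketch (not the project's actual code, which is not shown in this thread). It resolves the hostname before the timer starts and then times only the TCP connect to the IP address. The function name `timed_connect` and its parameters are made up for this example; the `resolve` and `connect` arguments exist only so the behaviour can be checked without a network.

```python
import socket
import time


def timed_connect(host, port, resolve=socket.gethostbyname,
                  connect=socket.create_connection, timeout=5.0):
    """Resolve host first, then time only the TCP connect to the IP.

    Hypothetical helper for illustration; not taken from the project.
    """
    ip = resolve(host)                  # DNS cost is paid here, before timing starts
    start = time.perf_counter()
    conn = connect((ip, port), timeout)
    elapsed = time.perf_counter() - start
    conn.close()
    return ip, elapsed
```

Any local resolver cache still affects how long the `resolve` step takes, but that time no longer lands inside the measured figure.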

The problem can be seen in the image at the bottom which shows two tests at various levels of zoom (rangemax). The two are identical web time tests, the one on the left being the first test on the node and the right-hand one the fourth.

I think this is down to the sheer number of tests running at once. The system fires off testers every five minutes and spawns them all near-simultaneously. The load in terms of MySQL connections (and in this case outgoing network sockets) is therefore at its maximum right at the start. Later tests are likely to have fewer tests running alongside them (other nodes with only one or two tests configured having finished) or at least to be more out of step with each other. The first tests execute near-simultaneously, but each node's second test only begins after its first has finished, which takes a varying time on different nodes, so the second and later tests drift out of step with each other.

There are various potential ways to solve this problem (if that is indeed the cause) but none of them are cleanly implementable with the current setup (either requiring a dirty hack or a big rewrite to core code).

As the plan is to eventually have a much better time system that isn't just a crude five-minute job then it may be better to wait (perhaps just implementing an optional dirty fix for those users who find it a specific problem) and do that big rewrite with the problem in mind.

Anyway - investigations continue...

If anyone has any ideas on what may be causing this or a neato solution please drop me a line in public here or in private here.

So far the "fix" bodges considered include:
  • Random start delays - the node tests are randomly delayed between 10 and 60 seconds before starting to avoid clashing with each other as much as possible
  • Rolling Testing - the nodes are tested over a rolling four-minute period, so if you have two nodes they are started at 0s and 120s, four nodes at 0s, 60s, 120s and 180s, etc...
Both of these have potential implications with regard to graph "cleanness".
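
The two bodges above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, and the rolling spacing is assumed to be the window divided by the number of nodes, which matches the two-node case (0s and 120s).

```python
import random


def random_delays(node_count, low=10.0, high=60.0, rng=random):
    """One random start delay per node, in seconds, drawn from [low, high]."""
    return [rng.uniform(low, high) for _ in range(node_count)]


def rolling_offsets(node_count, window=240.0):
    """Evenly spaced start offsets across the window, first node at 0s.

    Assumes spacing = window / node_count (an assumption, not confirmed).
    """
    if node_count < 1:
        return []
    step = window / node_count
    return [i * step for i in range(node_count)]
```

With these assumptions `rolling_offsets(2)` gives starts at 0s and 120s. The random version spreads load without any coordination, but each node's test lands at a different point in every five-minute cycle, which is where the graph concern comes from.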


Example showing the same configured web time tests running first and fourth in sequence on a node. The first test can be seen to consistently take 0.05-0.1s longer and also have many more "above normal" times as well
Image

Update: The first test is the longest...

Post by dave » Mon Jun 16, 2008 6:33 pm

Well... over the weekend I put in a simple delay feature in test-threaded.sh. This is now being tested but will allow you to specify a delay using the tests.spawndelay variable.
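
For anyone curious what a spawn delay amounts to, here is a rough Python sketch of the idea. It is not the contents of test-threaded.sh; the function name `spawn_staggered` and the `spawn_delay` parameter (standing in for tests.spawndelay, with seconds assumed as the unit) are made up for this example.

```python
import threading
import time


def spawn_staggered(testers, spawn_delay=0.0, sleep=time.sleep):
    """Start each tester in its own thread, pausing between starts.

    spawn_delay is assumed to be in seconds; a value of 0 reproduces
    the original all-at-once behaviour.
    """
    threads = []
    for index, tester in enumerate(testers):
        if index > 0 and spawn_delay > 0:
            sleep(spawn_delay)          # stagger so testers do not all start together
        thread = threading.Thread(target=tester)
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()                   # wait for every tester to finish
```

The first tester still starts immediately, so only the later ones are pushed back.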

See the picture below for what a difference it makes compared to the original all-at-once method. The delay was enabled for the middle period of the graph. The average is much nearer to the second test and the "spiking" has gone.

Image

Dr_Watso
Posts: 2
Joined: Tue Jun 17, 2008 9:23 pm

Re: The first test is the longest...

Post by Dr_Watso » Tue Jun 17, 2008 9:40 pm

Interesting... I had noticed that as well, and also thought it was probably just the DNS call each time... It never actually bothered me, and in fact for the future notes, if you do decide to use the DNS caching, it would be on the nice side to have a test that makes a DNS request of your choice! Not only could it be used for testing the internal DNS servers, but you could also use it to test the external DNS sources your servers are forwarding non-authoritative requests to.

Sorry for going a bit off-topic! Back to your regular programming!

Re: The first test is the longest...

Post by dave » Tue Jun 17, 2008 10:13 pm

Hello again,
Dr_Watso wrote:
...for the future notes, if you do decide to use the DNS caching, it would be on the nice side to have a test that makes a DNS request of your choice! Not only could it be used for testing the internal DNS servers, but you could also use it to test the external DNS sources your servers are forwarding non-authoritative requests to.

DNS testing is something I definitely want to do. All tests are now pre-caching or using (where possible) the IP address before any timing starts.

I was originally just going to put in a DNS lookup test that did a gethostbyname or gethostbyaddr, but the problem with that is that any local cache will probably still have the IP anyway (apart from when, annoyingly for timing purposes, it decides not to) or at least your LAN nameserver will have it cached.

So I want to, like you suggest, provide a wide range of DNS tests, from a simple "local" IP lookup to fully recursive and direct-to-nameserver queries.

Sadly there appear to be no easily usable libraries for PHP that will do this off the bat (though I have not looked that hard).

I intend to have some "fun" (my idea of it at least) writing a socket-level DNS client at some point but will have to brush up on protocols etc first.
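
As a taste of what the socket-level part involves, here is a minimal Python sketch (the eventual client would presumably be PHP; Python is used here only for illustration) that builds a single-question DNS query message following the RFC 1035 wire format. The function name `build_dns_query` and its parameters are made up for this example, and it covers only plain ASCII hostnames.

```python
import struct


def build_dns_query(hostname, query_id=0x1234, recursion_desired=True, qtype=1):
    """Build a DNS query message (RFC 1035) with one question in class IN.

    qtype=1 requests an A record. Hypothetical helper for illustration.
    """
    flags = 0x0100 if recursion_desired else 0x0000    # only the RD bit is ever set
    # Header fields: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b""
    for label in hostname.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 1 <= len(encoded) <= 63:
            raise ValueError("each label must be 1 to 63 bytes")
        qname += bytes([len(encoded)]) + encoded       # length-prefixed label
    qname += b"\x00"                                   # zero-length root label ends the name
    question = qname + struct.pack("!HH", qtype, 1)    # QTYPE, then QCLASS=1 (IN)
    return header + question
```

The resulting bytes would be sent over UDP to port 53 of whichever nameserver is under test, with the timer wrapped around the send and receive. Clearing `recursion_desired` asks the server not to recurse on your behalf, which is one way to tell a cached or authoritative answer apart from a fully recursive lookup.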

Cheers,

Dave.
