Friday, July 17, 2009

First round of test results

Below are the test results for PHPDFS in its current implementation.

In summary, the current implementation of PHPDFS performs as expected according to the RUSH variant described in this paper:

http://users.soe.ucsc.edu/~elm/Papers/ipdps03.pdf

That is the main point of these tests: to show that the current implementation of PHPDFS is correct and does what it is supposed to do. However, after testing I am not sure that the current implementation will fit the needs of a cloud-like store for a web application. There are three main variants of the RUSH algorithms, known as RUSHp, RUSHr, and RUSHt. PHPDFS currently has a RUSHp implementation at its core. I am looking at RUSHr as the best choice because it seems to have the most flexible replication policy for an object, which maps better to the needs of a web app where the popularity of an asset can change suddenly (say, due to a post on Slashdot, Digg, etc.). I will go into my reasoning in more depth in the next blog post.

On to the results...

I used the paper here as a reference for the test results (section 3 specifically):

http://users.soe.ucsc.edu/~elm/Papers/ipdps04.pdf

All graphs were generated with JpGraph:

http://www.aditus.nu/jpgraph/

All test code was executed on a MacBook with a 1.83 GHz Intel Core Duo and 2 GB of 667 MHz DDR2 SDRAM.

We cover four areas:
  • object distribution
  • failure resilience (replica distribution)
  • reorganization (adding and removing servers)
  • look-up performance
All tests except for the look-up performance test were done with a configuration of three sub-clusters, each containing 5 nodes. 10,000 objects were created and placed with three replicas each (30,000 placements in total). The first sub-cluster has a weight of 1, and each sub-cluster is weighted twice as much as the cluster to its left. Meaning:

cluster 1 has a weight of 1
cluster 2 has a weight of 2
cluster 3 has a weight of 4
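
In other words, the test setup boils down to something like the following. This is just an illustrative array for this post, not the actual PHPDFS configuration format:

    <?php
    // Illustrative test configuration: three sub-clusters of five disks each,
    // with each sub-cluster weighted twice as heavily as the one before it.
    $testConfig = array(
        'objects'     => 10000,
        'replicas'    => 3,
        'subClusters' => array(
            array('disks' => 5, 'weight' => 1),  // disks 1-5
            array('disks' => 5, 'weight' => 2),  // disks 6-10
            array('disks' => 5, 'weight' => 4),  // disks 11-15
        ),
    );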

===============================

Object Distribution:

We can see here that PHPDFS is very close to optimal distribution. All of the RUSH variants exhibit this same behavior.
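
As a rough sanity check on the "optimal" line, the ideal per-disk counts follow directly from the weights, assuming the 30,000 placements (10,000 objects x 3 replicas) are spread in proportion to sub-cluster weight:

    <?php
    // Rough check of the "optimal" distribution: spread 30,000 placements
    // across the sub-clusters in proportion to their weights (1, 2, 4).
    $totalPlacements = 10000 * 3;
    $weights         = array(1, 2, 4);
    $disksPerCluster = 5;
    $totalWeight     = array_sum($weights); // 7

    foreach ($weights as $i => $w) {
        $clusterShare = $totalPlacements * $w / $totalWeight; // ~4286, ~8571, ~17143
        $perDisk      = $clusterShare / $disksPerCluster;     // ~857, ~1714, ~3429
        printf("sub-cluster %d: ~%d placements, ~%d per disk\n",
               $i + 1, round($clusterShare), round($perDisk));
    }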

=================================

Failure Resilience:

This graph shows the replica distribution of objects that were located on disk 7, which we treated as failed.

What we can see here is that the more heavily weighted servers take most of the request load from a failed disk, which is what we want assuming that the more heavily weighted servers are the more powerful ones.

Immediately below are two more graphs that show the replica distribution for disks 1 and 15. You will see from the graphs that the replica distribution tends to favor the more heavily weighted disks, which again is what we want assuming the more heavily weighted disks are the more powerful servers.
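
Conceptually these replica-distribution graphs are just a tally of where the surviving replicas of the failed disk's objects live. A sketch of that tally is below; getTargetNodes() is a stand-in for whatever placement lookup PHPDFS exposes, not a real API name:

    <?php
    // Sketch: for every object that had a replica on the failed disk, count
    // where its surviving replicas live -- those disks absorb the failed
    // disk's read load. getTargetNodes() is a stand-in for the real lookup.
    $failedDisk = 7;
    $loadByDisk = array();

    for ($objectId = 0; $objectId < 10000; $objectId++) {
        $replicaDisks = getTargetNodes($objectId); // e.g. array(3, 7, 12)
        if (!in_array($failedDisk, $replicaDisks)) {
            continue; // this object had no replica on the failed disk
        }
        foreach ($replicaDisks as $disk) {
            if ($disk != $failedDisk) {
                $loadByDisk[$disk] = (isset($loadByDisk[$disk]) ? $loadByDisk[$disk] : 0) + 1;
            }
        }
    }
    ksort($loadByDisk);
    print_r($loadByDisk); // the heavily weighted disks should dominate the counts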


disk 15 replica distribution

disk 1 replica distribution

=================================

Reorganization:

For the reorganization tests I started with the configuration described above and then:
  • added a sub-cluster
  • removed the last disk
  • removed the first sub-cluster
What we are concerned with here is the number of objects that are moved during a reorganization. Reorganizing typically means we are adding new machines or de-allocating old resources. We compare our results to what we consider optimal, meaning that we move the smallest number of objects needed to accommodate the new configuration. When we add servers, we expect the optimal number of moved objects to be an amount taken from each server in proportion to its sub-cluster weight. When we de-allocate resources, we expect the optimal number to be the number of objects held by the removed resources. We see in the results below that PHPDFS does an optimal job when adding new sub-clusters, but performs sub-optimally when de-allocating resources. This is expected according to the authors of the core RUSH algorithm.
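
To make "optimal" concrete, here is a rough calculator for the minimum number of placements that have to move, assuming placements stay proportional to weight. For our setup it gives about 4286 placements for dropping the whole first sub-cluster and about 3429 for dropping one disk from the last sub-cluster, which roughly matches the ~4265 and ~3480 figures quoted below:

    <?php
    // Sketch: minimum (optimal) number of placements that must move when the
    // configuration changes, assuming placements are proportional to weight.
    $totalPlacements = 10000 * 3; // 30,000

    // Adding a sub-cluster: its weight share of the total must migrate onto it.
    function optimalMovesForAdd($totalPlacements, array $oldWeights, $newWeight) {
        return $totalPlacements * $newWeight / (array_sum($oldWeights) + $newWeight);
    }

    // Removing capacity: only the placements it currently holds must move.
    function optimalMovesForRemove($totalPlacements, array $weights, $removedWeight) {
        return $totalPlacements * $removedWeight / array_sum($weights);
    }

    // Hypothetical example: adding a sub-cluster of weight 8 to (1, 2, 4)
    // would optimally move 8/15 of 30,000 = 16,000 placements onto it.
    echo round(optimalMovesForAdd($totalPlacements, array(1, 2, 4), 8)), "\n";

    // Removing the whole first sub-cluster (weight 1 out of 7): ~4286
    echo round(optimalMovesForRemove($totalPlacements, array(1, 2, 4), 1)), "\n";

    // Removing one disk from the last sub-cluster (weight 4/5 out of 7): ~3429
    echo round(optimalMovesForRemove($totalPlacements, array(1, 2, 4), 4 / 5)), "\n";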

The graph to the left indicates what happens when we add a new sub-cluster of five disks. We see that an optimal number of objects is moved to accommodate the new resources. This is good, as adding new servers will probably happen more frequently than de-allocating old ones.

This next graph shows what happens to each server when the first sub-cluster is removed. This is something that might happen when a whole cluster of machines has been determined to be too slow to serve any further purpose, or when those machines could be put to better use elsewhere.

In our example the first sub-cluster contains approximately 4265 objects, so we expect the optimal number of objects moved to be about 4265. However, we can see from the graph that MANY MANY more than that were moved. In fact, the total actually moved was 27943! Ugh. That means 27943 / 30000, or roughly 93%, of the placements were moved. That is almost a complete reorganization! As I stated above, we are going to use a different RUSH variant that will not cause such huge movement when we de-allocate resources. RUSHr does allow for sub-cluster removal and has near-optimal reorganization characteristics.

This next graph shows what happens when a disk is removed from the last sub-cluster.

Again we see sub-optimal behavior, with far more objects moved than necessary.

The total number of objects we expected to move was around 3480, but we actually moved 16567, which means we moved more than 50% of the placements to accommodate removing a single disk. This confirms the sub-optimal de-allocation behavior we saw above and reinforces the idea that we should use a different variant of the RUSH algorithm.

=================================

Lookup Performance:

For the look-up performance tests, I created 1 million objects, started with a single sub-cluster of five disks, ran look-ups for all 1 million objects, and took the average look-up time across all look-ups. This was repeated up to 100 sub-clusters of five disks each, where each added sub-cluster's weight was increased by a factor of 1.1 over the previous one, and the average look-up time was recorded at each step. I then repeated the experiment with even weighting so we could see how weighting affects look-up performance.
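
The shape of that benchmark is roughly the loop below; locateObject() and buildConfig() are stand-ins for the real PHPDFS lookup and configuration code, not actual API names:

    <?php
    // Sketch of the benchmark loop: average look-up time over 1,000,000
    // objects, repeated as sub-clusters are added with a 1.1x weight increase.
    $numObjects      = 1000000;
    $disksPerCluster = 5;
    $weightFactor    = 1.1;
    $avgLookupTimes  = array();

    for ($clusters = 1; $clusters <= 100; $clusters++) {
        $config = buildConfig($clusters, $disksPerCluster, $weightFactor);

        $start = microtime(true);
        for ($objectId = 0; $objectId < $numObjects; $objectId++) {
            locateObject($config, $objectId);
        }
        // average seconds per look-up for this number of sub-clusters
        $avgLookupTimes[$clusters] = (microtime(true) - $start) / $numObjects;
    }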

The graph to the left shows lookup performance and scaling characteristics. The biggest factor affecting scaling in PHPDFS is the number of sub-clusters that must be investigated when performing the lookup.

With evenly weighted disks we expect to see slightly super-linear scaling: roughly O(nr) time, where n is the number of sub-clusters and r is the time taken to generate the random numbers used to investigate a sub-cluster (a cost that itself grows with the sub-cluster id, hence the super-linear behavior).

But with weighting we expect sub-linear scaling, because more objects will be located on the more heavily weighted disks, and the more heavily weighted disks will be "rightmost" in the list of sub-clusters, meaning they are investigated first, so we will probably find the object we are looking for within the first few sub-clusters examined.

We can see from the graph that our expectations are met. When the servers are all weighted evenly we get slightly super-linear scaling. This is because of the number of times we have to call the random number generator while investigating clusters. We could bring an evenly weighted configuration much closer to linear scaling by using a random number generator that provides some sort of jump-ahead functionality, so we would not have to call rand() a number of times equal to the id of the sub-cluster we are investigating.
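
To see why the even-weight case creeps past linear, here is a very stripped-down illustration of the scan (not the actual PHPDFS or RUSHp code): sub-clusters are examined from the most recently added one down, and reaching sub-cluster j's draw means stepping the object's seeded random sequence j times because there is no jump-ahead.

    <?php
    // Stripped-down illustration of the scan cost: with no jump-ahead,
    // getting sub-cluster $j's draw for an object means calling the PRNG
    // $j times, so the cost grows with the number of sub-clusters examined.
    function probabilityDraw($objectId, $j) {
        mt_srand($objectId);          // seed on the object so draws are repeatable
        $draw = 0.0;
        for ($i = 0; $i < $j; $i++) { // step forward to sub-cluster $j's position
            $draw = mt_rand() / mt_getrandmax();
        }
        return $draw;
    }

    function findSubCluster($objectId, array $clusterWeights) {
        // Scan from the newest (rightmost, typically most heavily weighted)
        // sub-cluster downward; stop as soon as a draw lands in its weight share.
        for ($j = count($clusterWeights) - 1; $j >= 1; $j--) {
            $share = $clusterWeights[$j] / array_sum(array_slice($clusterWeights, 0, $j + 1));
            if (probabilityDraw($objectId, $j) < $share) {
                return $j;
            }
        }
        return 0; // fell all the way through to the first sub-cluster
    }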

In all likelihood, deployments of PHPDFS will have sub-clusters with uneven weights, where the most recently added clusters are more heavily weighted than the older ones, so we should get sub-linear scaling in real deployments.

So that's it for now. As you can see, the current implementation of PHPDFS meets most of the stated claims. The problem area is reorganization, and that will be addressed in upcoming releases. You can download the latest version of the code here:

http://code.google.com/p/phpdfs/downloads/list

My next blog post will be about what algorithm changes we want to make and why, based on an analysis of the needs of a cloud-like system that stores many small images for a web application.

peace,

-Shane
