Hi Folks,
The name for PHPDFS has been changed to WEBDFS this also means the blog is going to change names.
Quite obviously to:
http://webdfs.blogspot.com
The reason for the change is that some friends of mine have been griping at me about calling it phpdfs when the project has nothing to do with php other than being written in php. even though perlbal gets a pass with them. go figure :P.
Another reason for the change is that I can see this project being implemented in other languages in the future so PHPDFS was going to get stale at some point.
peace,
-Shane
Sunday, August 30, 2009
Friday, August 28, 2009
PHPDFS on 20 Amazon ec2 instances
Hello Again,
Download PHPDFS here: http://code.google.com/p/phpdfs/downloads/list
This blog post is about a real world test of PHPDFS tested on 20 Amazon ec2 instances.
I setup 20 m1.large amazon instances and uploaded ~250GB of data and downloaded ~1.5 terabytes worth of data from the files that were uploaded. Total transfer was 1.8 TB
I have included all of the pertinent information at the end of this blog post. Here are some highlights:
http://civismtp.uas.coop/phpdfs/c5n3-2.tar.gz
If anyone does download and check the data and has questions please do feel free to contact me with any question you might have. :)
The final results tell me that PHPDFS performed quite well. However the data loss is undesirable and needs to be addressed. How do we prevent that? One strategy would be to make absolutely sure that we hold on to the client until we know undoubtedly that the first replica has been successfully written to disk. This can be coupled with some sort of a checksum mechanism to be sure that the file upload is not corrupted. The pont is we do not want to tell the client that an upload was ok when it was not and we do not want to replicate a corrupted file. This exactly what happened which led to 40 of the replicas becoming totally unrecoverable meaning that we did not have a good copy from which we could make other replicas.
Another solution might be to increase the replication factor. However, there is something to consider with phpdfs when increasing the replication factor (at least in its current state), it increases the write load on your over all system by a factor equal to the increase. So if your replication factor is three, then you will actually be performing 3 writes against your system not just one, so 100 uploads is then effectively equal to 300 uploads worth of load on the overall system. This creates the need for a more robust replication mechanism that will not linearly increase the overall load on the system when the replication count is increased, perhaps some sort of tcp pipelining will help with that. I still need to do some research on this before I make a final decision.
So how did the servers perform? There are 870 graphs of data of 300 mb of data that was collected by dstat http://dag.wieers.com/home-made/dstat/. You can find all of the graphs here:
http://civismtp.uas.coop/phpdfs/graphs
I have included a couple load avg graphs below to show what how a typical box performed and the graph from the one server that got really hot.
The graphs show the machines stayed mostly cool, load averages were in the 3-4 range most of the time on most machines and the CPUs were rarely maxed out. There was one exception, one machine went to 100 load avg and stayed there for most of the test until I rebooted it after which point the machine went to and stayed at a level similar to the rest of the boxes in the cluster. I am not sure why the one server got really hot. After analyzing the server logs, I do not see any particular request pattern that would make the one server go really bad. However, it is obvious that this server was the reason for the data loss and problems experienced during the test.
Load avg graph of typical server during the test:
Below is the load avg graph of the server that became very hot during the test. You can see that after about 1000 secs the box suddenly became very hot. None of the other boxes exhibited this behavior. Also, you can see the obvious point where I rebooted the box and how the server never went back to to a high load average. further analysis is needed to determine why this might have happened.
Below are the specifics of the test and the results:
Hardware and Software:
Test Parameters:
Test Results
Data Distribution:
The table below describes how many replicas were saved on each server. As we can see the data was well distributed amongst all nodes.
Download PHPDFS here: http://code.google.com/p/phpdfs/downloads/list
This blog post is about a real world test of PHPDFS tested on 20 Amazon ec2 instances.
I setup 20 m1.large amazon instances and uploaded ~250GB of data and downloaded ~1.5 terabytes worth of data from the files that were uploaded. Total transfer was 1.8 TB
I have included all of the pertinent information at the end of this blog post. Here are some highlights:
- 500 threads (5 java client, 100 threads each)
- 15 servers
- ~250 GB uploaded (PUT requests) (individual files between 50k and 10mb)
- ~1.5 Tb downloaded (GET requests)
- ~1.8 TB transfer total
- ~47mb / sec upload rate
- ~201 requests / sec overall
- The data was very evenly distributed across all nodes
- 40 replicas were lost and totally unrecoverable amounting to .030% data loss
http://civismtp.uas.coop/phpdfs/c5n3-2.tar.gz
If anyone does download and check the data and has questions please do feel free to contact me with any question you might have. :)
The final results tell me that PHPDFS performed quite well. However the data loss is undesirable and needs to be addressed. How do we prevent that? One strategy would be to make absolutely sure that we hold on to the client until we know undoubtedly that the first replica has been successfully written to disk. This can be coupled with some sort of a checksum mechanism to be sure that the file upload is not corrupted. The pont is we do not want to tell the client that an upload was ok when it was not and we do not want to replicate a corrupted file. This exactly what happened which led to 40 of the replicas becoming totally unrecoverable meaning that we did not have a good copy from which we could make other replicas.
Another solution might be to increase the replication factor. However, there is something to consider with phpdfs when increasing the replication factor (at least in its current state), it increases the write load on your over all system by a factor equal to the increase. So if your replication factor is three, then you will actually be performing 3 writes against your system not just one, so 100 uploads is then effectively equal to 300 uploads worth of load on the overall system. This creates the need for a more robust replication mechanism that will not linearly increase the overall load on the system when the replication count is increased, perhaps some sort of tcp pipelining will help with that. I still need to do some research on this before I make a final decision.
So how did the servers perform? There are 870 graphs of data of 300 mb of data that was collected by dstat http://dag.wieers.com/home-made/dstat/. You can find all of the graphs here:
http://civismtp.uas.coop/phpdfs/graphs
I have included a couple load avg graphs below to show what how a typical box performed and the graph from the one server that got really hot.
The graphs show the machines stayed mostly cool, load averages were in the 3-4 range most of the time on most machines and the CPUs were rarely maxed out. There was one exception, one machine went to 100 load avg and stayed there for most of the test until I rebooted it after which point the machine went to and stayed at a level similar to the rest of the boxes in the cluster. I am not sure why the one server got really hot. After analyzing the server logs, I do not see any particular request pattern that would make the one server go really bad. However, it is obvious that this server was the reason for the data loss and problems experienced during the test.
Load avg graph of typical server during the test:
Below is the load avg graph of the server that became very hot during the test. You can see that after about 1000 secs the box suddenly became very hot. None of the other boxes exhibited this behavior. Also, you can see the obvious point where I rebooted the box and how the server never went back to to a high load average. further analysis is needed to determine why this might have happened.
Below are the specifics of the test and the results:
Hardware and Software:
Description | |
Hardware | Amazon ec2 instance m1.large |
CPU | 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each) |
Operating System | Fedora Core 9 64 bit |
Memory | 7.5 GB |
File System | Ext3 |
Disk I/O | "High" |
Storage | 850 GB instance storage (2×420 GB plus 10 GB root partition) |
Test Parameters:
Total | |
total servers | 15 |
sub clusters | 5 |
servers per sub cluster | 3 |
total clients | 5 |
total threads per client | 100 |
total threads | 500 |
read:write ratio | 8:2 |
replication degree | 2 |
Total to be uploaded | 250 Gb |
Total to be written to disk | 500 Gb |
Total uploaded by each client | 50 Gb |
Thread Task Description | Each thread uploaded and downloaded files between 50k and 10mb in size. An attempt was made to simulate real world usage by having a 20% write load. So for every 8 GETs there would be on average 2 PUTs. Each thread slept for a random number of milliseconds between 0 and 1000 (0 and 1 second) between each request. |
The files sizes uploaded were: |
|
Test Results
Total | |
Total Requests | 1067159 |
Total Run time | ~5300 seconds |
Total Uploaded | 250000850000 bytes |
Total Upload Rate | ~47169971.7 bytes / second |
Avg Download per replica | 1933823.62 bytes |
Total Downloaded | 808603 |
Requests / Second | ~201 req./second |
Total Data Transferred | 1813697450000 bytes |
Overall Transfer Rate | 342207066 / second |
Total Original Replicas | 129278 |
Total Replicas written to disk | 258556 |
Total Written to disk | 499200710256 bytes |
Avg written to disk per replica | 1930725.69 bytes / replica |
Total Lost Replicas | 0 |
Total corrupted but recoverable replicas | 2071 |
Total Non-Recoverable replicas due to corruption(both copies were corrupted) | 40 |
Data Distribution:
The table below describes how many replicas were saved on each server. As we can see the data was well distributed amongst all nodes.
Server | No of Replicas | Server | No of Replicas |
1 | 17406 | 9 | 17623 |
2 | 17914 | 10 | 16679 |
3 | 17153 | 11 | 17606 |
4 | 17329 | 12 | 17256 |
5 | 17426 | 13 | 16703 |
6 | 16955 | 14 | 17190 |
7 | 17082 | 15 | 17145 |
8 | 17089 |
Subscribe to:
Posts (Atom)