Tuesday, June 30, 2009

Hello World

This post introduces a project I started about a week ago. I have called it phpDFS, short for PHP distributed file system (DFS). The aim of the project is to give the PHP community a DFS that is similar to MogileFS. Perl has MogileFS (which rocks) and now PHP has phpDFS. Obviously, Danga was much more clever than I was in choosing names for their systems. Oh well.

The source code is here:

http://code.google.com/p/phpdfs/

Version 0.00 is imminent.

The biggest reason I started this project is that I love and am fascinated by distributed, scalable systems, and this will be one of many expressions of that. :) Also, of course, I will use it in future projects.

phpDFS is mostly based on the algorithm described in this paper (PDF):

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=1B5D780A8525B36C150B0D028DC73F4F?doi=10.1.1.12.6274&rep=rep1&type=url&i=0

The algorithm comes from a family of algorithms known as the RUSH family: Replication Under Scalable Hashing. If built correctly, a system based on the RUSH algorithms will have the following characteristics (some of the text below is taken from the algorithm whitepaper):
  • Ability to map replicated objects to a scalable collection of storage servers or disks without the use of a central directory.

  • Redistributes as few objects as possible when new servers are added or existing servers are removed.

  • Guarantees that no two replicas of a particular object are ever placed on the same server.

  • Because there is no central directory, clients can compute data locations in parallel, allowing thousands of clients to access objects on thousands of servers simultaneously.

  • Facilitates the distribution of multiple replicas of objects among thousands of disks. Allows individual clients to compute the location of all of the replicas of a particular object in the system algorithmically using just a list of storage servers rather than relying on a directory.

  • Easy scaling management. Scaling out is just a matter of deploying new servers and then propagating a new configuration to all of the nodes. The data will automatically and optimally be moved to accommodate the new resources.

    De-allocating resources is basically the same process in reverse. Simply deploy the new configuration and the data will be moved off the old resources automatically. After the data has been moved, simply take the old resources offline.

  • Easier server management. Since there is no central directory, there are no masters or slaves to configure. No masters or slaves means that all resources are utilized and no servers sit unused as "hot" spares or backups.

  • No single point of failure. As long as the replica-to-node ratio is correct, your data will be safe, redundant, and durable; able to withstand major server outages with no loss.

That's pretty cool. I hope that phpDFS will capture all of the above for the PHP and Web communities in a package that is very easy to use and extend.
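
To make the "no central directory" idea a bit more concrete, here is a minimal sketch of how a client could compute replica locations from nothing but a list of servers. To be clear about the assumptions: this is not RUSH and it is not phpDFS code. It uses rendezvous (highest-random-weight) hashing, a simpler technique that shares a few of the properties listed above, and it is written in Python just to keep it short. The function name locate_replicas, the server names, and the object keys are all made up for illustration.

```python
import hashlib


def _score(object_key, server):
    """Repeatable pseudo-random weight for one (server, object) pair."""
    digest = hashlib.sha256(f"{server}|{object_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def locate_replicas(object_key, servers, replica_count):
    """Pick replica_count distinct servers for object_key.

    Illustrative stand-in (rendezvous hashing), NOT the RUSH algorithm:
    every client ranks the servers the same way, so no directory lookup
    is needed, and no two replicas can land on the same server.
    """
    unique_servers = set(servers)
    if replica_count > len(unique_servers):
        raise ValueError("not enough distinct servers for the requested replicas")
    # The server name is a tie-breaker, so the order never depends on
    # how the caller happened to list the servers.
    ranked = sorted(
        unique_servers,
        key=lambda s: (_score(object_key, s), s),
        reverse=True,
    )
    return ranked[:replica_count]


if __name__ == "__main__":
    # Hypothetical server names and object keys.
    servers = ["store1", "store2", "store3", "store4"]
    print(locate_replicas("photos/cat.jpg", servers, 2))

    # Add one server and count how many objects have to move a replica.
    grown = servers + ["store5"]
    keys = [f"object-{i}" for i in range(1000)]
    moved = sum(
        1
        for k in keys
        if set(locate_replicas(k, servers, 2)) != set(locate_replicas(k, grown, 2))
    )
    print(f"{moved} of {len(keys)} objects gained a replica on the new server")
```

With this scheme, an object only moves when the new server ranks among its top choices, so everything else stays put. The paper linked above describes how the real RUSH algorithms handle placement and growth.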

OK, I am going back to work on phpDFS. A release is coming soon.

peace,

-Shane