Stores large files in a distributed manner.
The simplest way to use it (assuming you have Erlang installed on your machine) is the console.sh file:
chmod +x console.sh
./console.sh local
This starts the Erlang console and the application. In this mode you have console access to the application through the Erlang shell.
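Once the shell is up, a quick way to confirm that everything is running is the standard OTP call below (nothing project-specific is assumed here):

```erlang
%% In the Erlang shell started by console.sh, list the running applications;
%% the storage application and its dependencies (e.g. cowboy) should appear.
1> application:which_applications().
```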
If you want to use this application in a distributed way, run:
chmod +x dist_foreground.sh
./dist_foreground.sh
Here you don't have to specify a profile (like local for console.sh). The profile names, which are the release names, are generated by the bash script in a for loop. The script starts 3 nodes.
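To check that the nodes actually see each other, you can attach a remote shell to one of them and list its peers. The probe node name and the -setcookie value below are assumptions and depend on how the release is configured; node_2@127.0.0.1 is the node name format that appears in the troubleshooting section below.

```erlang
%% From a separate terminal (cookie value is an assumption):
%%   erl -name probe@127.0.0.1 -setcookie <release_cookie> -remsh node_2@127.0.0.1
%% In the remote shell, list the peers this node is connected to:
nodes().
```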
There are Common Test suites for the application. To run them:
chmod +x test.sh
./test.sh test
chmod +x dist_test.sh
./dist_test.sh test
To check the service, just send some file via curl:
curl -F test=123 -F 'file=@googlechrome.dmg' http://localhost:5551
To read the file back, use your favorite browser and go to http://localhost:5551/?action=read&name=googlechrome.dmg
To delete the file, go to http://localhost:5551/?action=delete&name=googlechrome.dmg
The application starts with a cowboy listener, and the application supervisor starts the chunk_controller module. When cowboy gets an incoming request, it starts an API handler. The API handler uses chunk_controller's exported functions to initialize chunk_handler processes. The controller uses an asynchronous cast to dispatch each piece of the file to the handler that corresponds to its number. The handlers write the data to the file system.
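Below is a minimal sketch of that write path, not the project's actual code: the module name chunk_handler_sketch, the write_chunk/2 function and the message shape are assumptions. It only illustrates how the controller can hand a chunk to a handler process with an asynchronous gen_server:cast and have the handler append it to a file.

```erlang
-module(chunk_handler_sketch).
-behaviour(gen_server).

-export([start_link/1, write_chunk/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% One handler process per chunk; Path is where this handler writes its piece.
start_link(Path) ->
    gen_server:start_link(?MODULE, Path, []).

%% Asynchronous: the controller does not wait for the disk write to finish.
write_chunk(Handler, Data) ->
    gen_server:cast(Handler, {write_chunk, Data}).

init(Path) ->
    {ok, Fd} = file:open(Path, [write, raw, binary]),
    {ok, Fd}.

handle_cast({write_chunk, Data}, Fd) ->
    ok = file:write(Fd, Data),
    {noreply, Fd};
handle_cast(_Msg, Fd) ->
    {noreply, Fd}.

handle_call(_Request, _From, Fd) ->
    {reply, ok, Fd}.
```

In the real application the controller keeps the mapping from chunk number to handler and picks the target process accordingly.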
So, the main idea is to avoid overflowing RAM in the case of a large file. Every handler process writes its piece of the file asynchronously, and since the controller receives the file in chunks, this problem is unlikely to occur.
However, reading is done in a synchronous way. There is some room to improve the reading process by partial data preloading, but that is a topic for another discussion.
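To illustrate why RAM usage stays bounded on the write side, here is a rough sketch of streaming the request body chunk by chunk. It assumes Cowboy 2.x (cowboy_req:read_body/1) and a hypothetical chunk_controller:put_chunk/2 call; the project's real handler and controller API may look different.

```erlang
%% Hypothetical loop inside an API handler: read the body piece by piece
%% and hand each piece off as soon as it arrives, never holding the whole
%% file in memory.
read_chunks(Req0, ChunkNumber) ->
    case cowboy_req:read_body(Req0) of
        {more, Data, Req} ->
            chunk_controller:put_chunk(ChunkNumber, Data),  %% hypothetical API
            read_chunks(Req, ChunkNumber + 1);
        {ok, Data, Req} ->
            %% Last piece of the body.
            chunk_controller:put_chunk(ChunkNumber, Data),
            {ok, Req}
    end.
```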
There are some details to improve. First of all, I'd definitely improve the way the chunk metadata is stored. In this version all the chunk metadata is kept in the chunk_controller state, which basically means in RAM. I guess that's okay for test purposes, but in production it has to be kept in some persistent storage (e.g. Mnesia in disc_copies mode).
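For reference, here is a minimal sketch of what a Mnesia-backed metadata store could look like; the module name, record fields and table options are assumptions, not the project's code.

```erlang
-module(chunk_meta_store).
-export([init/0]).

%% Hypothetical metadata record; the real shape of the chunk metadata may differ.
-record(chunk_meta, {file_name, chunk_number, node, path}).

init() ->
    %% create_schema/1 returns an error if the schema already exists; that's fine here.
    mnesia:create_schema([node()]),
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(chunk_meta,
        [{disc_copies, [node()]},
         {attributes, record_info(fields, chunk_meta)}]).
```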
Don't forget to stop the running nodes, otherwise you'll get an error like Protocol 'inet_tcp': the name node_2@127.0.0.1 seems to be in use by another Erlang node, or some other error about epmd.
If you face these errors, just kill the running Erlang nodes:
ps -ef | grep erl
You'll get something like
1692471576 17185 1 0 1:07 ?? 0:00.01 /usr/local/Cellar/erlang/21.3.3/lib/erlang/erts-10.3.2/bin/epmd -daemon
1692471576 17282 17232 0 1:07 ?? 0:00.00 erl_child_setup 256
1692471576 17340 17290 0 1:07 ?? 0:00.00 erl_child_setup 256
1692471576 17398 17348 0 1:07 ?? 0:00.00 erl_child_setup 256
...
The second value in each row is the process id. Use
kill -9 17185
to kill each of the processes you found (repeat for every pid in the list).