Efficient File Sharing P2P Network

In this project, you will implement an “Efficient File Sharing P2P Network” with some advanced features.

Create a P2P network topology in which nodes contain files to share with other nodes. You can use the topology created as part of Homework 1.

For topology construction, use the solution that you implemented in HW1: an ORIGIN node reads the topology from a given file, and the MEMBER nodes contact the ORIGIN node to learn their local topology and make the necessary connections.

For that network topology, implement the following features:

• Efficient searching:

When the user wants a file, say “readme.doc”, the user's node sends a query to nodes in the network. The query can be processed using one of the following two techniques.

1) TTL scope based searching:

In this kind of searching, the requester node sends the query to all the neighbor nodes to which it is connected. The flooding stops once the query has been forwarded the specified maximum number of hops. For example, if MAXHOP = 2, the node sends the query to its neighbor nodes, those neighbors send the query to their own neighbors, and the flooding stops there.

2) Probabilistic forwarding based searching:

In this kind of searching, each node floods the query through the network by forwarding it to its neighbor nodes with a given probability. For example, if PROBFORW = 70%, each node sends the query to each of its neighbors with a forwarding probability of 70%, so each neighbor has a 70% chance of receiving the query from the forwarding node.
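The forwarding decision for both techniques could be sketched roughly as follows. This is only an illustration, not a required design: the function name `next_hops` and its parameters are hypothetical, and skipping the neighbor that delivered the query is an assumption the assignment does not state.

```python
import random


def next_hops(neighbors, sender, max_hop=None, prob_forw=None, rng=random):
    """Decide where one received query is forwarded.

    Returns (targets, hops_left). Exactly one of max_hop / prob_forw
    must be given. `sender` is the neighbor the query came from
    (None at the requester).
    """
    if (max_hop is None) == (prob_forw is None):
        raise ValueError("specify exactly one of max_hop or prob_forw")

    # Assumption: a query is never sent straight back to its sender.
    candidates = [n for n in neighbors if n != sender]

    if max_hop is not None:
        # TTL scope: max_hop is the number of hops the query may still travel.
        if max_hop <= 0:
            return [], 0
        return candidates, max_hop - 1

    # Probabilistic: each neighbor independently gets the query
    # with probability prob_forw percent.
    chosen = [n for n in candidates if rng.random() * 100 < prob_forw]
    return chosen, None
```

With MAXHOP = 2, the requester forwards with one hop left, its neighbors forward with zero hops left, and the nodes after that forward nothing.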

• Fast file acquiring:

If the file requested in the query is available at multiple nodes in the network, each of those nodes sends one part of the file to the requester node. (For example, if 4 nodes satisfy the query, the requested file is divided into 4 parts and each of the 4 nodes sends its respective part to the requester node.)
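One possible way to divide a file among the responding nodes is by byte range. The sketch below assumes the requester knows the file size and that any leftover bytes go to the earliest parts; the name `part_ranges` is hypothetical and the assignment does not prescribe how the parts are computed.

```python
def part_ranges(file_size, num_parts):
    """Split [0, file_size) into num_parts contiguous (offset, length) pairs."""
    base, extra = divmod(file_size, num_parts)
    ranges = []
    offset = 0
    for i in range(num_parts):
        # Assumption: the first `extra` parts each carry one leftover byte.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges
```

For a 10-byte file and 4 holders this yields the ranges (0, 3), (3, 3), (6, 2) and (8, 2), one per holder.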

Notes:

1) The project should demonstrate the use of multithreading and socket programming.

2) For both searching techniques, there should be a provision to stop query flooding.

3) The requester node should wait for query responses for a fixed period of time called “SEARCHDURATION” and use only the results that arrive within that period. For example, if SEARCHDURATION = 10 seconds, the requester node waits 10 seconds to learn which nodes in the network return a “success” response. The requester node ignores any responses to that query that arrive after SEARCHDURATION has elapsed.

4) If the query reaches an intermediate node that contains the requested file, the node performs two tasks: (a) it sends a response back to the requester (the response message uses the same path as the query), and (b) it forwards the query to its neighbors depending on MAXHOP/PROBFORW. If the intermediate node does not contain the requested file, it performs only the second task.

5) A query can use only one searching technique at a time; that is, a query should not specify both MAXHOP and PROBFORW.

6) The requester node will create a direct TCP connection to each node that contains the file in order to get it.
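The SEARCHDURATION wait in note 3 could be sketched as below. The sketch assumes that the threads reading from the sockets place each decoded response on a shared `queue.Queue`; the function name `collect_responses` and that hand-off design are assumptions, not part of the assignment.

```python
import queue
import time


def collect_responses(inbox, search_duration):
    """Gather responses from `inbox` until search_duration seconds pass.

    Anything that arrives after the deadline stays in the queue and is
    simply never read for this query.
    """
    deadline = time.monotonic() + search_duration
    holders = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Block only for the time that is left in the search window.
            holders.append(inbox.get(timeout=remaining))
        except queue.Empty:
            break
    return holders
```

An empty result after the window closes corresponds to the "file not available from any peer" outcome.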

Explanation:

For the example network topology given above,

There are 15 nodes available in this network. They are numbered from 1 to 15. You can also use A - Z or a - z to specify node names.

The network topology file lists the links between nodes as follows:

1 2
1 3
1 4
2 5
...
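Reading such a file into a neighbor table might look like the sketch below. It assumes that each line holds exactly two whitespace-separated node names and that links are bidirectional; `parse_topology` is a hypothetical helper, and your HW1 solution may already do this differently.

```python
from collections import defaultdict


def parse_topology(lines):
    """Build {node: set of neighbor nodes} from 'A B' link lines."""
    neighbors = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        a, b = fields
        # Assumption: every link can be used in both directions.
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors
```

Node names are kept as strings so that letters (A - Z, a - z) work as well as numbers.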

To test the project's functionality on the netXX.utdallas.edu machines, you can assume that each node has one folder containing the files to be shared in the network. For example, if your netid is abcXXXXXX, create folders 1 - 15, each with some files, in your abcXXXXXX/ACNProject directory. These folders represent the shared folders of the nodes in the given network. Use the files in these folders to check the functionality implemented as part of this project.

If node 15 wants the file "readme.doc", it will send the query as

NODE15 > query readme.doc MAXHOPS=5 (i.e., search for file "readme.doc" with maximum hops = 5)

or

NODE15 > query readme.doc FORWPROB=70 (i.e., search for file "readme.doc" with forwarding probability = 70%)
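Parsing the text typed after the prompt could be sketched as follows. The helper `parse_query` is hypothetical; it accepts exactly one option, so a command that names both MAXHOPS and FORWPROB is rejected. Limiting FORWPROB to 0 - 100 is an assumption.

```python
USAGE = "usage: query <file> MAXHOPS=<n> | query <file> FORWPROB=<percent>"


def parse_query(command):
    """Return (filename, option, value) for a query command line."""
    parts = command.split()
    # Exactly three tokens: 'query', the file name, and one option.
    if len(parts) != 3 or parts[0] != "query":
        raise ValueError(USAGE)
    filename = parts[1]
    option, sep, value = parts[2].partition("=")
    if not sep or option not in ("MAXHOPS", "FORWPROB") or not value.isdigit():
        raise ValueError(USAGE)
    number = int(value)
    if option == "FORWPROB" and number > 100:
        raise ValueError("FORWPROB must be between 0 and 100")
    return filename, option, number
```

For the two commands above this returns `("readme.doc", "MAXHOPS", 5)` and `("readme.doc", "FORWPROB", 70)`.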

Here, Node 15 is connected only to Node 9, so the query reaches Node 9 depending on MAXHOPS/FORWPROB. If node 9 gets the query and has the requested file, it sends the response "Response: Node 9, readme.doc" back to node 15.

Also, regardless of whether Node 9 contains the file "readme.doc" or not, it will forward the query to its neighbor nodes 5 and 10 based on MAXHOPS/FORWPROB. Nodes 5 and 10 behave like Node 9 and forward the query in turn: Node 5 forwards it to Nodes 2, 3 and 8, and Node 10 forwards it to Nodes 11 and 13. This process continues until the hop count reaches 0 in the case of TTL scope based searching, or until query flooding completes in the case of probabilistic forwarding based searching.

Each node remembers which node it got the query from so that it can pass a response back toward the requester node. The response message follows the same path as the query message, in reverse. For example, if Node 15 requests the file "readme.doc" and node 12 contains that file and got the query message along the path 15-9-5-8-12, then the response message "Response: Node 12, readme.doc" should follow the path 12-8-5-9-15.
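Reverse-path routing of this kind can be sketched with a small per-node table. The names `ReversePathTable` and `trace_response` are hypothetical, and so is the idea of a query identifier (for example, the requester name plus a sequence number); the assignment does not define a message format. Keeping only the first sender of each query is also an assumption.

```python
class ReversePathTable:
    """Per-node memory of which neighbor delivered each query."""

    def __init__(self):
        self._prev_hop = {}

    def record(self, query_id, from_node):
        """Store the sender of the first copy; return False for repeats."""
        if query_id in self._prev_hop:
            return False
        self._prev_hop[query_id] = from_node
        return True

    def next_hop_for_response(self, query_id):
        """Neighbor to hand a response to, or None if the query is unknown."""
        return self._prev_hop.get(query_id)


def trace_response(tables, holder, query_id):
    """Follow the stored hops back from `holder`; returns the node path."""
    path = [holder]
    node = holder
    while node in tables:
        prev = tables[node].next_hop_for_response(query_id)
        if prev is None:
            break
        path.append(prev)
        node = prev
    return path


# Replay the example: the query travelled 15-9-5-8-12.
tables = {n: ReversePathTable() for n in (9, 5, 8, 12)}
for sender, receiver in [(15, 9), (9, 5), (5, 8), (8, 12)]:
    tables[receiver].record("q1", sender)
response_path = trace_response(tables, 12, "q1")
```

Here `response_path` retraces the query path in reverse, from node 12 back to node 15.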

After sending a query, the node waits for a specific amount of time for responses. If the node does not get any response within the SEARCHDURATION period, the file is assumed to be unavailable from any peer. Once node 15 gets the responses, it should create TCP connections to the responding nodes to get the file parts from them.

If only node 12 has the requested file “readme.doc” then the command should be

NODE15 > get readme.doc Node12

We assume that the content of readme.doc file is the same across different nodes.

When this command is executed, node 15 creates a TCP socket connection to node 12 and, once the connection succeeds, node 12 sends the readme.doc file to node 15.

In the case of “Fast file acquiring”, if multiple nodes, say 10, 12 and 14, respond to node 15 within the specified SEARCHDURATION time, the command should be

NODE15 > get readme.doc Node10, Node12, Node14

When node 15 executes this command, it creates TCP connections to nodes 10, 12 and 14. Node 10 then sends the initial 1/3 of the "readme.doc" file, node 12 sends the middle 1/3, and node 14 sends the final 1/3. When this process completes, node 15 gets 3 “Part/File sent” messages, one each from nodes 10, 12 and 14, and replies to each of them with a “Part/File received” message. After getting the 3 parts, node 15 merges them to create a single “readme.doc” file.
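The final merge step could be sketched as below, assuming each received part is kept in memory and keyed by its starting byte offset. The helper `merge_parts` is hypothetical; a real implementation might write each part directly to its position in the output file instead.

```python
def merge_parts(parts):
    """Reassemble a file from {offset: bytes}; parts may arrive in any order."""
    data = bytearray()
    for offset in sorted(parts):
        # Each part must start exactly where the previous one ended.
        if offset != len(data):
            raise ValueError(f"gap or overlap at byte {offset}")
        data += parts[offset]
    return bytes(data)
```

The contiguity check catches a missing or duplicated part before the merged file is written out.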

Interesting case:

As shown in the above figure, suppose node 3 gets query 1 from node 1 with MAXHOP = 2 and forwards it to its neighbor node 4. Some time later, node 3 gets the same query 1 from node 2 with MAXHOP = 3 and again forwards it to node 4. The first copy can travel as far as node 5, and the second copy as far as node 6. If node 5 or node 6 contains the file, it sends a response toward the requester node.

When that response reaches node 3, the interesting case arises: because node 3 received the same query from both node 1 and node 2, the response could travel back through either of them. Your design should handle this case so that node 3 does not send multiple responses, one via node 1 and another via node 2. To handle such cases, assume that a node holding the content sends its response only once, in reply to the first query message that it receives. This prevents node 3 in the above example from receiving multiple responses from the same node.
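The "respond only once" rule could be sketched on the file-holding node as follows. The class `FileHolder` and the query identifier are hypothetical; the sketch covers only the decision to respond, not the forwarding of the query, which still follows MAXHOP/PROBFORW.

```python
class FileHolder:
    """Answers each query at most once, for the first copy it receives."""

    def __init__(self, node_id, shared_files):
        self.node_id = node_id
        self.shared_files = set(shared_files)
        self._answered = set()  # query ids this node has already answered

    def on_query(self, query_id, filename, from_node):
        """Return (response_text, reply_to) or None if no response is due."""
        if filename not in self.shared_files:
            return None
        if query_id in self._answered:
            return None  # a later copy of the same query: stay silent
        self._answered.add(query_id)
        response = f"Response: Node {self.node_id}, {filename}"
        # The response goes back to the neighbor that delivered the first copy.
        return response, from_node
```

In the example, node 5 would answer the copy of query 1 that reaches it first and ignore any later copy, so node 3 sees a single response from node 5.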