dbscan / README.md
README.md
Raw

DBScan

Read data points from CSV file and then used DBScan algorithm to form clusters based on different epsilon values and then visualize the data using Open3D library in Python.

Re-wrote the code by using K-D Search Tree and performed three Experiments on the both Linear and K-D Search tree implementation.

Experiment 1

To validate the results from K-D Search tree with Linear search.
• I obtained similar results from both the linear search and kdtree search. I made a checkNeighborsPresent method which checks for each point in linear search neighbors list with the kdtree search list.
• In Exp1 if you use lin or kd in arg[0] it will print the respective lists but if you use some other input like the one below than Exp1 will print both lin and kd search results and also print if they both have same points or not.

Experiment 2

To compare speeds of both search models in search neigbors for a single point.

Exp2.java Runs:
C:\College\SEM3\CSI2110\P2>javac Exp2.java
C:\College\SEM3\CSI2110\P2>java Exp2 kd 0.5 Point_Cloud_1.csv 10
KDTree search takes 124ms
C:\College\SEM3\CSI2110\P2>java Exp2 lin 0.5 Point_Cloud_1.csv 10
Linear search takes 519ms
C:\College\SEM3\CSI2110\P2>java Exp2 0 0.5 Point_Cloud_1.csv 10 Linear search takes 526ms
KDTree search takes 123ms
C:\College\SEM3\CSI2110\P2>java Exp2 0 0.5 Point_Cloud_2.csv 10
Linear search takes 1760ms
KDTree search takes 325ms
C:\College\SEM3\CSI2110\P2>java Exp2 0 0.5 Point_Cloud_2.csv 10
Linear search takes 1897ms
KDTree search takes 314ms

• I expected that the kd search would 3 times faster than linear but as can been seen from the above runs kdtree search is 5 times faster than the linear search.
• Also, we can see from the Point Cloud 2 file runs which has approx 20,000 more points that how much efficient kd search is. We can see it consistently performs 6 times better than linear search in an even larger file.
• Again Exp2.java has been designed in a way that if you put ‘lin’ or ‘kd’ in arg[0] it will give you the respective result. If you want to see both results please use ‘0’ or some other argument like below.

Experiment 3

To compare the time it takes to form clusters based on different epsilon values.

• For the first file linear is 12 times faster than kd tree. For the second point cloud file linear search is almost 14 times and for the third file almost 15 times faster than kd tree search.
• These results show us that kd tree is good when it comes to some searches but when we apply recurring searches with tree construction within the file it becomes really slow. I expected that kd tree would be 5-6 times slower than linear but given the large amounts of points it is really slow when finding clusters.
• This shows us that we should be careful when choosing the data structure and always consider the real function that it is required to perform.