Big Data Clustering

Developing effective, efficient, and highly scalable clustering algorithms for big datasets

During my PhD studies, I explored the following research questions:

  1. Is it possible to obtain an accurate clustering solution using only a small fraction of the available points in a big dataset?
  2. What would be a more precise and practically useful definition of big data?
  3. Is it necessary to employ more complex hybrid algorithms in the search for more accurate clustering solutions? Can we follow the "less is more" principle in designing big data clustering algorithms instead?
  4. Can a decomposition principle be used to achieve global optimization properties in the big data clustering problem?
  5. Is it possible to develop a simple, scalable, and parallelizable big data clustering algorithm that is more effective and efficient than the existing state-of-the-art hybrid approaches?
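
As a rough illustration of question 1, the sketch below clusters a dataset while only ever running K-means on small random samples. It is a generic, minimal example written for this page, not the algorithm from the publications listed under References. The function names (kmeans, sse, sample_based_clustering) and the parameters sample_size and n_rounds are assumptions introduced here for illustration.

```python
import numpy as np


def kmeans(points, centroids, n_iter=20):
    """Run plain Lloyd iterations on `points`, starting from `centroids`."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        # Squared distance from every point to every centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def sample_based_clustering(data, k, sample_size, n_rounds, seed=0):
    """Refine k centroids using only small random samples of `data`.

    Hypothetical sketch: each round draws a fresh sample, runs K-means on it
    starting from the best centroids found so far, and keeps the result only
    if its objective on that sample beats the best sample objective so far.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(data), size=k, replace=False)
    best_centroids = data[start].astype(float)
    best_score = np.inf
    for _ in range(n_rounds):
        idx = rng.choice(len(data), size=sample_size, replace=False)
        sample = data[idx]
        candidate = kmeans(sample, best_centroids)
        score = sse(sample, candidate)
        if score < best_score:
            best_centroids, best_score = candidate, score
    return best_centroids
```

With, say, sample_size=200 and n_rounds=5 on a dataset of several thousand points, each round touches only a small fraction of the data. Calling sse(data, centroids) on the full dataset afterwards is one way to check how well the sample-based centroids generalize.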

References

Journal Articles

2024

  1. Mathematics
    High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall Data
    Ravil Mussabayev and Rustam Mussabayev
  2. ApplSoftComp
    Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review
    Ravil Mussabayev and Rustam Mussabayev

2023

  1. PattRecog
    How to Use K-means for Big Data Clustering?
    Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, and Ravil Mussabayev

Books & Conferences

2024

  1. SpringerNature
    Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means
    Rustam Mussabayev and Ravil Mussabayev
  2. SpringerNature
    Optimizing Parallelization Strategies for the Big-Means Clustering Algorithm
    Ravil Mussabayev and Rustam Mussabayev