
@Sachin-A
Created July 9, 2018 20:32
Task #2 in ML for delta inductions (Open Profile)

ML Task #2

Problem Statement:

Using clustering for some unsupervised learning!

Clustering is the task of grouping a set of objects so that objects in the same group (called a cluster) are more similar, in some sense, to each other than to objects in other clusters.

No dataset is specified for this task. The choice is up to you, but a dataset that showcases the power of clustering will be appreciated.

Here's an example dataset you might use for clustering: Link

Normal Mode:

Your task is to use k-means clustering to find clusters in your data.
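As a starting point, here is a minimal sketch of k-means (Lloyd's algorithm) using only the Python standard library, so it stays within the no-clustering-library rule. The function names (`kmeans`, `squared_distance`, `mean_point`), the random-sample initialisation, the fixed seed, and the toy dataset are all assumptions made for illustration. They are not part of the task statement.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initialise centroids by sampling k distinct data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids will too
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0),
          (10.0, 10.0), (11.0, 11.0), (12.0, 12.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

On real data the result depends on the initial centroids, so it is common to run the algorithm from several seeds and keep the run with the smallest total within-cluster squared distance.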

Hacker Mode:

Use the more general expectation-maximization (EM) clustering algorithm to maximize the overall probability, or likelihood, of
the data given the final clusters.

The central idea: instead of assigning cases or observations to clusters so as to maximize the differences in means for
continuous variables, the EM (expectation-maximization) clustering algorithm computes probabilities of cluster
membership based on one or more probability distributions.
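To make the idea of soft cluster membership concrete, here is a minimal sketch of EM for a mixture of one-dimensional Gaussians, again using only the standard library. It is one possible instance of EM clustering, not the required solution: the choice of Gaussian components, the function names (`gaussian_pdf`, `em_gmm_1d`), the hand-picked initial means, the fixed iteration count, the variance floor `min_var`, and the toy data are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, n_iters=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture, starting from the given initial means.

    Returns (weights, means, variances, responsibilities).
    """
    k = len(means)
    means = list(means)
    weights = [1.0 / k] * k
    # Assumption: start every component at the overall sample variance.
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [max(overall_var, min_var)] * k
    resp = []
    for _ in range(n_iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate the parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids a collapsed component
    return weights, means, variances, resp


# Toy data (hypothetical): two groups centred near 1 and near 5.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
weights, means, variances, resp = em_gmm_1d(data, means=[0.0, 6.0])
print(means)
```

Each row of `resp` holds one point's membership probabilities across the components and sums to 1. This is the contrast with k-means, which gives each point a single hard label. A fuller implementation would usually work with log-probabilities and stop once the log-likelihood stops improving, rather than running a fixed number of iterations.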

Submission:

You are required to program in Python for the above task.
Normal Mode is required. Hacker Mode is highly encouraged :)
Deadline: 14th July 2018

Required Skills:

1. Python

Limitations:

Libraries that offer clustering functions out of the box are not allowed.

Happy coding!
