I'm making an app where you can input a method/function you know in a given language and ask for its equivalent in another. The app will use machine learning (and, perhaps, natural language processing, or NLP) to compare the two sets of language docs and return its best guess at the equivalent function, based on text similarity and weighted by up/downvotes.
The only problem? I have no idea what I'm doing. Come along with me as I document my learning process, hacking our way through the jungle undergrowth of obtuse terminology in hopes of coming out the other side older, wiser, and not murdered by my teammates for coming up with this idea.
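To make the idea a bit more concrete, here is a rough Python sketch of the "text similarity weighted by votes" part. Nothing in it is the app's real implementation: the function names, the toy doc snippets (paraphrased for illustration, not quoted from any real documentation), and the 10%-per-vote weighting are all made up, and plain word-count cosine similarity stands in for whatever ML or NLP technique ends up doing the job.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a doc string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_equivalent(source_doc, candidates, votes=None):
    """Return the candidate name whose doc text best matches source_doc.

    candidates maps function name -> doc text in the target language.
    votes maps function name -> net up/downvotes (hypothetical scheme).
    """
    votes = votes or {}
    source_vec = Counter(tokenize(source_doc))
    scored = []
    for name, doc in candidates.items():
        similarity = cosine_similarity(source_vec, Counter(tokenize(doc)))
        # Assumed weighting: each net vote shifts the score by 10%,
        # and the weight never drops below zero.
        weight = max(0.0, 1.0 + 0.1 * votes.get(name, 0))
        scored.append((similarity * weight, name))
    if not scored:
        return None
    return max(scored)[1]


# Toy, paraphrased doc snippets (not real documentation text).
JS_MAP_DOC = ("creates a new array populated with the results of calling "
              "a provided function on every element")
PYTHON_DOCS = {
    "map": ("return an iterator that applies function to every item of "
            "iterable, yielding the results"),
    "filter": ("construct an iterator from those elements of iterable "
               "for which function is true"),
    "len": "return the length of an object",
}

guess = best_equivalent(JS_MAP_DOC, PYTHON_DOCS)
```

With these toy snippets, `guess` comes out as `"map"` because it shares the most words with the source description. Counting shared words is a crude measure of similarity (it treats "element" and "elements" as unrelated, for instance), which is exactly the kind of gap the ML and NLP techniques explored later are meant to close.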
Machine learning, and in particular deep learning (the branch of it built on multi-layered neural networks), is complex but fascinating. A feature article on Ars Technica was my first entry point into the details of how ML works, and a series of videos on neural networks by 3Blue1Brown came highly recommended and offered a more detailed, well-visualized introduction. The free Elements of AI course, provided by the University of Helsinki and Reaktor, offers a more thorough and well-rounded guide to the subject.
At a very high level, machine learning is a process by which algorithms improve over time at a given task, such as classifying an input, by learning from examples rather than from hand-written rules. Deep learning is the flavor of machine learning that does this with "neural networks." The popular example used by most ML tutorials is that of a neural network classifying images of hand-written digits according to which digit they represent, 0 through 9.
A static, rule-based approach to this problem would involve writing thousands of conditional rules to account for all the ways a digit could be written, and those rules would need constant adjustment to handle edge cases. An untrained neural network starts out no better off: it is a network of "neurons," each of which is simply a mathematical function, and the thousands of adjustable numbers (weights) inside those functions begin more or less at random. The difference is that nobody tunes them by hand. Given large amounts of training data, where each input is accompanied by its correct classification, the network measures how wrong each guess was and "backpropagates" that error, nudging every weight slightly in the direction that would have made the guess better. Once trained, the network can be fed test data it has never seen, to check whether it can generalize from what it learned and correctly classify non-annotated inputs.
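Here is a deliberately tiny Python sketch of that train-then-test loop. It is a big simplification: a single "neuron" instead of a network, two-number inputs instead of digit images, and a made-up task (is the point above or below a diagonal line?). The data points, function names, learning rate, and epoch count are all assumptions chosen for illustration. Real backpropagation applies the same chain-rule nudge shown here, layer by layer, across many neurons.

```python
import math


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(examples, epochs=5000, learning_rate=0.5):
    """Fit one sigmoid neuron to (inputs, label) pairs by gradient descent."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = sigmoid(z)
            # Chain rule: gradient of the squared error 0.5 * (output - target)**2
            # with respect to z, using the sigmoid derivative output * (1 - output).
            delta = (output - target) * output * (1.0 - output)
            # Nudge each weight against its share of the error.
            weights = [w - learning_rate * delta * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
    return weights, bias


def predict(weights, bias, inputs):
    """Classify inputs as 1 or 0 using the trained neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Annotated training data: label is 1 when the point lies above the
# line x + y = 1, otherwise 0 (a made-up stand-in for "which digit is this?").
TRAINING_DATA = [
    ((0.0, 0.0), 0), ((0.2, 0.3), 0), ((0.5, 0.2), 0), ((0.1, 0.6), 0),
    ((0.9, 0.8), 1), ((0.7, 0.9), 1), ((1.0, 0.5), 1), ((0.6, 0.7), 1),
]

# Test data: points the neuron never saw during training.
TEST_DATA = [((0.1, 0.1), 0), ((0.8, 0.8), 1)]

weights, bias = train_neuron(TRAINING_DATA)
test_predictions = [predict(weights, bias, inputs) for inputs, _ in TEST_DATA]
```

Notice that no rule like "if x + y > 1" appears anywhere in the code. The neuron starts with all-zero weights and ends up with a usable decision boundary purely from repeated small corrections, which is the core idea that scales up to digit recognition.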
Use cases for NLP:
- Semantic tagging