@DanielEFrampton
Last active April 1, 2020 21:26
Rosetta Cross-Pol Project Pitch

Daniel Frampton Mod 4 Cross Pollination Pitch

Pitch

Rosetta: a programming education app that uses machine learning and natural language processing on official language documentation to find comparable standard-library functions and syntax across languages, easing the process of learning a new language.

For example, I want to know which Python function works similarly to Ruby's String#gsub method. Rosetta searches the Python docs for language similar to the description of the Ruby method and returns the top five matches, each with a relevance rating. The ML and NLP could be helped along by manually linking similar libraries and concepts, and trained with concrete, known examples of equivalent functions.
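As a rough illustration of the matching step, the sketch below ranks candidate descriptions against a query description using TF-IDF weighting and cosine similarity. This is a stand-in technique chosen for simplicity, not the NLP model the pitch proposes (spaCy or PyTorch would replace it). The function names `tokenize` and `rank_matches` are hypothetical, and the descriptions are short paraphrases for illustration rather than scraped documentation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_matches(query, candidates, top_n=5):
    """Return up to top_n (name, score) pairs, highest relevance first.

    `candidates` maps a function name to its documentation description.
    Scores are cosine similarities between TF-IDF vectors, so they fall
    between 0.0 and 1.0.
    """
    docs = {name: Counter(tokenize(desc)) for name, desc in candidates.items()}
    n_docs = len(docs)

    # Document frequency: how many candidate descriptions contain each term.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    def weight(counts):
        # Smoothed IDF so terms absent from every candidate still get a weight.
        return {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
            sum(v * v for v in b.values())
        )
        return dot / norm if norm else 0.0

    query_vec = weight(Counter(tokenize(query)))
    scored = [(name, cosine(query_vec, weight(c))) for name, c in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Illustrative paraphrased descriptions (assumptions, not scraped docs).
ruby_gsub = (
    "Returns a copy of the string with all occurrences of pattern "
    "substituted for the replacement"
)
python_docs = {
    "str.replace": "Return a copy of the string with all occurrences of "
                   "substring old replaced by new",
    "str.upper": "Return a copy of the string with all the cased characters "
                 "converted to uppercase",
    "str.split": "Return a list of the words in the string, using sep as "
                 "the delimiter string",
}

for name, score in rank_matches(ruby_gsub, python_docs):
    print(f"{name}: {score:.2f}")
```

With these three candidates, `str.replace` ranks first because it shares the rarer term "occurrences" with the query. A purely lexical score like this would miss synonyms ("substituted" vs. "replaced"), which is where the proposed NLP models and manual connections would come in.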

The app would begin by implementing Ruby-to-Python and, if time allows, Ruby-to-JavaScript. Different versions of a language can be compared by searching their respective docs, and there's the potential to search package libraries if the default library does not have a good match. If actual code is available, the server could have VMs available to actually run code snippets and do some meta-programming to return possible ways you could get the same outputs from the same inputs.
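The idea of checking candidates against "the same outputs from the same inputs" could look something like the sketch below. It assumes each candidate has already been wrapped as a Python callable with the same argument order as the source function; the `matching_candidates` helper and the wrapper lambdas are hypothetical. It runs everything in-process with no sandboxing, whereas the pitch envisions isolated VMs for running untrusted snippets.

```python
import re


def matching_candidates(examples, candidates):
    """Return the names of candidates that reproduce every example.

    `examples` is a list of (args, expected_output) pairs taken from the
    source language; `candidates` maps a name to a callable to try.
    """
    matches = []
    for name, func in candidates.items():
        try:
            if all(func(*args) == expected for args, expected in examples):
                matches.append(name)
        except Exception:
            # A candidate that raises on any example is not a match.
            continue
    return matches


# Known behaviour of Ruby's gsub, written as (args, expected) pairs.
gsub_examples = [
    (("hello", "l", "L"), "heLLo"),
    (("a-b-c", "-", "+"), "a+b+c"),
]

# Hypothetical wrappers that adapt Python functions to gsub's argument order.
python_candidates = {
    "str.replace": lambda s, old, new: s.replace(old, new),
    "re.sub": lambda s, pattern, repl: re.sub(pattern, repl, s),
    "str.strip": lambda s, chars, _unused: s.strip(chars),
}

print(matching_candidates(gsub_examples, python_candidates))
```

Here both `str.replace` and `re.sub` reproduce the example outputs, while `str.strip` does not. Agreement on a handful of examples only suggests similar behaviour; it does not prove the functions are equivalent.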

Reason

Learning a second or third programming language can be challenging because the methods you've become accustomed to having at your disposal are now named differently, behave differently, or simply have no analogous function in the other language.

Solutions can be found through a Google search, but they are typically StackOverflow or forum posts and are frequently difficult to rely upon because of version changes in the target language. Looking through the target language's documentation, while worthwhile, can be time-consuming and confusing because of terminology differences.

Having a central tool to reference which automates these comparisons across multiple languages and versions would streamline the learning process for newcomers and be a helpful reference for experienced developers.

Tech Stack

Python would be used for the back-end service because of its suitability for machine learning tasks, along with these libraries:

  • Flask for overall app framework.
  • Beautiful Soup and Selenium (and possibly Scrapy) for parsing the documentation websites.
  • spaCy or PyText/PyTorch for NLP and machine learning.
  • MongoDB, PyMongo and MongoEngine for database interactions and result caching.
  • Flask-GraphQL and Graphene-Mongo to serve a GraphQL endpoint.

React and Redux would be used for the front-end service to develop a single-page web interface. Apollo Client would be used to send GraphQL queries to the Python back-end.

MVP

  1. On the root "Search" page, user can enter or select a function from one language (at minimum, Ruby), select a target language (at minimum, Python) and a target version (most recent stable, by default), and receive 5 search results from the docs of that language and version that the NLP/ML system judges comparable. Results are ranked by a relevance rating, with a sample description and sample code displayed for each.

  2. When "Compare" button is clicked next to a search result, user can view the documentation side-by-side and see samples of the syntax for both languages' functions/keywords, links to open interactive REPL sessions to experiment with the functions, and a copy-to-clipboard button which allows the user to copy a template version of the target language's function.

  3. User can browse "Dictionary" directory of all classes/methods for a particular language (at minimum, Ruby) alongside the cached top search result for each method in a target language (at minimum, Python), which can be clicked to open the comparison view described in feature 2.
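To make the shape of the data behind features 1 and 3 concrete, here is a minimal sketch of a search result record and a "Dictionary" cache that keeps the top result per source function. The class and field names are assumptions for illustration, and an in-memory dict stands in for the MongoDB collection named in the tech stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    """One candidate match in the target language (hypothetical fields)."""
    name: str
    description: str
    sample_code: str
    relevance: float


class DictionaryCache:
    """Keeps the highest-relevance result for each source-language function."""

    def __init__(self):
        self._top = {}  # source function name -> SearchResult

    def store(self, source_function, results):
        # Cache only the best result; an empty result list leaves the cache as is.
        if results:
            self._top[source_function] = max(results, key=lambda r: r.relevance)

    def top_result(self, source_function):
        return self._top.get(source_function)

    def entries(self):
        # Alphabetical listing for the "Dictionary" browse view.
        return sorted(self._top.items(), key=lambda item: item[0])


cache = DictionaryCache()
cache.store("String#gsub", [
    SearchResult("re.sub", "Regex substitution", "re.sub(p, r, s)", 0.81),
    SearchResult("str.replace", "Literal substitution", "s.replace(a, b)", 0.74),
])
cache.store("Array#map", [
    SearchResult("map", "Apply a function to each item", "map(f, xs)", 0.88),
])

for source, result in cache.entries():
    print(f"{source} -> {result.name} ({result.relevance:.2f})")
```

The relevance values above are made-up placeholders, not output from any model. Clicking an entry in the directory would then load the full comparison view for that pair, as described in feature 2.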

Instructor Feedback

To get started on the project, both for your pitch and for your MVP, we'd like to caution you not to bite off more than you can chew. We'd like to start you out by limiting the number of languages, say just Ruby and JavaScript to get started, and to limit yourself to only a single version of each language (say Ruby 2.5 and JS ES6), and to further limit you to only scraping the docs for, say, Array functions.

While I super love the idea of "google translate for programming", the scope of this is extremely ambitious, and may also be backend-heavy, so you'll need to find a good way to keep your FE folks engaged on the project with equal amounts of things to contribute.
