@idan
Created September 24, 2012 22:13

Visualizing GitHub

A treasure trove of data is captured daily by GitHub; it has become our shared consciousness of thoughts made code. What stories can that data tell us about how we think and work? How would one go about finding and telling those stories? This talk is a soup-to-nuts tour of practical data visualization with Python and web technologies, covering both the extraction and the display of data to illuminate a familiar dataset.

Detailed Description

In the time that we have been crafting software, our collective efforts have never been cataloged neatly in one centralized location. Some projects have long developed in the open, and some have even exposed their development history in some form or another—but the connections between multiple projects remained hidden.

These connections between multiple developers and multiple projects are the glue that binds us together into larger developer communities. They are our mirror, and for the first time we can take a look at ourselves with the aid of the GitHub API and our favorite dynamic programming language.

GitHub provides the perfect case study in the practice of extracting and presenting meaning from data. Come watch us tell a story about telling new stories with a familiar dataset: the tools, the techniques, and the thinking behind our anthropological journey into the largest coding community.

Outline

Part 1: Data to Information

  • Introduction

    • The art of storytelling when you don’t know the story ahead of time.
    • The seven stages of Data Visualization as per Ben Fry: Acquire, Parse, Filter, Mine, Represent, Refine, Interact.
  • Act one: From data to information

    • Data rarely comes neatly packaged.
    • The practicalities of getting data out of APIs. A brief tour of the data acquisition toolbox in Python (requests, Celery, Beautiful Soup, pyparsing, IPython Notebook; pandas?). Being a polite data-slurping netizen: optimizing data access by queries and dealing with rate limits.
    • The hard part: teasing out a story. Who is your audience? What would entertain and enlighten them? The essence of journalism.
    • Storing the data for display: how will data be queried? Does it even make sense to store it all in one kind of database? At scale, your data begins to look a lot like your presentation.
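
To make the "polite netizen" point concrete, here is a minimal sketch of rate-limit handling. It assumes the API reports its quota in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers, as the GitHub API does. The function names and the User-Agent string are hypothetical, and the standard library's urllib stands in for requests only to keep the sketch free of dependencies.

```python
import json
import time
import urllib.request


def seconds_until_allowed(headers, now=None):
    """Return how many seconds to wait before the next request.

    `headers` is any mapping with a .get() method, such as the headers
    of an HTTP response. Missing headers are treated as "no limit hit".
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset_at = int(headers.get("X-RateLimit-Reset", 0))  # epoch seconds
    if remaining > 0:
        return 0.0
    # Quota exhausted: wait until the reset time, never a negative amount.
    return max(0.0, reset_at - now)


def fetch_json(url):
    """Fetch one JSON resource, then pause if the quota is used up."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "visualizing-github-sketch"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
        wait = seconds_until_allowed(response.headers)
    if wait:
        time.sleep(wait)
    return body
```

Keeping the waiting logic in a pure function makes it easy to check without touching the network; `fetch_json` is only one possible way to wire it into a crawler.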

Part 2: Information to Meaning

XXX TODO

Other information:

I'm Django's "Benevolent Designer for Life"; as a member of the core team I'm responsible for issues which touch on the needs of frontend developers as well as anything which can be improved through design.

As a designer/developer hybrid, I've spoken at three DjangoCons, including my keynote address at DjangoCon US 2011. I gave two very well-received talks at PyCon last year, one of which was used as the basis for a new curriculum in high schools in Alaska. I've also spoken numerous times at local Python and web development meetups. I think I’m within bounds to say that I can deliver a fun and engaging experience to the PyCon audience.

My recorded talks and related materials are available on Lanyrd.
