@elena-roff
elena-roff / vim-cheatsheet.md
Created July 30, 2018 12:11 — forked from azadkuh/vim-cheatsheet.md
vim / vimdiff cheatsheet - essential commands

Vim cheat sheet

Starting Vim

vim [file1] [file2] ...

@DGrady
DGrady / flatten_spark_schema.py
Last active October 16, 2019 16:00
Flatten a Spark DataFrame schema
"""
The schemas that Spark produces for DataFrames are typically
nested, and these nested schemas are quite difficult to work with
interactively. In many cases, it's possible to flatten a schema
into a single level of column names.
"""
import typing as T
import cytoolz.curried as tz
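The preview above stops at the imports. As a rough illustration of the flattening idea its docstring describes, here is a minimal sketch in plain Python. It is not the gist's implementation: it assumes the schema is already available as a plain dict in the JSON layout that `StructType.jsonValue()` produces, and `flatten_field_names` and `example_schema` are hypothetical names made up for this example. Only nested structs are expanded; arrays and maps are treated as leaf columns.

```python
from typing import Any, Dict, List


def flatten_field_names(schema: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted leaf column names for a struct schema given in dict form."""
    names: List[str] = []
    for field in schema["fields"]:
        path = f"{prefix}{field['name']}"
        field_type = field["type"]
        # Nested structs are dicts with type == "struct"; simple types are strings.
        if isinstance(field_type, dict) and field_type.get("type") == "struct":
            names.extend(flatten_field_names(field_type, prefix=path + "."))
        else:
            names.append(path)
    return names


# Hypothetical schema: a top-level "id" plus a "user" struct nested two levels deep.
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True, "metadata": {}},
        {"name": "user", "nullable": True, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "name", "type": "string", "nullable": True, "metadata": {}},
                {"name": "geo", "nullable": True, "metadata": {}, "type": {
                    "type": "struct",
                    "fields": [
                        {"name": "lat", "type": "double", "nullable": True, "metadata": {}},
                    ],
                }},
            ],
        }},
    ],
}

flat_names = flatten_field_names(example_schema)
print(flat_names)
```

With a real DataFrame, dotted names like these could then be passed to a `select` and aliased to a single level of column names.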
@AlessandroChecco
AlessandroChecco / Spark Dataframe Cheat Sheet.py
Last active March 29, 2025 14:35 — forked from evenv/Spark Dataframe Cheat Sheet.py
Cheat sheet for Spark Dataframes (using Python)
# A simple cheat sheet of Spark Dataframe syntax
# Current for Spark 1.6.1
# import statements
#from pyspark.sql import SQLContext
#from pyspark.sql.types import *
#from pyspark.sql.functions import *
from pyspark.sql import functions as F
#SparkContext available as sc, HiveContext available as sqlContext.
@rmoff
rmoff / 01_Spark+Streaming+Kafka+Twitter.ipynb
Last active September 17, 2020 17:41
Simple example of processing twitter JSON payload from a Kafka stream with Spark Streaming in Python
@subfuzion
subfuzion / curl.md
Last active April 19, 2025 09:46
curl POST examples

Common Options

-#, --progress-bar Make curl display a simple progress bar instead of the more informational standard meter.

-b, --cookie <name=data> Supply cookie with request. If no =, then specifies the cookie file to use (see -c).

-c, --cookie-jar <file name> File to save response cookies to.

@jiffyclub
jiffyclub / hdf_to_parquet.py
Last active September 24, 2022 16:05
Do the same thing in Spark and Pandas
"""
Convert Pandas DFs in an HDFStore to parquet files for better compatibility
with Spark.
Run from the command line with:
spark-submit --driver-memory 4g --master 'local[*]' hdf_to_parquet.py
"""
import pandas as pd
@LeCoupa
LeCoupa / bash-cheatsheet.sh
Last active April 19, 2025 04:38
Bash CheatSheet for UNIX Systems --> UPDATED VERSION --> https://github.com/LeCoupa/awesome-cheatsheets
#!/bin/bash
#####################################################
# Name: Bash CheatSheet for Mac OSX
#
# A brief overview of the Bash basics
#
# Usage:
#
# Author: J. Le Coupanec
# Date: 2014/11/04
@bshyong
bshyong / python_bst.py
Last active March 11, 2021 23:09
Python BST implementation
class BSTnode(object):
    """
    Representation of a node in a binary search tree.
    Has a left child, right child, and key value, and stores its subtree size.
    """
    def __init__(self, parent, t):
        """Create a new leaf with key t."""
        self.key = t
        self.parent = parent
        self.left = None
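The preview is cut off partway through the constructor. To show how a node that "stores its subtree size" can keep that count up to date, here is a minimal, self-contained sketch. It is an assumption-laden illustration, not the gist's code: the `Node` class, its `insert` method, and the `in_order` helper are hypothetical names, duplicates are sent to the right subtree by choice, and no balancing is attempted.

```python
class Node:
    """A binary search tree node that tracks the size of its own subtree."""

    def __init__(self, parent, key):
        self.key = key
        self.parent = parent
        self.left = None
        self.right = None
        self.size = 1  # a new leaf is a subtree of one node

    def insert(self, key):
        """Insert key below this node and return the newly created leaf."""
        self.size += 1  # every node on the path to the new leaf gains one descendant
        if key < self.key:
            if self.left is None:
                self.left = Node(self, key)
                return self.left
            return self.left.insert(key)
        if self.right is None:
            self.right = Node(self, key)
            return self.right
        return self.right.insert(key)


def in_order(node):
    """Return the keys of the subtree rooted at node in sorted order."""
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)


root = Node(None, 5)
for key in (3, 8, 1, 4):
    root.insert(key)

print(in_order(root), root.size, root.left.size)
```

Keeping `size` on each node is what makes rank and order-statistic queries possible without walking the whole subtree.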
@kachayev
kachayev / dijkstra.py
Last active March 4, 2025 23:42
Dijkstra shortest path algorithm based on python heapq heap implementation
from collections import defaultdict
from heapq import *
def dijkstra(edges, f, t):
    g = defaultdict(list)
    for l,r,c in edges:
        g[l].append((c,r))
    q, seen, mins = [(0,f,())], set(), {f: 0}
    while q:
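The preview ends at the start of the main loop. For readers who want to see the overall shape of a heap-based Dijkstra search, here is a minimal, self-contained sketch. It does not reconstruct the rest of the gist: `shortest_path`, its return format, and the sample edge list are assumptions chosen for this illustration, and edges are treated as directed `(source, destination, cost)` triples with non-negative costs.

```python
import heapq
from collections import defaultdict


def shortest_path(edges, source, target):
    """Return (distance, path) from source to target, or (inf, []) if unreachable."""
    graph = defaultdict(list)
    for u, v, cost in edges:
        graph[u].append((v, cost))

    # Heap entries are (distance so far, node, path taken to reach it).
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node in visited:
            continue  # a shorter route to this node was already settled
        visited.add(node)
        if node == target:
            return dist, path
        for neighbor, cost in graph[node]:
            if neighbor not in visited:
                heapq.heappush(queue, (dist + cost, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical sample graph.
edges = [
    ("A", "B", 7), ("A", "D", 5), ("B", "C", 8), ("B", "D", 9),
    ("B", "E", 7), ("C", "E", 5), ("D", "E", 15), ("D", "F", 6),
    ("E", "F", 8), ("E", "G", 9), ("F", "G", 11),
]

result = shortest_path(edges, "A", "E")
print(result)
```

Storing the whole path in each heap entry keeps the sketch short; a predecessor map would use less memory on large graphs.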
@MLnick
MLnick / sklearn-lr-spark.py
Created February 4, 2013 14:29
SGD in Spark using Scikit-learn
import sys
from pyspark.context import SparkContext
from numpy import array, random as np_random
from sklearn import linear_model as lm
from sklearn.base import copy
N = 10000 # Number of data points
D = 10 # Number of dimensions
ITERATIONS = 5