Skip to content

Instantly share code, notes, and snippets.

View argenisleon's full-sized avatar

Argenis Leon argenisleon

View GitHub Profile
@yackermann
yackermann / verify.androidkey.webauthn.js
Last active August 14, 2021 07:18
WebAuthn Android Keystore attestation verification sample in NodeJS
const crypto = require('crypto');
const base64url = require('base64url');
const cbor = require('cbor');
const asn1 = require('@lapo/asn1js');
const jsrsasign = require('jsrsasign');
/* Android Keystore Root is not published anywhere.
* This certificate was extracted from one of the attestations
* The last certificate in x5c must match this certificate
* This needs to be checked to ensure that malicious party wont generate fake attestations
@anish749
anish749 / MostCommonValue.scala
Last active January 25, 2026 04:00
Spark UDAF to calculate the most common element in a column or the Statistical Mode for a given column. Written and test in Spark 2.1.0
package org.anish.spark.mostcommonvalue
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._
import scalaz.Scalaz._
/**
* Spark User Defined Aggregate Function to calculate the most frequent value in a column. This is similar to
@4rzael
4rzael / main.md
Last active February 14, 2026 13:33
GIS with pySpark.
NOTE : Take a look at the comments below !

GIS with pySpark : A not-so-easy journey

Why would you do that ?

Today, many datas are geolocalised (meaning that they have a position in space). They're named GIS datas.

It's not rare that we need to do operations on those, such as aggregations, and there are many optimisations existing to do that.

@joshlk
joshlk / faster_toPandas.py
Last active September 19, 2025 16:11
PySpark faster toPandas using mapPartitions
import pandas as pd
def _map_to_pandas(rdds):
""" Needs to be here due to pickling issues """
return [pd.DataFrame(list(rdds))]
def toPandas(df, n_partitions=None):
"""
Returns the contents of `df` as a local `pandas.DataFrame` in a speedy fashion. The DataFrame is
repartitioned if `n_partitions` is passed.
@chenjianjx
chenjianjx / start-celery-for-dev.py
Created March 10, 2016 10:45
A python script which starts celery worker and auto reload it when any code change happens.
'''
A python script which starts celery worker and auto reload it when any code change happens.
I did this because Celery worker's "--autoreload" option seems not working for a lot of people.
'''
import time
from watchdog.observers import Observer ##pip install watchdog
from watchdog.events import PatternMatchingEventHandler
import psutil ##pip install psutil
import os
@kenju254
kenju254 / mongo_chunk.py
Created December 13, 2015 13:31 — forked from 0m15/mongo_chunk.py
Python script that can export Mongo collections of any size into progressive numbered json chunks, type `python mongo_chunk.py --help` to have a list of available options
#!/usr/bin/python
import argparse
from pymongo import MongoClient as Client
from bson import BSON
from bson import json_util
import json
import os
# mongo client
@dapangmao
dapangmao / s.bash
Last active September 1, 2020 16:28
How to set up a spark cluster on digitalocean
sudo openvpn --config *.opvn
apt-get update
apt-get install vim
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0-bin-hadoop2.4.tgz | tar zxf
hadoop fs -mkdir /spark
hadoop fs -put spark-1.3.0-bin-hadoop2.4.tgz /spark
hadoop fs -du -h /spark
cp spark-env.sh.template spark-env.sh
@hest
hest / gist:8798884
Created February 4, 2014 06:08
Fast SQLAlchemy counting (avoid query.count() subquery)
def get_count(q):
count_q = q.statement.with_only_columns([func.count()]).order_by(None)
count = q.session.execute(count_q).scalar()
return count
q = session.query(TestModel).filter(...).order_by(...)
# Slow: SELECT COUNT(*) FROM (SELECT ... FROM TestModel WHERE ...) ...
print q.count()
@sloria
sloria / bobp-python.md
Last active February 24, 2026 02:09
A "Best of the Best Practices" (BOBP) guide to developing in Python.

The Best of the Best Practices (BOBP) Guide for Python

A "Best of the Best Practices" (BOBP) guide to developing in Python.

In General

Values

  • "Build tools for others that you want to be built for you." - Kenneth Reitz
  • "Simplicity is alway better than functionality." - Pieter Hintjens