- What the problem is
- What prior methods entail
- What you propose/claim/hypothesize in this work
- How/why is it better
- Experimental support for your proposal
- Any additional insights (if applicable)
'''
DESCRIPTION:
This simple convenience function provides parallelization of pandas .apply()
Adapted from: https://proinsias.github.io/tips/How-to-use-multiprocessing-with-pandas/
REQUIREMENTS:
`multiprocess` and `dill` packages are required.
```
python -m pip install multiprocess dill
```
def load_csv(csv_path: Path, ignore_first_row=True, ignore_empty_rows=True, delimiter=','):
    '''
    Returns all the rows of a csv file
    '''
    rows = []
    with csv_path.open() as csvfile:
        csv_reader = csv.reader(csvfile, delimiter=delimiter)
        if ignore_first_row:
            next(csv_reader)
        for row in csv_reader:
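The preview of `load_csv` stops at the loop header. A minimal sketch of how the function might be completed follows. The loop body and the return value are not shown in the source, so this assumes that `ignore_empty_rows` skips rows with no fields or only blank fields, and that the rows are returned as a list of lists:

```python
import csv
from pathlib import Path


def load_csv(csv_path: Path, ignore_first_row=True, ignore_empty_rows=True, delimiter=','):
    '''
    Returns all the rows of a csv file
    '''
    rows = []
    # newline='' is what the csv module documentation recommends for file objects
    with csv_path.open(newline='') as csvfile:
        csv_reader = csv.reader(csvfile, delimiter=delimiter)
        if ignore_first_row:
            # Assumption: the first row is a header; the None default avoids
            # StopIteration on an empty file
            next(csv_reader, None)
        for row in csv_reader:
            # Assumption: "empty" means no fields or only blank fields
            if ignore_empty_rows and not any(field.strip() for field in row):
                continue
            rows.append(row)
    return rows
```

Every value comes back as a string, so any type conversion is left to the caller.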
import pandas as pd

def jsonl_to_df(jsonl_filepath):
    return pd.read_json(jsonl_filepath, lines=True)

def df_to_jsonl(df, jsonl_filepath):
    payload = df.to_json(orient='records', lines=True)
    with open(jsonl_filepath, 'w') as writer:
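For reference, the JSON Lines layout these helpers read and write is one JSON object per line. Below is a dependency-free sketch of the same round trip using only the standard library rather than pandas. The names `write_jsonl` and `read_jsonl` are hypothetical and do not come from the gist:

```python
import json


def write_jsonl(records, jsonl_filepath):
    # One JSON object per line, analogous to orient='records', lines=True
    with open(jsonl_filepath, 'w', encoding='utf-8') as writer:
        for record in records:
            writer.write(json.dumps(record) + '\n')


def read_jsonl(jsonl_filepath):
    # Blank lines are skipped rather than treated as errors
    with open(jsonl_filepath, encoding='utf-8') as reader:
        return [json.loads(line) for line in reader if line.strip()]
```

This returns a list of dictionaries instead of a DataFrame, which may be enough when pandas is not otherwise needed.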
'''
Simple script to split a PDF using the PyPDF2 package in Python.
We often need to split an academic paper into the main paper and the
supplementary material before submission.
To do that, simply run the script as:
`python split_pdf.py -in CVPR.pdf -s 15 -o`
This produces 2 files: 'CVPR.01-14.pdf' and 'CVPR.15-20.pdf', where the starting
page numbers for each split file are 1 and 15 respectively.
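The naming scheme in that example can be sketched without touching the PDF itself. The helper below is hypothetical and not part of the script. It assumes a 20-page input, inclusive page ranges, and page numbers zero-padded to the width of the last page number, as the two file names suggest:

```python
def split_names(stem, num_pages, split_page):
    # Assumption: pad to the width of the last page number (2 digits for 20 pages)
    width = len(str(num_pages))
    # Inclusive 1-based ranges: everything before the split page, then the rest
    ranges = [(1, split_page - 1), (split_page, num_pages)]
    return [f"{stem}.{start:0{width}d}-{end:0{width}d}.pdf" for start, end in ranges]


print(split_names("CVPR", 20, 15))  # → ['CVPR.01-14.pdf', 'CVPR.15-20.pdf']
```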
'''
Workaround for logging a simple table that supports step sliding. (See issue https://github.com/wandb/wandb/issues/6286)
Unfortunately, wandb does not currently support this with `wandb.Table`, which is overkill for this purpose.
The `wandb_htmltable` function follows the same signature as `wandb.Table` and takes input parameters of the same types.
It currently supports only text and image data. Image data is embedded as a byte string in the <img /> tag.
Example:
```
my_data = [
'''
Resizes images in the source image directory to fit within the given size
bounds (keeping aspect ratio) and writes them to the target directory with an
identical directory tree structure. Uses Magick for image resizing.
'''
import os
import argparse
import subprocess
from pathlib import Path
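The aspect-ratio-preserving resize described in the docstring reduces to a small calculation. The helper below is a hypothetical illustration and is not the script's own code, which delegates the resizing to Magick. It assumes the bounds are a maximum width and height in pixels and that images are never enlarged:

```python
def fit_within(width, height, max_width, max_height):
    # Largest scale that satisfies both bounds, capped at 1.0 so that
    # images already inside the bounds are left at their original size
    scale = min(max_width / width, max_height / height, 1.0)
    # Round to whole pixels and keep each side at least 1 pixel
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000, 1000, 1000))  # → (1000, 750)
```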
# STEP 1: `$ mkdir ~/bin`
# STEP 2: `$ touch ~/bin/sshfr`
# STEP 3: `$ chmod +x ~/bin/sshfr`
# STEP 4: Copy the following contents into `~/bin/sshfr`
# STEP 5: Update .profile or .bash_profile: `$ export PATH=$PATH":$HOME/bin"`
# STEP 6: Reload .profile or .bash_profile, e.g. `$ . ~/.bash_profile`
# The contents of sshfr are as follows
ADDRESS=$1
PORT_START=${2-49151}
from datetime import datetime
import os
import pandas as pd
import argparse
'''
Note:
- Entries start on row 3 of EduRec Excel exports
- 'Student Number' column is mandatory!
"""
Intended usage scenario:
You have a directory of pdfs, each consisting of sequential image scans of
human-annotated documents (e.g. written questionnaires/forms/exams) where every
document has the same number of pages. Each pdf may contain a different
number of such scanned documents. You want to split all these pdfs up into
smaller pdfs at fixed page index intervals such that each smaller pdf
corresponds to a single scanned document. In addition, you want to
place them under a specific output directory while ensuring no filename
collisions.
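The fixed-interval split described above comes down to cutting a page range into equal-sized chunks. Below is a minimal sketch of that arithmetic only, with no PDF library involved. The function name, the 0-based page indices, and the error raised when the page count is not a multiple of the per-document page count are assumptions, not taken from the script:

```python
def page_chunks(num_pages, pages_per_doc):
    if pages_per_doc <= 0:
        raise ValueError("pages_per_doc must be positive")
    # Assumption: a pdf whose page count is not a whole number of documents
    # is treated as an error rather than producing a short final chunk
    if num_pages % pages_per_doc != 0:
        raise ValueError("page count is not a multiple of pages_per_doc")
    # One range of 0-based page indices per scanned document
    return [range(start, start + pages_per_doc)
            for start in range(0, num_pages, pages_per_doc)]


print([list(chunk) for chunk in page_chunks(6, 2)])  # → [[0, 1], [2, 3], [4, 5]]
```

Each range can then be used to copy the corresponding pages into its own output pdf.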