See: https://stackoverflow.com/questions/5296667/pdftk-compression-option
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dBATCH -dQUIET -sOutputFile=output.pdf input.pdf
'''
DESCRIPTION:
This simple convenience function provides parallelization of pandas .apply().
Adapted from: https://proinsias.github.io/tips/How-to-use-multiprocessing-with-pandas/
REQUIREMENTS:
The `multiprocess` and `dill` packages are required.
```
python -m pip install multiprocess dill
```
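The helper itself isn't shown above, so here is a minimal sketch of what such a parallelized `.apply()` can look like with `multiprocess.Pool`. The name `parallel_apply`, the chunking strategy, and the `n_workers` parameter are illustrative assumptions, not the original implementation:
```
import multiprocess as mp
import numpy as np
import pandas as pd


def parallel_apply(df, func, n_workers=4):
    # Split the frame into roughly equal chunks, apply `func` row-wise to each
    # chunk in its own worker process, then stitch the results back together.
    # `multiprocess` pickles the lambda via dill, which stdlib multiprocessing cannot.
    chunks = np.array_split(df, n_workers)
    with mp.Pool(n_workers) as pool:
        results = pool.map(lambda chunk: chunk.apply(func, axis=1), chunks)
    return pd.concat(results)
```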
import csv
from pathlib import Path


def load_csv(csv_path: Path, ignore_first_row=True, ignore_empty_rows=True, delimiter=','):
    '''
    Returns all the rows of a csv file
    '''
    rows = []
    with csv_path.open() as csvfile:
        csv_reader = csv.reader(csvfile, delimiter=delimiter)
        if ignore_first_row:
            next(csv_reader)
        for row in csv_reader:
            if ignore_empty_rows and not any(field.strip() for field in row):
                continue
            rows.append(row)
    return rows
import pandas as pd


def jsonl_to_df(jsonl_filepath):
    return pd.read_json(jsonl_filepath, lines=True)


def df_to_jsonl(df, jsonl_filepath):
    payload = df.to_json(orient='records', lines=True)
    with open(jsonl_filepath, 'w') as writer:
        writer.write(payload)
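A quick round-trip usage example; the file names here are hypothetical:
```
df = jsonl_to_df('records.jsonl')         # hypothetical input file
df_to_jsonl(df.head(10), 'sample.jsonl')  # write the first 10 records back out
```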
'''
Simple script to split a PDF using the PyPDF2 package in Python.
Often we need to split an academic paper into the main paper and the
supplementary material before submission.
To do that, the script may simply be run as:
`python split_pdf.py -in CVPR.pdf -s 15 -o`
This produces 2 files: 'CVPR.01-14.pdf' and 'CVPR.15-20.pdf', where the starting
page numbers for each split file are 1 and 15 respectively.
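As a rough illustration of the splitting step, here is a minimal sketch using PyPDF2. The `split_at` helper, its arguments, and the output naming mirror the example above and are assumptions, not the original script's CLI handling:
```
from PyPDF2 import PdfReader, PdfWriter


def split_at(in_path, split_page):
    # Write pages [1, split_page - 1] and [split_page, N] to two separate files,
    # named after the input file as in the example above.
    reader = PdfReader(in_path)
    first, second = PdfWriter(), PdfWriter()
    for i, page in enumerate(reader.pages, start=1):
        (first if i < split_page else second).add_page(page)
    stem = in_path[:-len('.pdf')] if in_path.endswith('.pdf') else in_path
    n_pages = len(reader.pages)
    with open(f'{stem}.01-{split_page - 1:02d}.pdf', 'wb') as f:
        first.write(f)
    with open(f'{stem}.{split_page:02d}-{n_pages:02d}.pdf', 'wb') as f:
        second.write(f)
```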
'''
Workaround for logging a simple table that supports step sliding. (See issue https://github.com/wandb/wandb/issues/6286)
It's a pity that wandb currently doesn't support this; `wandb.Table` is overkill for it.
The `wandb_htmltable` function follows the same signature as `wandb.Table` and takes input parameters of the same types.
It currently only supports text and image data. Image data is embedded as a byte string in the <img /> tag.
Example:
```
my_data = [
'''
Resizes images in source image directory within given size bounds (keeping
aspect ratio) and outputs to target directory with identical directory tree
structure. Uses ImageMagick for image resizing.
'''
import os
import argparse
import subprocess
from pathlib import Path
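A minimal sketch of the core loop under stated assumptions: the `magick` CLI is on PATH, only a few common image extensions are handled, and the `max_size` bound is an example value, not the original script's arguments:
```
import subprocess
from pathlib import Path


def resize_tree(src_dir: Path, dst_dir: Path, max_size='1024x1024'):
    # Mirror the source directory tree under dst_dir and shell out to ImageMagick.
    for src in src_dir.rglob('*'):
        if src.suffix.lower() not in {'.jpg', '.jpeg', '.png'}:
            continue
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # '-resize WxH' keeps aspect ratio; the '>' suffix only shrinks images larger than the bound
        subprocess.run(['magick', str(src), '-resize', f'{max_size}>', str(dst)], check=True)
```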
# STEP 1: `$ mkdir ~/bin`
# STEP 2: `$ touch ~/bin/sshfr`
# STEP 3: `$ chmod +x ~/bin/sshfr`
# STEP 4: Copy the following contents into `~/bin/sshfr`
# STEP 5: Update .profile or .bash_profile: `$ export PATH=$PATH":$HOME/bin"`
# STEP 6: Reload .profile or .bash_profile, e.g. `$ . ~/.bash_profile`
# The contents of sshfr are as follows
ADDRESS=$1
PORT_START=${2-49151}
from datetime import datetime
import os
import pandas as pd
import argparse
'''
Note:
- Entries start on row 3 of EduRec excel exports
- 'Student Number' column is mandatory!
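A minimal sketch of loading such an export given those two notes. The function name and file handling are illustrative, and it assumes the column headers sit on spreadsheet row 2 so that entries begin on row 3:
```
import pandas as pd


def load_edurec_export(xlsx_path):
    # header=1 treats spreadsheet row 2 as column names, so data starts on row 3 (assumption)
    df = pd.read_excel(xlsx_path, header=1)
    if 'Student Number' not in df.columns:
        raise ValueError("'Student Number' column is mandatory!")
    return df
```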
""" | |
Intended usage scenario: | |
You have a directory of pdfs, each comprising of sequential image scans of | |
human-annotated documents (e.g. written questionaries/forms/exams) where every | |
document share the same number of pages. Each pdf may contain different | |
numbers of such scanned documents. You want to split all these pdfs up into | |
smaller pdfs at fixed page index intervals such that each smaller pdf | |
correspond to a single scanned document. In addition, you want to place them | |
place them under a specific output directory while ensuring no filename | |
collisons. |
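A minimal sketch of that splitting loop, assuming PyPDF2 is used; the function name, the `pages_per_doc` parameter, and the output naming scheme are illustrative assumptions rather than the original script:
```
from pathlib import Path
from PyPDF2 import PdfReader, PdfWriter


def split_fixed_intervals(pdf_path: Path, out_dir: Path, pages_per_doc: int):
    # Split one pdf into fixed-size chunks and write each chunk under out_dir,
    # prefixing outputs with the source file's stem to avoid filename collisions.
    reader = PdfReader(str(pdf_path))
    out_dir.mkdir(parents=True, exist_ok=True)
    for doc_idx, start in enumerate(range(0, len(reader.pages), pages_per_doc)):
        writer = PdfWriter()
        for i in range(start, min(start + pages_per_doc, len(reader.pages))):
            writer.add_page(reader.pages[i])
        out_path = out_dir / f'{pdf_path.stem}_{doc_idx:03d}.pdf'
        with out_path.open('wb') as f:
            writer.write(f)
```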