
Arman Cohan armancohan

from transformers import (
    AutoTokenizer,
    LEDConfig,
    LEDForConditionalGeneration,
)
from datasets import load_dataset
import random
random.seed(2)
```
g batch of size torch.Size([2407, 2]) because not full seq_len of 16384
----------------------------------------------------------------------------------------------------
| Eval 354 at step 88500 | time: 1345.55s | valid loss 0.74 | bpc 1.07357
----------------------------------------------------------------------------------------------------
| epoch 130 step 88510 | 16 batches | lr 0.000442 | ms/batch 11917.46 | loss 0.75 | bpc 1.07888
| epoch 130 step 88520 | 26 batches | lr 0.000442 | ms/batch 5110.18 | loss 0.78 | bpc 1.12858
| epoch 130 step 88530 | 36 batches | lr 0.000442 | ms/batch 5107.78 | loss 0.71 | bpc 1.02528
| epoch 130 step 88540 | 46 batches | lr 0.000442 | ms/batch 5109.07 | loss 0.74 | bpc 1.07031
| epoch 130 step 88550 | 56 batches | lr 0.000442 | ms/batch 5111.60 | loss 0.78 | bpc 1.12227
```
armancohan / scripts.py
Last active December 4, 2019 02:46
Return the structure of a possibly nested dictionary object without all the values
import typing

def return_dict_structure(obj):
    """Return the structure of a possibly nested dictionary object."""
    new_obj = {}
    if isinstance(obj, typing.List) or isinstance(obj, typing.Tuple):
        if obj:
            return [return_dict_structure(obj[0]), '...']
        else:
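The preview above is cut off at the `else:` branch, so the rest of the original function is not visible here. As a rough illustration of the idea in the description (keep the shape, drop the values), the following is a minimal sketch under assumptions: the name `dict_structure_sketch`, the empty-list result for empty sequences, and the use of type names for leaf values are all hypothetical and may differ from the author's version.

```python
def dict_structure_sketch(obj):
    """Hypothetical sketch: mirror the nesting of obj, replacing leaves with type names."""
    if isinstance(obj, (list, tuple)):
        # Summarize a sequence by its first element only, as in the preview.
        return [dict_structure_sketch(obj[0]), '...'] if obj else []
    if isinstance(obj, dict):
        # Keep the keys and recurse into each value.
        return {key: dict_structure_sketch(value) for key, value in obj.items()}
    # Leaf value: keep only its type name (an assumption, not from the original).
    return type(obj).__name__


example = {'a': [{'b': 1}, {'b': 2}], 'c': 'text'}
print(dict_structure_sketch(example))
```

Summarizing a list by its first element assumes the list is homogeneous, which is usually the case for JSON-like records.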
armancohan / commands.md
Last active October 5, 2018 17:24
Useful Commands

Things that are often needed and faster to find here than Google.

Create a directory without raising an error if it already exists:

import pathlib
# create parent of a file
pathlib.Path('file_path').parent.mkdir(parents=True, exist_ok=True)
# create directory
pathlib.Path('directory_path').mkdir(parents=True, exist_ok=True)
armancohan / format_block.py
Last active July 13, 2017 00:37
A simple script that reflows text so every line fits within a maximum length (useful for making long Python comments conform to PEP 8).
#!/usr/bin/python
import sys

def format_block(txt, maxlen=79, indent=2):
    lines = txt.replace('\n', ' ')
    res = []
    line = ''
    tokens = [e for e in lines.split(' ') if e]
    i = 0
    lineno = 0
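The preview stops before the wrapping loop, so the original logic is not shown. For comparison, the same effect can probably be approximated with the standard library's `textwrap` module. This is a sketch under assumptions, not the author's implementation: the name `format_block_sketch` is hypothetical, and indenting every line (including the first) by `indent` spaces is a guess at the intended behavior.

```python
import textwrap


def format_block_sketch(txt, maxlen=79, indent=2):
    """Hypothetical sketch: reflow txt so no line exceeds maxlen characters."""
    # Collapse newlines and repeated spaces, like the token split in the preview.
    flat = ' '.join(txt.split())
    pad = ' ' * indent
    # textwrap.fill counts the indent toward the width limit.
    return textwrap.fill(flat, width=maxlen,
                         initial_indent=pad, subsequent_indent=pad)


sample = 'This is a long comment that should be wrapped. ' * 5
print(format_block_sketch(sample))
```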
armancohan / ipython_start.py
Last active February 13, 2018 13:41
Example ipython startup script
# add this to ~/.ipython/profile_default/startup/start.py
import os
import sys
import re
from collections import Counter, defaultdict, namedtuple
import itertools
import json
import numpy as np
import gzip
armancohan / compress.md
Last active April 10, 2024 12:09
compress large directory with tar, show progress

You need to have pv installed.

Command:

tar -cf - [Source directory] -P | pv -s $(du -sb [Source directory] | awk '{print $1}') | gzip > [Dest tar.gz file]

Example:

tar -cf - dir/ -P | pv -s $(du -sb dir/ | awk '{print $1}') | gzip > file.tar.gz
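Where `pv` is not available, a similar result can be sketched in Python with the standard `tarfile` module. This is only an approximation of the pipeline above, under stated assumptions: the function name `compress_with_progress` is hypothetical, only regular files are counted toward the total (so the figure can differ from `du -sb`), and the percentage is updated just before each file is added rather than as bytes stream through.

```python
import os
import sys
import tarfile


def compress_with_progress(src_dir, dest_path):
    """Hypothetical sketch: write src_dir to a .tar.gz and print a percentage."""
    # Total bytes of regular files, standing in for the `du -sb` estimate.
    total = 0
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)

    done = 0

    def report(info):
        # tarfile calls this filter for each member just before adding it.
        nonlocal done
        if info.isfile():
            done += info.size
            percent = 100 * done / total if total else 100
            sys.stderr.write(f"\r{percent:5.1f}%")
        return info

    with tarfile.open(dest_path, "w:gz") as tar:
        tar.add(src_dir, filter=report)
    sys.stderr.write("\n")
    return done, total
```

The destination file should live outside the source directory, otherwise the archive would try to include itself.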
armancohan / bash-completion.md
Created June 12, 2017 19:47
Bash completion on mac OS X with homebrew

Bash Completion on OS X With Brew

Install bash-completion with brew:

brew install bash-completion

Add to .bash_profile:

if [ -f $(brew --prefix)/etc/bash_completion ]; then
armancohan / logging_example.py
Last active February 13, 2018 17:16
Simple verbose logging setup example
import logging
from argparse import ArgumentParser

ap = ArgumentParser()
ap.add_argument('--verbose', '-v', action='store_true', default=False)
args = ap.parse_args()

log = logging.getLogger(__name__)
if args.verbose:
    # basicConfig and DEBUG live on the logging module, not on a Logger instance
    logging.basicConfig(format="%(levelname)s: %(message)s", level=logging.DEBUG)
armancohan / fibonacci.py
Created June 6, 2017 20:36
Calculate the n-th Fibonacci number using TensorFlow's FIFOQueue data structure
import sys
import tensorflow as tf
def main(n):
    """ Calculate the n-th Fibonacci number """
    if n < 1:
        print('n should be greater than 0')
        sys.exit(1)
    elif n == 1 or n == 2:
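The preview ends before the queue logic appears. To illustrate the general idea of computing Fibonacci numbers with a first-in, first-out queue, here is a minimal sketch that uses `collections.deque` as a stand-in for TensorFlow's FIFOQueue. It is not the author's TensorFlow code; the name `fib_with_queue` and the choice to raise an exception instead of exiting are assumptions.

```python
from collections import deque


def fib_with_queue(n):
    """Hypothetical sketch: n-th Fibonacci number using a two-element FIFO queue."""
    if n < 1:
        raise ValueError('n should be greater than 0')
    # The queue always holds two consecutive Fibonacci numbers, oldest first.
    queue = deque([1, 1])  # F(1), F(2)
    for _ in range(n - 2):
        oldest = queue.popleft()        # dequeue F(k)
        newest = queue[0]               # peek at F(k+1)
        queue.append(oldest + newest)   # enqueue F(k+2)
    return queue[-1]


print(fib_with_queue(10))
```

Each iteration dequeues one value and enqueues one, so the queue never grows beyond two elements.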