Faster int<->float conversion: set the exponent bits so that the float's ULP is exactly 1, then offset x by a power of 2 so the int gets stuffed into the bottom of the mantissa bits:
- See Sec 3.4 of "Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production"
- Code is basically the following (written here as a CUDA device function, assuming `half` and the bit-reinterpret intrinsics from `cuda_fp16.h`):
```c
#include <stdint.h>
#include <cuda_fp16.h>

__device__ half int_to_f16(int8_t i) {
    uint8_t unsigned_val = (uint8_t)(i + 128);              // bias signed [-128, 127] to unsigned [0, 255]
    uint16_t combined_bits = 0x6400 | unsigned_val;         // 0x6400 is 1024.0 in FP16, where the ULP is exactly 1
    half combined_float = __ushort_as_half(combined_bits);  // bit-for-bit reinterpret (avoids the aliasing-UB pointer cast)
    return __hsub(combined_float, __float2half(1152.0f));   // subtract 1024 + 128 to recover the original value
}
```