NOTE: This is a question I found on StackOverflow which I’ve archived here, because the answer is so effing phenomenal.
If you are not into long explanations, see [Paolo Bergantino’s answer][2].

    """
    SmoothQuant implementation. See: https://arxiv.org/pdf/2211.10438.pdf
    Some details are model-specific, so the code may need tweaking.
    """
    import functools
    import torch
    from torch import nn, Tensor
    from typing import Dict, Iterable, Tuple

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


    def convert_tb_data(root_dir, sort_by=None):
        """Convert local TensorBoard data into Pandas DataFrame.
        Function takes the root directory path and recursively parses
        all events data.
        If the `sort_by` value is provided then it will use that column
        to sort values; typically `wall_time` or `step`.
        *Note* that the whole data is converted into a DataFrame.
        Depending on the data size this might take a while. If it takes


    # encoding: utf-8
    import bokeh.models as bkm
    import bokeh.core as bkc
    from bokeh.util.compiler import JavaScript

    class AudioPlayerModel(bkm.layouts.Column):
        """
        Audio player using https://howlerjs.com/.

Latency Comparison Numbers (~2012)

| Operation | Latency (ns) | Latency (us) | Notes |
| --- | ---: | ---: | --- |
| L1 cache reference | 0.5 | | |
| Branch mispredict | 5 | | |
| L2 cache reference | 7 | | 14x L1 cache |
| Mutex lock/unlock | 25 | | |
| Main memory reference | 100 | | 20x L2 cache, 200x L1 cache |
| Compress 1K bytes with Zippy | 3,000 | 3 | |
| Send 1K bytes over 1 Gbps network | 10,000 | 10 | |
| Read 4K randomly from SSD* | 150,000 | 150 | ~1GB/sec SSD |
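
To make the relative scale of these figures easier to see, here is a minimal sketch that stores a few of the listed latencies and expresses one as a multiple of another. The names `LATENCY_NS` and `ratio` are hypothetical and introduced only for this illustration; the values are copied from the list above.

```python
# Latencies in nanoseconds, copied from the list above (~2012 figures).
# LATENCY_NS and ratio() are illustrative names, not part of the original gist.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}


def ratio(slower: str, faster: str) -> float:
    """Return how many times longer `slower` takes than `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


if __name__ == "__main__":
    # Main memory versus L1 cache, matching the "200x L1 cache" note.
    print(ratio("Main memory reference", "L1 cache reference"))
    # A random 4K SSD read expressed in main memory references.
    print(ratio("Read 4K randomly from SSD", "Main memory reference"))
```

Keeping everything in a single unit (nanoseconds) avoids conversion mistakes when comparing entries that the list reports in both ns and us.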