"main" ping size distributions on Beta
# coding: utf-8
---
title: How big are the incoming "main" pings?
authors:
- georg_fritzsche
tags:
- ping size
- firefox
- main ping
created_at: 2017-05-25
updated_at: 2017-05-25
tldr: How big are the incoming "main" pings currently? Nearly all are under 400 KB.
---
# ### How big are the incoming "main" pings on Beta?
# In[1]:
import pandas as pd
import numpy as np
import matplotlib
import json
from matplotlib import pyplot as plt
from moztelemetry.dataset import Dataset
from moztelemetry import get_pings_properties, get_one_ping_per_client
import pylab
get_ipython().magic(u'pylab inline')
# Using a 10% sample of submissions, determine the distribution of ping sizes.
# As we don't have a metadata field that tracks the real ping sizes, we estimate them from the length of the serialized JSON string.
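# A minimal sketch of the size estimate, using a hypothetical, heavily trimmed stand-in for a "main" ping (`sample_ping` and `estimate_size` are illustrative names, not part of the analysis).
# It shows that the estimate depends on how the JSON is serialized, so it may differ from the size of the ping as it was submitted.

```python
import json

# Hypothetical, heavily trimmed stand-in for a "main" ping (not real telemetry data).
sample_ping = {
    "type": "main",
    "payload": {"info": {"reason": "daily"}, "histograms": {}},
    "environment": {"settings": {"locale": "de-DE", "note": "grüße"}},
}


def estimate_size(ping):
    """Estimate the ping size as the length of its serialized JSON string."""
    return len(json.dumps(ping))


# Default json.dumps output: non-ASCII characters are escaped as \uXXXX,
# so the string is pure ASCII and its length equals its UTF-8 byte count.
default_size = estimate_size(sample_ping)

# Compact separators drop the spaces after "," and ":".
compact_size = len(json.dumps(sample_ping, separators=(",", ":")))

# Without ASCII escaping, measure the UTF-8 encoded bytes instead of characters.
raw_utf8_size = len(json.dumps(sample_ping, ensure_ascii=False).encode("utf-8"))

print(default_size, compact_size, raw_utf8_size)
```

# The three numbers differ only because of serializer settings, which is why the figures below are best read as estimates.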
# In[2]:
# Inspect the dimensions the dataset can be filtered on.
Dataset.from_source("telemetry").schema
# In[ ]:
# `sc` is the SparkContext provided by the notebook environment.
pings = (
    Dataset.from_source("telemetry")
    .where(docType='main')
    .where(submissionDate=lambda x: int(x) >= 20170501 and int(x) < 20170514)
    .where(appUpdateChannel="beta")
    .records(sc, sample=0.1)
)
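# The `submissionDate` filter above is a half-open range over date strings. A small sketch of the same predicate on a few hand-picked dates (`in_sample_window` is an illustrative name) shows which days are included:

```python
def in_sample_window(submission_date):
    """Same condition as the submissionDate filter: 20170501 <= date < 20170514."""
    value = int(submission_date)
    return 20170501 <= value < 20170514


# Dates just outside and just inside both ends of the window.
dates = ["20170430", "20170501", "20170513", "20170514"]
selected = [d for d in dates if in_sample_window(d)]
print(selected)
```

# The window therefore covers May 1 through May 13, 2017, inclusive.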
# In[ ]:
# Estimate each ping's size as the length of its JSON serialization.
sizes = pings.map(lambda p: len(json.dumps(p)))
size_series = pd.Series(sizes.collect())
# Show the distribution of sizes in KB (1 KB = 1024 bytes).
# In[ ]:
(size_series / 1024).describe(percentiles=[0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.999])
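# As a sketch of how to read that summary, the same call can be run on synthetic data. The log-normal sizes below are made up for illustration and are not the measured distribution; `fake_sizes_bytes` and `share_under_400kb` are illustrative names.

```python
import numpy as np
import pandas as pd

# Synthetic, made-up sizes in bytes (log-normal), NOT real telemetry data.
rng = np.random.RandomState(0)
fake_sizes_bytes = pd.Series(rng.lognormal(mean=11.0, sigma=0.8, size=10000))

# Same summary as above: count, mean, std, min, the requested percentiles, max.
summary = (fake_sizes_bytes / 1024).describe(
    percentiles=[0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.999]
)
print(summary)

# Fraction of (synthetic) pings below a 400 KB threshold.
share_under_400kb = (fake_sizes_bytes / 1024 < 400).mean()
print(share_under_400kb)
```

# Applied to `size_series`, the last expression would give the share of sampled pings under 400 KB directly, instead of reading it off the percentiles.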
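# In[ ]:
# Histogram of the estimated sizes in bytes.
size_series.hist(xrot=45)
# In[ ]:
# Same histogram with a logarithmic count axis, so the sparse tail of large pings stays visible.
size_series.hist(xrot=45, log=True)
# In[ ]: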