DROP VIEW IF EXISTS app_uuid_view;
CREATE VIEW app_uuid_view AS
SELECT
  -- Any user agent without "iPhone" is counted as Android
  CASE WHEN user_agent LIKE('%iPhone%') THEN 'iOS'
       ELSE 'Android' END AS platform,
  -- parse_url() needs a full URL, so a placeholder host/path is prepended to the raw query string
  parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'appInstallID') AS uuid
FROM wmf_raw.webrequest
WHERE uri_query LIKE('%sections=0%')
  AND uri_query LIKE('%action=mobileview%')
  AND uri_query LIKE('%appInstallID%');
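For readers more at home in Java than HiveQL, here is a minimal sketch of what the view computes for a single row: the platform guess, the three substring filters, and the appInstallID lookup via the same placeholder-URL trick. The class and method names are hypothetical, and the query parsing only approximates Hive's `parse_url(..., 'QUERY', key)`; it returns the raw, undecoded value, or null when the key is missing.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; names are illustrative, not part of any WMF codebase.
class AppUuidSketch {
    // Same placeholder prefix the view uses, so a URL parser accepts a bare "?a=b&c=d".
    private static final String DUMMY_PREFIX = "http://bla.org/woo/";

    // Mirrors the CASE expression: anything without "iPhone" counts as Android.
    static String platform(String userAgent) {
        return userAgent.contains("iPhone") ? "iOS" : "Android";
    }

    // Mirrors the three LIKE filters (plain substring checks).
    static boolean matchesViewFilter(String uriQuery) {
        return uriQuery.contains("sections=0")
                && uriQuery.contains("action=mobileview")
                && uriQuery.contains("appInstallID");
    }

    // Returns the raw value of `key`, or null if it is absent or the string cannot be parsed.
    static String extractParam(String uriQuery, String key) {
        String query;
        try {
            query = new URI(DUMMY_PREFIX + uriQuery).getRawQuery();
        } catch (URISyntaxException e) {
            return null;
        }
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            if (name.equals(key)) {
                return eq >= 0 ? pair.substring(eq + 1) : "";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String q = "?action=mobileview&sections=0&appInstallID=abc-123";
        System.out.println(matchesViewFilter(q) + " " + extractParam(q, "appInstallID"));
    }
}
```

Note that, like the `LIKE '%sections=0%'` filter, the substring check would also accept a value such as `sections=01`.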
// Example: this is an existing test
public void testIsPageviewApp() {
    Text uriHost = new Text("en.wikipedia.org");
    Text uriPath = new Text("/w/api.php?action=mobileview&sections=0");
    Text httpStatus = new Text("200");
    Text contentType = new Text("application/json");
    Text userAgent = new Text("WikipediaApp/1.2.3");
    IsPageviewUDF udf = new IsPageviewUDF();
    assertTrue(udf.evaluate(uriHost, uriPath, httpStatus, contentType, userAgent).get());
}
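To make explicit which request properties the test exercises, here is a hypothetical predicate written directly against the test's inputs. It is an illustration only, not the logic of the real `IsPageviewUDF`; the host restriction and the reading of `sections=0` as "one page load" are assumptions.

```java
// Hypothetical predicate: NOT the real IsPageviewUDF, only the conditions
// that the test's inputs happen to satisfy.
class AppPageviewSketch {
    static boolean looksLikeAppPageview(String uriHost, String uriPath,
                                        String httpStatus, String contentType,
                                        String userAgent) {
        return uriHost.endsWith(".wikipedia.org")           // assumed: Wikipedia hosts only
                && uriPath.startsWith("/w/api.php")         // app requests go through the API
                && uriPath.contains("action=mobileview")
                && uriPath.contains("sections=0")           // assumed: first-section fetch marks one page load
                && httpStatus.equals("200")
                && contentType.startsWith("application/json")
                && userAgent.startsWith("WikipediaApp");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeAppPageview("en.wikipedia.org",
                "/w/api.php?action=mobileview&sections=0", "200",
                "application/json", "WikipediaApp/1.2.3"));
    }
}
```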
SET hive.exec.compress.output=true;
SET whitelisted_mediawiki_projects = 'commons', 'meta', 'incubator', 'species';
CREATE TABLE ironholds.pageviews_sample_test(qualifier STRING, count_views INT);
INSERT OVERWRITE TABLE ironholds.pageviews_sample_test
SELECT
  CONCAT(sub1.language_and_site, sub1.project_suffix) qualifier,
  COUNT(*) count_views
FROM (
  SELECT
    regexp_extract(uri_host, '^([A-Za-z0-9-]+(\\.(zero|m))?)\\.[a-z]*\\.org$') language_and_site,
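The `language_and_site` regex keeps the language/site label plus an optional `.m` or `.zero` marker. A quick way to check what it captures is to run the same pattern in Java; the Java string escaping `"\\."` yields the same regex `\.` as the Hive literal. The sketch assumes Hive's `regexp_extract` defaults to group 1 and returns an empty string when nothing matches; the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper reproducing the language_and_site extraction.
class HostQualifierSketch {
    private static final Pattern HOST =
            Pattern.compile("^([A-Za-z0-9-]+(\\.(zero|m))?)\\.[a-z]*\\.org$");

    // Group 1 of the match, or "" when the host does not match (assumed Hive behaviour).
    static String languageAndSite(String uriHost) {
        Matcher m = HOST.matcher(uriHost);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        for (String host : new String[] {"en.wikipedia.org", "en.m.wikipedia.org",
                "en.zero.wikipedia.org", "commons.wikimedia.org", "example.com"}) {
            System.out.println(host + " -> '" + languageAndSite(host) + "'");
        }
    }
}
```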
library(anonymise)
library(digest)
library(microbenchmark)
# Generate 30,000 random 30-character alphanumeric strings. Uniqueness is not
# guaranteed, but with 62 possible characters a collision is vanishingly unlikely.
uniques <- character(30000)
for(i in seq_along(uniques)){
  uniques[i] <- paste(sample(c(0:9, letters, LETTERS), 30), collapse = "")
}
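Because `sample()` draws without replacement by default, each generated string consists of 30 distinct characters out of the 62 available. The sketch below reproduces that sampling in Java (shuffle the alphabet, take the first 30) and counts distinct strings to confirm the "unique" assumption empirically. The class name and the fixed seed are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

// Hypothetical helper mirroring sample(c(0:9, letters, LETTERS), 30).
class RandomStringsSketch {
    private static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Shuffling the whole alphabet and taking a prefix is a uniform draw without replacement.
    static String randomString(Random rng, int length) {
        List<Character> chars = new ArrayList<>();
        for (char c : ALPHABET.toCharArray()) {
            chars.add(c);
        }
        Collections.shuffle(chars, rng);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(chars.get(i));
        }
        return sb.toString();
    }

    static List<String> generate(int n, int length, long seed) {
        Random rng = new Random(seed);
        List<String> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(randomString(rng, length));
        }
        return out;
    }

    static int countDistinct(List<String> strings) {
        return new HashSet<>(strings).size();
    }

    public static void main(String[] args) {
        List<String> uniques = generate(30000, 30, 42L);
        System.out.println(countDistinct(uniques) + " distinct of " + uniques.size());
    }
}
```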
> library(WMUtils)
Loading required package: jsonlite
Attaching package: ‘jsonlite’
The following object is masked from ‘package:utils’:
    View
Loading required package: RMySQL
org.apache.thrift.TApplicationException: Internal error processing FetchResults
	at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
	at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:505)
	at org.apache.hive.service.cli.thrift.TCLIService$Client.FetchResults(TCLIService.java:492)
	at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:311)
	at info.urbanek.Rpackage.RJDBC.JDBCResultPull.fetch(JDBCResultPull.java:70)
Error in .jcall(rp, "I", "fetch", stride) :
  java.sql.SQLException: Error retrieving next row
   country yoy_pattern most_recent
1:      AD  -23.595506      409000
2:      AE  -13.545211    33170000
3:      AF   -2.479339     1539000
4:      AG  -20.183486      431000
5:      AI  -40.350877       42000
ips <- mysql_query("SELECT DISTINCT(cuc_ip) FROM cu_changes WHERE cuc_ip IS NOT NULL LIMIT 10000;", "enwiki")$cuc_ip
Unit: milliseconds
                        expr      min       lq     mean   median      uq      max neval
 { test <- c_geo_city(ips) } 50.61829 57.15351 77.02546 60.63893 65.2781 319.1384   100
Unit: seconds
                      expr      min      lq     mean   median       uq      max neval
 { test <- geo_city(ips) } 1.499703 1.78602 1.935753 1.923845 2.067342 2.564509   100
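The two tables use different units, so the speedup is easy to misread. Converting both medians to milliseconds gives 1923.845 / 60.63893, about 31.7x in favour of `c_geo_city`. Medians are the fairer comparison here: the C version's mean is pulled up by a single 319 ms run, and the mean ratio drops to about 25.1x. The class name below is hypothetical; the numbers are copied from the output above.

```java
// Hypothetical helper; timings copied from the microbenchmark output above.
class GeoCityBenchmarkSketch {
    static final double C_GEO_CITY_MEDIAN_MS = 60.63893;
    static final double GEO_CITY_MEDIAN_MS = 1.923845 * 1000;  // reported in seconds

    static double speedup(double slowMs, double fastMs) {
        return slowMs / fastMs;
    }

    public static void main(String[] args) {
        System.out.printf("c_geo_city is ~%.1fx faster by median%n",
                speedup(GEO_CITY_MEDIAN_MS, C_GEO_CITY_MEDIAN_MS));
    }
}
```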
Unit: milliseconds
                           expr      min       lq     mean   median       uq      max neval
 { test <- c_geo_country(ips) } 56.83126 61.43066 64.75143 63.72016 66.15943 136.7492   100
Unit: seconds
                         expr      min       lq     mean   median       uq      max neval
 { test <- geo_country(ips) } 5.597797 5.814648 6.260509 5.900368 6.118264 10.61174   100
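Another way to read the country benchmark is as a cost per lookup. Assuming `ips` is the same vector of 10,000 addresses returned by the `LIMIT 10000` query in the city benchmark (DISTINCT could return fewer rows on a small table), the medians work out to roughly 6.37 microseconds per IP for `c_geo_country` versus roughly 590 microseconds for `geo_country`, a ratio of about 92.6x. The class name is hypothetical.

```java
// Hypothetical helper; assumes each benchmarked call processed 10,000 IPs.
class GeoCountryCostSketch {
    static final int N_IPS = 10_000;

    // Median time per call (ms) -> average cost per IP lookup (microseconds).
    static double microsPerIp(double medianMs, int nIps) {
        return medianMs * 1000.0 / nIps;
    }

    public static void main(String[] args) {
        double c = microsPerIp(63.72016, N_IPS);         // c_geo_country median, ms
        double r = microsPerIp(5.900368 * 1000, N_IPS);  // geo_country median, s -> ms
        System.out.printf("c_geo_country: %.2f us/IP, geo_country: %.1f us/IP, ratio %.1fx%n",
                c, r, r / c);
    }
}
```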
> time = "1250-02-46"
> strptime(time, "%Y-%m-%j")
[1] "1250-02-15"
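The result is consistent with `%j` being read as day-of-year: January has 31 days, so day 46 is February 15, and the `02` month field happens to agree. The Java sketch below reproduces that arithmetic with `LocalDate.ofYearDay` and checks whether the written month agrees with the derived date. It uses the proleptic ISO calendar, which for day 46 gives the same answer as the Julian calendar in use in 1250. The class and method names are hypothetical, and the sketch makes no claim about how R resolves a conflicting month.

```java
import java.time.LocalDate;

// Hypothetical helper reproducing the day-of-year arithmetic behind the output above.
class DayOfYearSketch {
    static LocalDate fromYearAndDayOfYear(String text) {
        String[] parts = text.split("-");             // ["1250", "02", "46"]
        int year = Integer.parseInt(parts[0]);
        int dayOfYear = Integer.parseInt(parts[2]);
        return LocalDate.ofYearDay(year, dayOfYear);  // proleptic ISO calendar
    }

    // True when the month written in the string agrees with the derived date.
    static boolean monthAgrees(String text) {
        int month = Integer.parseInt(text.split("-")[1]);
        return fromYearAndDayOfYear(text).getMonthValue() == month;
    }

    public static void main(String[] args) {
        System.out.println(fromYearAndDayOfYear("1250-02-46"));  // 1250-02-15
        System.out.println(monthAgrees("1250-02-46"));           // true
    }
}
```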