César Reyes creyesp


Data Engineering Challenge - Data integration with the NYTimes API

Objective

The goal of this challenge is to build a small automated data pipeline that extracts information from an external source (the New York Times API), stores it in an analytical database (BigQuery), and makes it efficiently queryable.

What will you build?

You will develop a Python script that connects to the NYTimes news API and extracts recent articles according to certain parameters. That information must be stored in a table in Google BigQuery for later analysis.
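One possible sketch of such a pipeline, using the NYTimes Archive API endpoint and assuming a pre-created BigQuery table (function names and the table layout are illustrative, not part of the challenge statement):

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

# NYTimes Archive API: one JSON payload per year/month.
NYT_ARCHIVE_URL = "https://api.nytimes.com/svc/archive/v1/{year}/{month}.json"


def build_archive_url(year, month, api_key):
    """Build the Archive API request URL for a given year/month."""
    query = urlencode({"api-key": api_key})
    return NYT_ARCHIVE_URL.format(year=year, month=month) + "?" + query


def fetch_articles(year, month, api_key):
    """Call the Archive API and return the list of article documents."""
    with urlopen(build_archive_url(year, month, api_key)) as resp:
        payload = json.load(resp)
    return payload["response"]["docs"]


def load_to_bigquery(rows, table_id):
    """Stream JSON rows into an existing BigQuery table."""
    from google.cloud import bigquery  # requires google-cloud-bigquery

    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert errors: {errors}")
```

In a real run you would schedule `fetch_articles` + `load_to_bigquery` (e.g. with cron or Airflow) and select only the article fields your table schema needs before inserting.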

def unify_victoria_secret(df):
    """
    We want all brands that are related to Victoria's Secret to have
    `victoria's secret` as their brand instead of what they currently have.
    """
    df = df.copy()
    new_string = "victoria's secret"
    variants = ["Victorias-Secret", "Victoria's Secret", "Victoria's Secret Pink"]
    df.loc[df["brand_name"].isin(variants), "brand_name"] = new_string
    return df
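A quick usage example on a toy DataFrame (assuming pandas is installed; the function is restated so the snippet is self-contained):

```python
import pandas as pd  # assumes pandas is installed


def unify_victoria_secret(df):
    """Map all Victoria's Secret brand variants to one canonical name."""
    df = df.copy()
    variants = ["Victorias-Secret", "Victoria's Secret", "Victoria's Secret Pink"]
    df.loc[df["brand_name"].isin(variants), "brand_name"] = "victoria's secret"
    return df


toy = pd.DataFrame({"brand_name": ["Victorias-Secret", "Victoria's Secret Pink", "Nike"]})
print(unify_victoria_secret(toy)["brand_name"].tolist())
# The two Victoria's Secret variants collapse to "victoria's secret"; "Nike" is unchanged.
```

Note that `df.copy()` keeps the caller's DataFrame untouched, which avoids pandas' `SettingWithCopyWarning` surprises.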
creyesp / pycharm_env_config.md (created May 22, 2022 23:18)

Pycharm env config
  1. Install the ProjectEnv plugin.
  2. Install direnv.
  3. Create a .env file and configure the environment variables in it.
  4. Create a .envrc file and write `dotenv` in it.
  5. In PyCharm, go to Settings > Build ... > ProjectEnv and add the .env file.
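For example, the two files from steps 3 and 4 could look like this (the variable names and values are illustrative):

```shell
# .env — project environment variables (example values)
DATABASE_URL=postgres://localhost:5432/mydb
API_KEY=changeme

# .envrc — tells direnv to load the .env file above
dotenv
```

With this in place, direnv exports the variables whenever you `cd` into the project, and the ProjectEnv plugin makes the same `.env` values visible to PyCharm run configurations.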

The table below shows some examples of heuristic benchmarks used to judge the performance of a machine learning model when no previous solution exists. The original version of the table can be found in the Machine Learning Design Patterns book (pattern 28).

| Scenario | Heuristic benchmark | Example task | Implementation for example task |
| --- | --- | --- | --- |
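As a concrete illustration of one common heuristic benchmark, predicting the training-set mean for a regression task (all numbers below are made up):

```python
# Toy regression task: compare a model against the
# "always predict the training mean" heuristic benchmark.
train_y = [10, 12, 8, 14, 11]
test_y = [9, 13, 12]
model_preds = [10, 12, 11]  # hypothetical model predictions

mean_baseline = sum(train_y) / len(train_y)  # 11.0


def mae(preds, actual):
    """Mean absolute error between predictions and actual values."""
    return sum(abs(p - a) for p, a in zip(preds, actual)) / len(actual)


baseline_mae = mae([mean_baseline] * len(test_y), test_y)  # ~1.67
model_mae = mae(model_preds, test_y)  # 1.0
print(f"baseline MAE: {baseline_mae:.2f}, model MAE: {model_mae:.2f}")
```

A model is only worth deploying if it clearly beats this trivial baseline; otherwise the heuristic is cheaper and just as good.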

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Spanish 3-label sentiment analysis model from the Hugging Face Hub
model_name = "tr3cks/3LabelsSentimentAnalysisSpanish"
tokenizer_sent_esp = AutoTokenizer.from_pretrained(model_name)
model_sent_esp = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize a sample phrase; WordPiece splits unseen words into subwords.
# The output is ['ja', '##ja', '##ja', 'que', 'risa', 'me', 'da']
tokenizer_sent_esp.tokenize('jajaja que risa me da')
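To turn the classifier's raw logits into one of the three labels, you would typically apply a softmax and take the argmax. A minimal sketch with made-up logits and an assumed label order (the real order should be checked in the model's config):

```python
import math

# Assumed label order for a 3-label sentiment model; verify against
# the model's id2label config before relying on it.
labels = ["negative", "neutral", "positive"]
logits = [0.2, -1.1, 2.4]  # hypothetical output logits for one sentence

# Softmax: exponentiate and normalize so the scores sum to 1.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

predicted = labels[probs.index(max(probs))]
print(predicted)  # "positive" for these made-up logits
```

With the real model, the same logic applies to `model_sent_esp(**tokenizer_sent_esp(text, return_tensors="pt")).logits` for each input sentence.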