
@ensargunesdogdu
Last active May 29, 2024 10:29

Data Analytics Challenge

Dataset Information

This challenge uses the Brazilian E-Commerce Public Dataset by Olist, available on Kaggle. The dataset is split into nine tables, each covering one aspect of the e-commerce workflow (customers, orders, payments, reviews, products, sellers, and locations).

Dataset

Tables and Columns

  1. Customers Table

    • customer_id: Key that links to the Orders table; each order carries its own customer_id
    • customer_unique_id: Identifier of the actual customer, shared across all of that customer's orders
    • customer_zip_code_prefix: Zip code prefix of the customer
    • customer_city: City of the customer
    • customer_state: State of the customer
  2. Geolocation Table

    • geolocation_zip_code_prefix: Zip code prefix of the location
    • geolocation_lat: Latitude coordinate of the location
    • geolocation_lng: Longitude coordinate of the location
    • geolocation_city: City of the location
    • geolocation_state: State of the location
  3. Order Items Table

    • order_id: Unique identifier for each order
    • order_item_id: Sequential number of the item within its order
    • product_id: Unique identifier for each product
    • seller_id: Unique identifier for each seller
    • shipping_limit_date: Deadline for the seller to hand the item over to the carrier
    • price: Price of the item
    • freight_value: Freight value of the item
  4. Order Payments Table

    • order_id: Unique identifier for each order
    • payment_sequential: Sequential number of the payment
    • payment_type: Type of payment
    • payment_installments: Number of installments for payment
    • payment_value: Value of the payment
  5. Order Reviews Table

    • review_id: Unique identifier for each review
    • order_id: Unique identifier for each order
    • review_score: Score given in the review
    • review_comment_title: Title of the review comment
    • review_comment_message: Message of the review comment
    • review_creation_date: Creation date of the review
    • review_answer_timestamp: Timestamp of the review answer
  6. Orders Table

    • order_id: Unique identifier for each order
    • customer_id: Unique identifier for each customer
    • order_status: Status of the order
    • order_purchase_timestamp: Timestamp of the order purchase
    • order_approved_at: Timestamp when the order was approved
    • order_delivered_carrier_date: Date when the order was delivered to the carrier
    • order_delivered_customer_date: Date when the order was delivered to the customer
    • order_estimated_delivery_date: Estimated delivery date of the order
  7. Products Table

    • product_id: Unique identifier for each product
    • product_category_name: Name of the product category
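    • product_name_lenght: Number of characters in the product name (the column name is spelled "lenght" in the dataset)
    • product_description_lenght: Number of characters in the product description (the column name is spelled "lenght" in the dataset)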
    • product_photos_qty: Number of photos of the product
    • product_weight_g: Weight of the product in grams
    • product_length_cm: Length of the product in centimeters
    • product_height_cm: Height of the product in centimeters
    • product_width_cm: Width of the product in centimeters
  8. Sellers Table

    • seller_id: Unique identifier for each seller
    • seller_zip_code_prefix: Zip code prefix of the seller
    • seller_city: City of the seller
    • seller_state: State of the seller
  9. Product Category Translation Table

    • product_category_name: Name of the product category
    • product_category_name_english: Name of the product category in English
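
The tables connect through shared key columns: Order Items references Products by product_id, and Products references the translation table by product_category_name. As a minimal sketch of how those keys can be followed, the Python below sums item prices per English category name and counts items sold per product. The helper names and the sample rows are made up for illustration and are not taken from the dataset; only the column names come from the listing above.

```python
from collections import Counter, defaultdict

def total_sales_by_category(order_items, products, translations):
    """Sum item prices per English category name by following the table keys."""
    # product_id -> product_category_name (Products table)
    category_of = {p["product_id"]: p["product_category_name"] for p in products}
    # product_category_name -> English name (translation table)
    english_of = {
        t["product_category_name"]: t["product_category_name_english"]
        for t in translations
    }
    totals = defaultdict(float)
    for item in order_items:
        category = category_of.get(item["product_id"])  # None if the product is unknown
        # Fall back to the original category name when no translation exists.
        name = english_of.get(category, category)
        totals[name] += float(item["price"])
    return dict(totals)

def top_products(order_items, n=10):
    """Return the n product_ids that appear most often in Order Items."""
    return Counter(item["product_id"] for item in order_items).most_common(n)

# Illustrative rows only (hypothetical values, not real dataset records).
order_items = [
    {"order_id": "o1", "order_item_id": 1, "product_id": "p1", "price": "50.0"},
    {"order_id": "o1", "order_item_id": 2, "product_id": "p2", "price": "20.0"},
    {"order_id": "o2", "order_item_id": 1, "product_id": "p1", "price": "50.0"},
]
products = [
    {"product_id": "p1", "product_category_name": "informatica_acessorios"},
    {"product_id": "p2", "product_category_name": "brinquedos"},
]
translations = [
    {"product_category_name": "informatica_acessorios",
     "product_category_name_english": "computers_accessories"},
    {"product_category_name": "brinquedos",
     "product_category_name_english": "toys"},
]

sales = total_sales_by_category(order_items, products, translations)
best_sellers = top_products(order_items)
print(sales)
print(best_sellers)
```

Prices are kept as strings in the sample rows because that is how `csv.DictReader` would deliver them, so the same functions should work on rows read directly from the CSV files.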

Challenge Tasks

  1. Data Exploration

    • Load the dataset and explore the structure and relationships between tables.
    • Provide a summary of each table and its attributes.
  2. Customer Analysis

    • Analyze the distribution of customers by city and state.
    • Identify the top 10 cities with the highest number of customers.
  3. Sales Analysis

    • Determine the total sales value for each product category.
    • Identify the top 10 best-selling products based on the number of items sold.
  4. Order Analysis

    • Analyze the order status distribution.
    • Calculate the average delivery time for orders.
  5. Review Analysis

    • Analyze the distribution of review scores.
    • Identify common themes in review comments using text analysis.
  6. Payment Analysis

    • Analyze the distribution of payment types.
    • Calculate the average payment value for different payment types.
  7. Geolocation Analysis

    • Visualize the geolocation data to identify regions with high order density.
    • Analyze the correlation between geolocation and delivery times.
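
Several of the tasks above reduce to the same two operations: counting rows per group (customers per city, orders per status, payments per type) and averaging a value per group (payment value per payment type). Average delivery time additionally needs a timestamp difference. The sketch below shows one possible way to do this with the Python standard library. The helper names, the sample rows, and the timestamp format are assumptions for illustration; check the format against the actual CSV files before relying on it.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Assumed timestamp layout; verify against the dataset files.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"

def count_by(rows, key):
    """Count rows per distinct value of `key`, most frequent first."""
    return Counter(row[key] for row in rows).most_common()

def mean_by(rows, key, value):
    """Average the numeric column `value` within each group defined by `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value]))
    return {group: mean(values) for group, values in groups.items()}

def average_delivery_days(orders):
    """Mean days from purchase to customer delivery, skipping undelivered orders."""
    durations = []
    for row in orders:
        delivered = row["order_delivered_customer_date"]
        if not delivered:  # blank when the order never reached the customer
            continue
        start = datetime.strptime(row["order_purchase_timestamp"], TS_FORMAT)
        end = datetime.strptime(delivered, TS_FORMAT)
        durations.append((end - start).total_seconds() / 86400)
    return mean(durations)

# Illustrative rows only (hypothetical values, not real dataset records).
orders = [
    {"order_status": "delivered",
     "order_purchase_timestamp": "2018-01-01 10:00:00",
     "order_delivered_customer_date": "2018-01-05 10:00:00"},
    {"order_status": "delivered",
     "order_purchase_timestamp": "2018-01-02 08:00:00",
     "order_delivered_customer_date": "2018-01-10 08:00:00"},
    {"order_status": "canceled",
     "order_purchase_timestamp": "2018-01-03 09:00:00",
     "order_delivered_customer_date": ""},
]
payments = [
    {"payment_type": "credit_card", "payment_value": "100.0"},
    {"payment_type": "credit_card", "payment_value": "50.0"},
    {"payment_type": "boleto", "payment_value": "30.0"},
]

status_counts = count_by(orders, "order_status")
avg_days = average_delivery_days(orders)
avg_payment = mean_by(payments, "payment_type", "payment_value")
print(status_counts, avg_days, avg_payment)
```

Measuring delivery time from order_purchase_timestamp is one choice among several; starting from order_approved_at or order_delivered_carrier_date would isolate different stages of the process. Passing `"customer_city"` to `count_by` on the Customers rows and slicing the first ten entries would give the top 10 cities.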