@tejeshreddy
Last active February 5, 2024 19:17
Video Classification Service System Design

System Architecture

Endpoints

Video Post Request (/api/upload)

Methods Allowed: POST

POST Method

Sample Request:

curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/video.mp4" \
  -F "[email protected]" \
  -F "name=John Doe" \
  http://your-server-endpoint/api/upload/

Sample Response:

{
  "status": {
    "code": 200,
    "message": "success"
  },
  "id": "7c3f89e0-3f5e-4b3a-b67d-7a7938502c36"
}

Other Response Status:

{"code": 200, "message": "success"}
{"code": 400, "message": "bad request"}
{"code": 500, "message": "internal server error"}
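The response envelope above can be sketched as a small helper. This is a minimal illustration, not the service's actual code: `build_upload_response` and `STATUS_MESSAGES` are hypothetical names, and returning an `id` only on success is an assumption.

```python
import json
import uuid

# Status codes and messages listed in this document.
STATUS_MESSAGES = {
    200: "success",
    400: "bad request",
    500: "internal server error",
}


def build_upload_response(code, upload_id=None):
    """Build the JSON body for /api/upload (hypothetical helper)."""
    body = {"status": {"code": code, "message": STATUS_MESSAGES[code]}}
    if code == 200:
        # Assumption: an id is only returned for a successful upload.
        body["id"] = upload_id or str(uuid.uuid4())
    return body


print(json.dumps(build_upload_response(200), indent=2))
print(json.dumps(build_upload_response(400), indent=2))
```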

Classification (/api/classification/<id>)

Methods Allowed: GET, PUT (idempotent)

GET Method

Sample Request:

curl -X GET \
  http://server-endpoint/api/classification/7c3f89e0-3f5e-4b3a-b67d-7a7938502c36

Sample Response:

{
  "status": {
    "code": 200,
    "message": "success"
  },
  "data": {
    "name": "John Doe",
    "video_file_url": "https://s3-bucket-url/video.mp4",
    "image_file_url": "https://s3-bucket-url/image.jpg",
    "status": "processing | completed | error"
  },
  "id": "7c3f89e0-3f5e-4b3a-b67d-7a7938502c36"
}

PUT Method

Sample Request:

curl -X PUT \
  -H "Content-Type: application/json" \
  -d '{"name": "Updated Name", "email": "[email protected]"}' \
  http://server-endpoint/api/classification/7c3f89e0-3f5e-4b3a-b67d-7a7938502c36

Sample Response:

{
  "status": {
    "code": 200,
    "message": "success"
  },
  "data": {
    "name": "Updated Name",
    "email": "[email protected]",
    "status": "notified | notified to new email"
  },
  "id": "7c3f89e0-3f5e-4b3a-b67d-7a7938502c36"
}

Note: Before updating the email, name, or both, the PUT handler first validates that the notification has not already been sent.
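The pre-update check could look roughly like the sketch below. It works on a plain dict standing in for the stored metadata record. The `notified` flag, the `apply_update` name, and rejecting late updates with a 400 are all assumptions made for illustration; the document does not specify how a late update is answered.

```python
def apply_update(record, updates):
    """Apply a PUT update to a stored record unless it was already notified.

    Returns (status, record), where status follows the document's envelope.
    """
    if record.get("notified"):
        # Assumption: an update after notification is rejected as a bad request.
        return {"code": 400, "message": "bad request"}, record

    # Only the fields the PUT endpoint accepts are considered.
    allowed = {k: v for k, v in updates.items() if k in ("name", "email")}
    if not allowed:
        return {"code": 400, "message": "bad request"}, record

    # Merging the same values twice yields the same record, so PUT stays idempotent.
    return {"code": 200, "message": "success"}, {**record, **allowed}


pending = {"name": "John Doe", "notified": False}
status, updated = apply_update(pending, {"name": "Updated Name"})
print(status, updated)
```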

Data Estimations

Video Uploads

  • Endpoint: /api/upload
  • Storage Location: AWS S3
    • Assumptions:
      • Video Uploads per day: 1000
      • Average Video Size: 10 MB
      • Daily Data Ingestion: ~10 GB
      • Monthly Data Ingestion: ~300 GB

Classification Results Storage

  • Storage Location: AWS S3 (Classified Frames)
    • Assumptions:
      • Classification Frames per day: 1000
      • Average Image Size: ~1 MB
      • Monthly Data Ingestion: ~30 GB
  • Storage Location: AWS DynamoDB (Metadata)
    • Assumptions:
      • Metadata Entries per month: 30 * 1000
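The estimates in this section follow from simple arithmetic, reproduced here as a sanity check. The only assumptions beyond the listed figures are a 30-day month and decimal units (1 GB = 1000 MB).

```python
UPLOADS_PER_DAY = 1000
AVG_VIDEO_MB = 10
FRAMES_PER_DAY = 1000
AVG_FRAME_MB = 1
DAYS_PER_MONTH = 30   # assumption: 30-day month
MB_PER_GB = 1000      # assumption: decimal units

daily_video_gb = UPLOADS_PER_DAY * AVG_VIDEO_MB / MB_PER_GB
monthly_video_gb = daily_video_gb * DAYS_PER_MONTH
monthly_frame_gb = FRAMES_PER_DAY * AVG_FRAME_MB / MB_PER_GB * DAYS_PER_MONTH
monthly_metadata_entries = DAYS_PER_MONTH * UPLOADS_PER_DAY

print(f"Daily video ingestion:    ~{daily_video_gb:.0f} GB")
print(f"Monthly video ingestion:  ~{monthly_video_gb:.0f} GB")
print(f"Monthly frame ingestion:  ~{monthly_frame_gb:.0f} GB")
print(f"Monthly metadata entries: {monthly_metadata_entries}")
```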

Components

1. User Input

  • Video files are uploaded via REST API or Web UI hosted behind the API Gateway and a Load Balancer.

2. REST Server

  • Flask application served by a production-grade WSGI server behind NGINX (reverse proxy).
  • Validates video files for length, format, and file type.
  • Posts an event to the input-queue to be picked up by the classification service.
  • Uploads video files to blob storage (S3).
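The validation and event-posting steps above might be sketched as follows. The limits, the allowed formats, the event field names, and the helper names (`validate_video`, `build_queue_event`) are hypothetical; the document only states that length, format, and file type are checked.

```python
import json
import os

# Assumptions for illustration; the document does not specify these limits.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".avi"}
ALLOWED_MIME_PREFIX = "video/"
MAX_DURATION_SECONDS = 60


def validate_video(filename, mime_type, duration_seconds):
    """Return a list of validation errors; an empty list means the file is accepted."""
    errors = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported format: {extension or 'none'}")
    if not mime_type.startswith(ALLOWED_MIME_PREFIX):
        errors.append(f"unsupported file type: {mime_type}")
    if duration_seconds > MAX_DURATION_SECONDS:
        errors.append(f"video too long: {duration_seconds}s")
    return errors


def build_queue_event(request_id, video_key):
    """Serialize the message posted to the input queue (hypothetical shape)."""
    return json.dumps({"id": request_id, "video_key": video_key})


print(validate_video("clip.mp4", "video/mp4", 12))
print(validate_video("notes.txt", "text/plain", 12))
print(build_queue_event("req-1", "videos/req-1.mp4"))
```

A request that fails validation would map to the 400 "bad request" status listed for the upload endpoint.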

3. Autoscaling Group

  • Scales up and down instances based on the current size of the queue.
  • CloudWatch triggers a scale-out when the ratio (unprocessed-events) / (current-running-instances) rises above a configured threshold, so capacity keeps up with increasing demand.
  • A minimum of 1 and a maximum of 20 instances are configured.
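The scaling rule can be illustrated with a short calculation. The per-instance threshold of 5 events is an assumed value (the document leaves it unspecified), and `desired_instances` and `should_scale_out` are hypothetical names; in practice the decision is made by CloudWatch alarms and the Auto Scaling Group rather than application code.

```python
import math

MIN_INSTANCES = 1
MAX_INSTANCES = 20
EVENTS_PER_INSTANCE = 5  # assumption: threshold for the backlog-per-instance ratio


def should_scale_out(unprocessed_events, running_instances):
    """True when the backlog per running instance exceeds the threshold."""
    return unprocessed_events / running_instances > EVENTS_PER_INSTANCE


def desired_instances(unprocessed_events):
    """Instance count that keeps the ratio at or below the threshold, within limits."""
    needed = math.ceil(unprocessed_events / EVENTS_PER_INSTANCE)
    return max(MIN_INSTANCES, min(MAX_INSTANCES, needed))


for backlog in (0, 12, 1000):
    print(backlog, desired_instances(backlog))
```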

4. AMI (Machine Image)

  • Pre-configured machine image from an EC2 instance.
  • Configured with the Autoscaling Group.
  • Uses systemd to execute a set of commands that start the instance and handle reboot during instance termination.

5. ML Processing

  • Extracts events from the queue.
  • Fetches the video files from S3, processes them using FFmpeg, and compares face vectorizations using an already existing algorithm.
  • Uploads classified frames to S3, posts output events to SQS, and writes metadata to DynamoDB.
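One way to picture the worker is a single function that handles one queue event, with storage, queue, and model access passed in as callables. This is a sketch under stated assumptions: every callable, the event fields, and the `frames/<id>.jpg` key layout are hypothetical stand-ins, not the service's real S3, SQS, or DynamoDB calls.

```python
def process_event(event, download, classify, upload_frame, save_metadata, emit):
    """Handle one input-queue event; all five callables are injected stand-ins."""
    request_id = event["id"]
    try:
        video_bytes = download(event["video_key"])
        frame_bytes, label = classify(video_bytes)
    except Exception:
        # Record the failure so the result endpoint can report "error".
        save_metadata(request_id, {"status": "error"})
        emit({"id": request_id, "status": "error"})
        return "error"

    frame_key = f"frames/{request_id}.jpg"  # assumption: key layout
    upload_frame(frame_key, frame_bytes)
    save_metadata(
        request_id,
        {"status": "completed", "image_key": frame_key, "label": label},
    )
    emit({"id": request_id, "status": "completed"})
    return "completed"


# Usage with in-memory stand-ins instead of real AWS clients.
frames, metadata, output_queue = {}, {}, []
result = process_event(
    {"id": "req-1", "video_key": "videos/req-1.mp4"},
    download=lambda key: b"fake-video",
    classify=lambda video: (b"fake-frame", "John Doe"),
    upload_frame=frames.__setitem__,
    save_metadata=metadata.__setitem__,
    emit=output_queue.append,
)
print(result, metadata["req-1"]["status"], len(output_queue))
```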

6. Results

  • Results can be accessed through the classification/ endpoint once they are processed and available for retrieval.
  • This endpoint is primarily used by the Web UI, which polls it until the results are ready.
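Client-side polling might look like the sketch below. The `fetch` callable stands in for a GET request to the classification endpoint and is expected to return the parsed JSON body shown earlier; the interval, the attempt limit, and the `poll_result` name are assumptions.

```python
import time

TERMINAL_STATUSES = {"completed", "error"}


def poll_result(fetch, request_id, interval_seconds=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch(request_id) until the status is terminal or attempts run out."""
    for _ in range(max_attempts):
        body = fetch(request_id)
        if body["data"]["status"] in TERMINAL_STATUSES:
            return body
        sleep(interval_seconds)
    raise TimeoutError(f"no result for {request_id} after {max_attempts} attempts")


# Usage with a fake endpoint that finishes on the third call.
responses = iter(["processing", "processing", "completed"])


def fake_fetch(request_id):
    return {"id": request_id, "data": {"status": next(responses)}}


final = poll_result(fake_fetch, "req-1", sleep=lambda seconds: None)
print(final["data"]["status"])
```

Passing `sleep` as a parameter keeps the loop testable without real delays.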

7. Notification Service

  • The notification server processes the queue event and looks up the email associated with the request in DynamoDB, then sends either a success message with the necessary output or a failure message.
  • Once the notification is sent, the queue event is deleted; otherwise, the event is dropped to the Dead Letter Queue (DLQ).
  • Depending on the use case, a webhook can be used to connect with third-party applications.
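The notification flow can be sketched with injected callables in place of the real queue, database, and mail clients. All names here (`handle_event`, `lookup_email`, `send_email`, `delete_event`, `send_to_dlq`) and the message wording are hypothetical. With SQS, moving a message to the DLQ is typically configured through a redrive policy rather than done by hand, so the explicit `send_to_dlq` call is a simplification.

```python
def handle_event(event, lookup_email, send_email, delete_event, send_to_dlq):
    """Notify the requester about one output event; return True if it was sent."""
    try:
        email = lookup_email(event["id"])
        if email is None:
            raise LookupError(f"no email stored for {event['id']}")
        if event["status"] == "completed":
            message = f"Your video {event['id']} was classified successfully."
        else:
            message = f"Processing of your video {event['id']} failed."
        send_email(email, message)
    except Exception:
        # Notification failed: route the event to the dead letter queue.
        send_to_dlq(event)
        return False
    # Notification sent: the event can be removed from the queue.
    delete_event(event)
    return True


# Usage with in-memory stand-ins.
sent, deleted, dlq = [], [], []
emails = {"req-1": "user@example.com"}
ok = handle_event(
    {"id": "req-1", "status": "completed"},
    lookup_email=emails.get,
    send_email=lambda to, message: sent.append((to, message)),
    delete_event=deleted.append,
    send_to_dlq=dlq.append,
)
print(ok, len(sent), len(deleted), len(dlq))
```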