Design Pastebin, a website where you can store and share text online for a set period of time. Note: Bit.ly is a similar service, with the distinction that Pastebin requires storing the paste contents instead of the original unshortened URL.
NOTE: This is taken from Pramp, an excellent tool from Exponent. The objective here is to answer the scenarios as well as possible for a System Design Interview.
- Features scope
- API design
- Pseudo code for specific components
- Data model/schema
- Back-of-the-envelope calculations
- Reference links
- Link to whiteboard or diagram such as https://sketchboard.me/new
- What is the maximum amount of data that can be saved? Any limits?
  a) If there are no limits, then let's assume an average of 50 KB of data per paste.
- How many members provide input to a paste? One or more?
  a) If more than one, then the scope of the problem would be different.
  b) If only one person can write and share the text as a file, then the data can be cached differently.
- How long should the data be stored?
  a) Would we store the data permanently?
  b) Is there a time after which we remove the data from the system?
- How many users will be using the system?
  a) ~1 million?
- How do you want to load the data for large files?
  a) Do you want the data to be lazy loaded into the browser (i.e. served from data cached in the CDN), or would you like to fetch all the data at once?
- Do we want user info to be saved? That is, do we want user logins, anonymous access, or both?
- Is the data critical for us? Can the DB take a hit and still be fine for the user?
- Can a user update the paste link?
  a) Ideally NO.
Each paste will hold about 50 KB of data, stored as a flat file.
For ~1 million users (1,000,000 users), assuming one paste per user, the data written would be 1,000,000 × 50 KB = 50,000,000 KB = 50 GB.
Network throughput would not matter much at this scale.
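The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 50 KB average and the 1 million users come from the notes, while `PASTES_PER_USER = 1` and the decimal unit conversion (1 GB = 1,000,000 KB) are assumptions made to reproduce the 50 GB figure.

```python
# Back-of-the-envelope storage estimate using the figures from the notes.
AVG_PASTE_KB = 50        # average paste size (from the notes)
USERS = 1_000_000        # ~1 million users (from the notes)
PASTES_PER_USER = 1      # assumption: each user writes a single paste


def total_storage_kb(users: int, pastes_per_user: int, avg_kb: int) -> int:
    """Total data written, in KB."""
    return users * pastes_per_user * avg_kb


def kb_to_gb(kb: int) -> float:
    """Convert KB to GB using decimal units (1 GB = 1,000,000 KB)."""
    return kb / 1_000_000


total_kb = total_storage_kb(USERS, PASTES_PER_USER, AVG_PASTE_KB)
print(f"{total_kb:,} KB = {kb_to_gb(total_kb):.0f} GB")
```

Changing `PASTES_PER_USER` shows how quickly the total grows once users write more than one paste each.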
- User Metadata ( ID, Password , any other relevant information)
- Links to URLs
- Blob storage can work for the paste contents, and it can cover any internal storage needs
- URL Shortened Link
- URL ID
- Timestamp
- RBAC
- User ID ( FK ) to share the information
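One possible way to lay out the fields listed above is sketched below as two Python dataclasses. The class and field names (`User`, `Paste`, `blob_url`, `roles`, `can_read`) are hypothetical, and the access rule (owner or any user with a role may read) is an assumption for illustration; the notes only say that RBAC and a user ID foreign key are part of the model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class User:
    user_id: int
    password_hash: str  # assumption: store a hash, not the raw password


@dataclass
class Paste:
    url_id: str                 # URL ID
    short_link: str             # shortened URL link
    blob_url: str               # link to the paste contents in blob storage
    owner_id: Optional[int]     # FK to User.user_id; None for an anonymous paste
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)  # timestamp
    )
    # Minimal RBAC stand-in: maps a user ID to a role name such as "viewer".
    roles: Dict[int, str] = field(default_factory=dict)

    def can_read(self, user_id: Optional[int]) -> bool:
        """Assumed rule: the owner and any user holding a role may read."""
        if user_id is None:
            return False
        return user_id == self.owner_id or user_id in self.roles
```

Keeping only `blob_url` in the record means the database row stays small while the 50 KB body lives in blob storage.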
We can provide 2 types of API:
- Get
- Post

This keeps things simple for users, who only need to deal with the paste contents.
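The two operations can be sketched as a small in-memory service. Everything here is illustrative: `PasteService`, `post_paste`, `get_paste`, and `ttl_seconds` are hypothetical names, a plain dictionary stands in for the blob storage and database, and the expiry check reflects the "set period of time" in the problem statement rather than a confirmed design.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class PasteService:
    """Hypothetical in-memory stand-in for the Get and Post operations."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable clock, so expiry is easy to test
        # paste_id -> (content, expiry time in seconds since the epoch)
        self._store: Dict[str, Tuple[str, float]] = {}

    def post_paste(self, content: str, ttl_seconds: int) -> str:
        """Store the content and return a new paste ID for the share link."""
        paste_id = secrets.token_urlsafe(6)  # 6 random bytes -> 8 URL-safe chars
        while paste_id in self._store:       # regenerate on the rare collision
            paste_id = secrets.token_urlsafe(6)
        self._store[paste_id] = (content, self._clock() + ttl_seconds)
        return paste_id

    def get_paste(self, paste_id: str) -> Optional[str]:
        """Return the content, or None if the paste is missing or expired."""
        entry = self._store.get(paste_id)
        if entry is None:
            return None
        content, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[paste_id]  # lazily remove expired pastes on read
            return None
        return content
```

There is deliberately no update operation, in line with the earlier answer that a paste link should ideally not be editable.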
Blob storage and DBs can be region-specific, so that data can be written quickly and saved in the cache.

https://sketchboard.me/nC8G5i83glxm#/ is a sample diagram for the system design.
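The region-specific storage idea can be sketched roughly as follows. This is an assumption-heavy illustration, not the design in the diagram: `RegionalPasteStore`, its methods, and the region names are hypothetical, dictionaries stand in for real blob stores and caches, and the fallback to other regions on a miss is one possible policy among several.

```python
from typing import Dict, List, Optional


class RegionalPasteStore:
    """Hypothetical per-region blob stores, each with a cache in front."""

    def __init__(self, regions: List[str]) -> None:
        self._blobs: Dict[str, Dict[str, str]] = {r: {} for r in regions}
        self._cache: Dict[str, Dict[str, str]] = {r: {} for r in regions}
        self.blob_reads = 0  # counts blob lookups, to show the cache effect

    def write(self, region: str, paste_id: str, content: str) -> None:
        """Write to the caller's region and warm that region's cache."""
        self._blobs[region][paste_id] = content
        self._cache[region][paste_id] = content

    def read(self, region: str, paste_id: str) -> Optional[str]:
        """Serve from the local cache, else search blobs starting locally."""
        cached = self._cache[region].get(paste_id)
        if cached is not None:
            return cached
        others = [r for r in self._blobs if r != region]
        for candidate in [region] + others:
            self.blob_reads += 1
            content = self._blobs[candidate].get(paste_id)
            if content is not None:
                self._cache[region][paste_id] = content  # fill the local cache
                return content
        return None
```

A first read from a remote region pays for the cross-region lookup; later reads in that region are served from its local cache.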
A more detailed design, which I found later and which matches my idea in places, is at https://medium.com/codex/designing-pastebin-77e6e86172eb

Of course, there are some things I did not consider, such as custom URLs.