How to Parse 10-K Report from EDGAR (SEC)

niravsatani24 commented Sep 30, 2023

Amazing! Thanks for sharing.

rabsher commented Dec 4, 2023

I have the HTML URL, but I don't know how to get the .txt URL of the 10-K filing. Once I have that, I can use the notebook code above.

Can anyone help me, please?
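
One possible way to get from the HTML URL to the .txt URL, as a rough sketch. It assumes the usual EDGAR archive layout, where the folder name in the document URL is the 18-digit accession number without dashes, and the full submission text file sits in the same folder under the dashed accession number. The helper name `full_text_url` is made up for illustration and is not part of the notebook.

```python
import re


def full_text_url(htm_url: str) -> str:
    """Derive the full-submission .txt URL from a filing document URL.

    Assumes the layout .../Archives/edgar/data/<CIK>/<accession, no dashes>/<document>.
    """
    match = re.search(r"/Archives/edgar/data/(\d+)/(\d{18})/", htm_url)
    if match is None:
        raise ValueError(f"Unrecognized EDGAR document URL: {htm_url}")
    cik, accession = match.groups()
    # Accession numbers are conventionally written as 10-2-6 digits with dashes.
    dashed = f"{accession[:10]}-{accession[10:12]}-{accession[12:]}"
    return f"https://www.sec.gov/Archives/edgar/data/{cik}/{accession}/{dashed}.txt"


htm_url = "https://www.sec.gov/Archives/edgar/data/1571996/000157199624000036/dell-20240202.htm"
print(full_text_url(htm_url))
# https://www.sec.gov/Archives/edgar/data/1571996/000157199624000036/0001571996-24-000036.txt
```

If the pattern does not match (for example, an inline XBRL viewer link rather than a direct Archives link), the function raises a `ValueError` instead of guessing.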

versatile712 commented Mar 19, 2024

Jesus, you saved my life!

Tarun3679 commented Mar 24, 2025

I just tried this, and it does not seem to return anything for the example above?

rabsher commented Mar 24, 2025

I just tried this, and it does not seem to return anything for the example above?

import requests

# The URL must point to the .htm document of the filing
file_url = "https://www.sec.gov/Archives/edgar/data/1571996/000157199624000036/dell-20240202.htm"
headers = {
    # Replace with your own User-Agent string, in the format given on the SEC website
    "User-Agent": "get it from sec website",
    "Accept-Encoding": "gzip, deflate",
    "Host": "www.sec.gov",
}
response = requests.get(file_url, headers=headers)
# Replace non-breaking spaces with regular spaces
html_content = response.text.replace("\xa0", " ")

You can use this code to fetch a 10-K file. Once you have the HTML, you can write your own regex function to extract specific content from it, or you can get the complete 10-K filing as text.
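
As a rough sketch of that second step, the snippet below strips the HTML down to plain text and then uses a regex to pull out one section. It uses only the standard library. The names `TextExtractor`, `html_to_text` and `extract_section` are invented for illustration, and the small `sample_html` string stands in for the `html_content` fetched above. Picking the longest match is just one heuristic for skipping the table-of-contents entry; real filings vary a lot in how they format item headings, so expect to adjust the patterns.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html_content: str) -> str:
    """Return the document text with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html_content)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def extract_section(text: str, start_item: str, end_item: str) -> str:
    """Return the longest span from an 'Item <start>' heading to the next 'Item <end>'."""
    start_re = re.compile(rf"\bItem\s+{re.escape(start_item)}\b", re.IGNORECASE)
    end_re = re.compile(rf"\bItem\s+{re.escape(end_item)}\b", re.IGNORECASE)
    best = ""
    for start in start_re.finditer(text):
        end = end_re.search(text, start.end())
        if end is None:
            continue
        candidate = text[start.start():end.start()].strip()
        # The table-of-contents entry is short; the real section is usually the longest.
        if len(candidate) > len(best):
            best = candidate
    return best


# Tiny stand-in for a real filing: a table of contents followed by the sections.
sample_html = """
<html><head><style>p { color: red; }</style></head><body>
<p>Item 1A. Risk Factors 12</p><p>Item 1B. Unresolved Staff Comments 30</p>
<p>Item 1A. Risk Factors</p><p>Our business faces many risks and uncertainties.</p>
<p>Item 1B. Unresolved Staff Comments</p><p>None.</p>
</body></html>
"""

text = html_to_text(sample_html)
risk_factors = extract_section(text, "1A", "1B")
print(risk_factors)
```

With a real filing, pass `html_content` to `html_to_text` instead of `sample_html`.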

Tarun3679 commented Mar 28, 2025

Does anyone know of a similar script to retrieve 10-Q filings?

john-friedman commented Apr 16, 2025

@Tarun3679
https://github.com/john-friedman/datamule-python

from datamule import Portfolio

portfolio = Portfolio('10q')
portfolio.download_submissions(submission_type='10-Q',ticker='MSFT')

for document in portfolio.document_type('10-Q'):
  document.parse()
  print(document.data)

anshoomehra/parsing10k.ipynb