@bsiegfreid
Created April 30, 2022 14:57
Use Python to parse text records with header delimiters, key/value pairs, and sub-records into a list of dictionaries.
t = """
---- Header -----
value_a: a
value_b: a
value_c: a
value_d: a
-- Record 1 --
value_a: a
value_b: a
value_c: a
value_d: a
value_d:
  a_a: 1.0
  a_b: 2.0
  a_c: 3.0
-- Record 2 --
value_a: a
value_b: a
value_c: a
value_d: a
value_d:
  a_a: 1.0
  a_b: 2.0
  a_c: 3.0
-- Record 3 --
value_a: a
value_b: a
value_c: a
value_d: a
value_d:
  a_a: 1.0
  a_b: 2.0
  a_c: 3.0
"""


def parse_records(text, record_delimiter="--", key_delimiter=":", sub=" "):
    """Parse delimited text into a list of dicts, one per header/record block."""
    # convert text to lines, removing empty lines
    lines = (line for line in text.splitlines() if line.strip())
    records = []
    record = {}
    subrecord = {}
    for line in lines:
        if line.startswith(record_delimiter):
            # a delimiter line starts a new record
            record = {}
            records.append(record)
            continue
        # split on the first delimiter only, so values may contain it
        k, _, v = line.partition(key_delimiter)
        k, v = k.strip(), v.strip()
        if line.startswith(sub):
            # indented line: belongs to the current subrecord
            subrecord[k] = v
        elif not v:
            # a key with no value opens a subrecord
            subrecord = record[k] = {}
        else:
            record[k] = v
    return records


if __name__ == "__main__":
    records = parse_records(t)
    print(f"records: {len(records)}")
    for r in records:
        print(r)
@bsiegfreid
Author

@jordan-hamilton whenever I do something like this in Python I feel like there is a magic one-liner out there somewhere that can do the job in a more Pythonic way.

@jordan-hamilton

Maybe some customization of configparser? I haven’t looked too closely, but I’m doubtful that there’s a better way to handle the sub-records.
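
For what it's worth, here is a rough sketch of what that configparser customization might look like. It assumes the sub-record lines are indented under their key, and it relies on the documented `SECTCRE` attribute to recognize `-- Name --` headers. The names `RecordParser`, `parse_with_configparser`, and `SAMPLE` are made up for illustration. The sub-records still need a manual pass, because configparser hands the indented lines back as one multi-line string.

```python
import configparser
import re

# Shortened, hypothetical sample in the same shape as the gist's input.
SAMPLE = """\
---- Header -----
value_a: a
-- Record 1 --
value_a: a
value_d:
  a_a: 1.0
  a_b: 2.0
"""


class RecordParser(configparser.ConfigParser):
    # Hypothetical subclass: treat "-- Name --" lines as section headers
    # instead of the default "[Name]" form.
    SECTCRE = re.compile(r"-{2,}\s*(?P<header>.+?)\s*-{2,}$")


def parse_with_configparser(text):
    # strict=False tolerates repeated keys within a section (the later one wins);
    # interpolation=None leaves "%" characters in values untouched.
    parser = RecordParser(delimiters=(":",), interpolation=None, strict=False)
    parser.read_string(text)
    records = []
    for name in parser.sections():
        record = {}
        for key, value in parser[name].items():
            if value.startswith("\n"):
                # Indented continuation lines arrive as one multi-line string,
                # so split them into a nested dict by hand.
                pairs = (line.partition(":") for line in value.splitlines() if line)
                record[key] = {k.strip(): v.strip() for k, _, v in pairs}
            else:
                record[key] = value
        records.append(record)
    return records


if __name__ == "__main__":
    for r in parse_with_configparser(SAMPLE):
        print(r)
```

This ends up about as long as the hand-rolled loop, which seems to support the doubt about sub-records. Note that configparser also lowercases keys by default and treats lines starting with `#` or `;` as comments.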
