@esafwan
Last active April 16, 2025 05:53
Get and Add Redline on Docx using Python

You are an AI assistant tasked with analyzing and modifying a Non-Disclosure Agreement (NDA) based on specific instructions and a playbook. Your goal is to use the information, code, and steps in the canvas to produce a redlined version of the NDA and a summary of your analysis and changes.

Attached docs are:

  1. The NDA to redline (NDA.docx)
  2. A playbook on which to base the redlines (Playbook.pdf)
  3. Instructions on how to apply the playbook's redlines to the NDA (Instructions.pdf)

Output expected:

  1. A redlined version of the given NDA. (NDA-Redlined.docx)
  2. Summary of document content, key highlights including parties and key clauses.
  3. Summary of changes applied.
  4. Any points that are unclear or ambiguous.

Redlined Edits in Word (.docx) Using Low-Level OpenXML Manipulation

Objective

Create a tool that can programmatically insert suggested edits (redlined insertions, deletions, and replacements) into an existing .docx Word document. These edits should appear in Microsoft Word as tracked changes, identical to how human reviewers suggest edits.

Why Low-Level OpenXML?

Most established libraries for working with .docx files do not support tracked changes (like insertions and deletions that show up as suggestions in Word). Therefore, we manipulate WordprocessingML directly (OpenXML format) to produce tracked edits:

  • <w:ins> — Inserted text (appears as underlined suggestion)
  • <w:del> — Deleted text (appears as strikethrough suggestion)
  • Metadata (author, timestamp, revision ID)

Writing these elements directly produces changes that Word displays as ordinary tracked revisions, and it gives full control over where each edit lands.


Approach

We'll unzip the .docx (a ZIP archive), modify its internal XML files (word/document.xml, etc.), and repackage it.
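The unzip, modify, repackage cycle can be sketched with the standard library alone. This is a minimal illustration, not the method used later in this document (which goes through python-docx): `rewrite_document_xml` and its `transform` callback are hypothetical names, and every part other than word/document.xml is copied unchanged.

```python
import io
import zipfile


def rewrite_document_xml(src_bytes, transform):
    """Return new .docx bytes with word/document.xml passed through `transform`.

    `transform` is a hypothetical callback (bytes in, bytes out).
    All other parts of the archive are copied as they are.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as zin:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
            for item in zin.infolist():
                data = zin.read(item.filename)
                if item.filename == "word/document.xml":
                    data = transform(data)
                # Reusing the ZipInfo keeps the original name and metadata.
                zout.writestr(item, data)
    return out.getvalue()
```

A caller would read the source file as bytes, pass a function that edits the XML, and write the returned bytes to a new .docx path.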

Step-by-Step

1. Required Libraries (Python)

  • python-docx – manipulate .docx document structure
  • lxml – low-level XML manipulation for reading/writing tracked changes
pip install python-docx lxml

2. .docx Structure

A .docx file is a ZIP archive with the following relevant files:

  • word/document.xml – main document content
  • word/comments.xml (optional) – for comments (not used for redlines)
  • _rels/.rels and [Content_Types].xml – for relationships and part types

We'll inject <w:ins> and <w:del> into document.xml.
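The path word/document.xml is the conventional location, but the package itself records where the main part lives in _rels/.rels. The sketch below resolves it from there. `main_document_part` is a hypothetical helper name, and the relationship type URI is the one used by transitional (non-strict) Office Open XML files, which is an assumption about the input.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Relationship type for the main document part (transitional OOXML assumed).
OFFICE_DOC = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/officeDocument"
)


def main_document_part(docx_bytes):
    """Return the archive path of the main document part, or None."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z:
        rels = ET.fromstring(z.read("_rels/.rels"))
    for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOC:
            # Targets may be written with a leading slash.
            return rel.get("Target").lstrip("/")
    return None
```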

3. Sample XML Structure for Tracked Changes

<w:p>
  <w:del w:id="1" w:author="Editor" w:date="2025-04-13T10:00:00Z">
    <w:r><w:delText>old text</w:delText></w:r>
  </w:del>
  <w:ins w:id="2" w:author="Editor" w:date="2025-04-13T10:00:01Z">
    <w:r><w:t>new text</w:t></w:r>
  </w:ins>
</w:p>

This creates a replacement: delete “old text” and insert “new text”.
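As a rough sketch, the same replacement can be built programmatically. The example uses the standard library's ElementTree rather than lxml to stay dependency-free. `tracked_change` and `w` are hypothetical helper names, and the revision elements are attached directly to the paragraph, each wrapping its own run and carrying a `w:id`.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # serialize with the usual "w:" prefix


def w(tag):
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W}}}{tag}"


def tracked_change(kind, text, rev_id, author, date):
    """Build a <w:ins> or <w:del> element wrapping a single run."""
    attrs = {w("id"): str(rev_id), w("author"): author, w("date"): date}
    change = ET.Element(w(kind), attrs)
    run = ET.SubElement(change, w("r"))
    # Deleted text lives in <w:delText>, inserted text in <w:t>.
    node = ET.SubElement(run, w("delText" if kind == "del" else "t"))
    node.text = text
    return change


para = ET.Element(w("p"))
para.append(tracked_change("del", "old text", 1, "Editor", "2025-04-13T10:00:00Z"))
para.append(tracked_change("ins", "new text", 2, "Editor", "2025-04-13T10:00:01Z"))
xml = ET.tostring(para, encoding="unicode")
```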

4. Workflow

  • Load .docx using python-docx
  • Extract and parse underlying XML with lxml
  • Locate text nodes to modify
  • Replace with <w:ins> / <w:del> inline using exact run positions
  • Save document back to .docx

5. Considerations

  • Preserve namespace declarations (xmlns:w, etc.) when editing XML.
  • Assign consistent rsid, author, and timestamp to edits, and give each <w:ins> / <w:del> element its own w:id.
  • Always insert redlined edits in-place using parent.insert(index, element) rather than append, to avoid incorrect placement (e.g., at paragraph end).
  • Simulate replacements using a <w:del> followed immediately by a <w:ins>.
  • Track edits across paragraphs and runs carefully — Word may group them visually.
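For the metadata point above, one possible way to pick revision ids and timestamps is sketched below. Both function names are hypothetical. The regular expression assumes the document uses the conventional `w:` prefix and that numbering new revisions after the highest existing `w:id` is enough to keep them distinct.

```python
import re
from datetime import datetime, timezone


def next_revision_id(document_xml):
    """Return one more than the largest numeric w:id found in the XML text."""
    ids = [int(m) for m in re.findall(r'\bw:id="(-?\d+)"', document_xml)]
    return max(ids, default=0) + 1


def revision_timestamp(now=None):
    """Format a UTC timestamp with second precision, e.g. 2025-04-13T10:00:00Z."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Computing the timestamp once per batch and reusing it keeps all edits from a single run visibly grouped under the same author and time.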

Example Functions in Python for Handling Redlined Edits

1. Extract Redline Changes (insertions/deletions)

def extract_redlines(doc):
    """
    Extracts all tracked insertions and deletions from the given Word document
    and returns them as two separate lists of dictionaries.
    """
    from lxml import etree

    ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}
    insertions, deletions = [], []

    for para in doc.paragraphs:
        tree = etree.fromstring(para._element.xml.encode('utf-8'))
        for ins in tree.xpath('.//w:ins', namespaces=ns):
            ins_text = ''.join(ins.xpath('.//w:t/text()', namespaces=ns))
            insertions.append({
                'text': ins_text,
                'author': ins.get(f'{{{ns["w"]}}}author'),
                'date': ins.get(f'{{{ns["w"]}}}date')
            })
        for delete in tree.xpath('.//w:del', namespaces=ns):
            del_text = ''.join(delete.xpath('.//w:delText/text()', namespaces=ns))
            deletions.append({
                'text': del_text,
                'author': delete.get(f'{{{ns["w"]}}}author'),
                'date': delete.get(f'{{{ns["w"]}}}date')
            })
    return insertions, deletions

2. Apply a Redlined Replacement (Generic)

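def apply_redline_replacement(doc, replacements, author="Reviewer"):
    """
    Accepts a list of replacement instructions and applies tracked changes.
    replacements: List of dicts with keys: 'search_text', 'replacement_text'

    Replaces the first match in each paragraph. The match must sit inside a
    single run; text that is split across runs is left untouched.
    """
    from copy import deepcopy
    from datetime import datetime, timezone
    from itertools import count
    from docx.oxml import OxmlElement
    from docx.oxml.ns import qn

    # One second-precision UTC timestamp shared by every edit in this call.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Continue numbering after the highest numeric w:id already in the body.
    used = [int(v) for v in doc.element.xpath("//@w:id") if v.isdigit()]
    rev_ids = count(max(used, default=0) + 1)

    def make_run(src_r, text, tag="w:t"):
        # Build a run holding `text`, copying the source run's formatting.
        r = OxmlElement("w:r")
        rpr = src_r.find(qn("w:rPr"))
        if rpr is not None:
            r.append(deepcopy(rpr))
        t = OxmlElement(tag)
        t.set(qn("xml:space"), "preserve")  # keep leading/trailing spaces
        t.text = text
        r.append(t)
        return r

    def make_change(tag, r):
        # Wrap a run in <w:ins> or <w:del> with revision metadata.
        change = OxmlElement(tag)
        change.set(qn("w:id"), str(next(rev_ids)))
        change.set(qn("w:author"), author)
        change.set(qn("w:date"), stamp)
        change.append(r)
        return change

    for rep in replacements:
        search, new = rep['search_text'], rep['replacement_text']
        for para in doc.paragraphs:
            for run in para.runs:
                if not search or search not in run.text:
                    continue
                before, _, after = run.text.partition(search)
                r = run._element
                parent = r.getparent()
                idx = parent.index(r)
                # Order after the original run: <w:del>, <w:ins>, trailing text.
                pieces = [
                    make_change("w:del", make_run(r, search, "w:delText")),
                    make_change("w:ins", make_run(r, new)),
                ]
                if after:
                    pieces.append(make_run(r, after))
                run.text = before  # the original run keeps only the leading text
                for offset, piece in enumerate(pieces, start=1):
                    parent.insert(idx + offset, piece)
                break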
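The function above only looks for `search_text` inside a single run, yet Word often splits a visible phrase across several runs (for example after spell-check or formatting changes). A possible first step toward handling that case is to map a match in the joined paragraph text back to per-run slices. `locate_in_runs` is a hypothetical helper that works on plain strings, such as `[run.text for run in para.runs]`.

```python
def locate_in_runs(run_texts, search):
    """Map the first match of `search` onto the runs that contain it.

    Returns a list of (run_index, start, end) slices, one per run touched
    by the match, or an empty list when there is no match.
    """
    joined = "".join(run_texts)
    pos = joined.find(search)
    if not search or pos < 0:
        return []
    end = pos + len(search)
    spans, offset = [], 0
    for i, text in enumerate(run_texts):
        # Intersect the match [pos, end) with this run's [offset, offset+len).
        lo, hi = max(pos, offset), min(end, offset + len(text))
        if lo < hi:
            spans.append((i, lo - offset, hi - offset))
        offset += len(text)
    return spans
```

Each returned slice could then be wrapped in its own `<w:del>`, with a single `<w:ins>` placed after the last one. That part is left out here.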

3. Compare Document Text With and Without Changes

def get_doc_text(doc):
    """Returns plain text of a Word doc, leaving out tracked insertions and deletions."""
    return "\n".join([p.text for p in doc.paragraphs if p.text])


def get_doc_text_with_redlines(doc):
    """Returns text with redlined (inserted/deleted) content included explicitly."""
    from lxml import etree
    ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}
    w = '{%s}' % ns['w']
    lines = []
    for para in doc.paragraphs:
        xml = para._element.xml.encode('utf-8')
        tree = etree.fromstring(xml)
        text_parts = []
        # Visit only text-bearing nodes so inserted text is emitted once.
        for node in tree.iter(w + 't', w + 'delText'):
            text = node.text or ''
            if node.tag == w + 'delText':
                text_parts.append(f"[DEL:{text}]")
            elif node.xpath('ancestor::w:ins', namespaces=ns):
                text_parts.append(f"[INS:{text}]")
            else:
                text_parts.append(text)
        lines.append("".join(text_parts))
    return "\n".join(lines)

Example Usage: Apply Redlined Edits

from docx import Document

# Load the document
doc = Document("sample.docx")

# Define your tracked replacements
replacements = [
    {'search_text': "old phrase", 'replacement_text': "new phrase"},
    {'search_text': "Company Name", 'replacement_text': "Acme Inc."},
    {'search_text': "_____________", 'replacement_text': "Safwan"},
]

# Apply the redlined replacements
apply_redline_replacement(doc, replacements, author="Editor")

# Save to new file
doc.save("sample_redlined.docx")
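To confirm that the saved file really contains revision markup, the archive can be inspected without opening Word. The helper below is a minimal sketch using only the standard library. `count_tracked_changes` is a hypothetical name, and it looks only at word/document.xml, not at headers, footers, or footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def count_tracked_changes(docx_file):
    """Count <w:ins> and <w:del> elements in word/document.xml.

    `docx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    return {
        "insertions": sum(1 for _ in root.iter(W + "ins")),
        "deletions": sum(1 for _ in root.iter(W + "del")),
    }
```

Calling `count_tracked_changes("sample_redlined.docx")` after the script above should report at least one insertion and one deletion for every replacement that found a match.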

Outcome

You will generate a .docx file with visible tracked changes (suggestions) that Word treats as reviewer edits, and they will appear in the correct inline context.
