Extracting relationship intelligence from large message datasets

I’m looking to process large volumes of message data (150,000+ messages) to extract structured insights about people and relationships. The goal is to turn messy, unstructured conversations into clean, analyzable data that can power features like:

  • Contact and relationship extraction - Identify people mentioned in messages, extract contact details, and classify relationship types.

  • Relationship strength mapping - Categorize connections using a simple 2x2 framework or numeric scale (e.g., weak/strong ties, intimacy scores).

  • Interest/topic detection - Surface shared interests and recurring conversation themes.

  • LLM-ready formatting - Output the data in a structured format optimized for downstream use by large language models.

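To make the feature list above concrete, here is a minimal sketch of what one extracted, LLM-ready record could look like. The field names, the weak/strong tie buckets, and the 0-1 intimacy score are illustrative assumptions, not an established schema:

```python
from dataclasses import dataclass, field, asdict
import json

# Hypothetical record shapes for the extracted data; all field
# names below are assumptions, not part of any existing spec.

@dataclass
class Contact:
    name: str
    handles: list[str] = field(default_factory=list)  # emails, phones, usernames
    relationship_type: str = "unknown"                # e.g. family, friend, colleague

@dataclass
class Relationship:
    contact: Contact
    tie_strength: str = "weak"     # one cell of a 2x2 (weak/strong x personal/professional)
    intimacy_score: float = 0.0    # or a numeric 0-1 scale
    shared_interests: list[str] = field(default_factory=list)

def to_llm_ready(rel: Relationship) -> str:
    """Serialize one relationship record to compact JSON for an LLM prompt."""
    return json.dumps(asdict(rel), separators=(",", ":"))

rel = Relationship(
    contact=Contact(name="Ada", handles=["ada@example.com"], relationship_type="colleague"),
    tie_strength="strong",
    intimacy_score=0.7,
    shared_interests=["rock climbing", "databases"],
)
print(to_llm_ready(rel))
```

Keeping records this compact matters at 150,000+ messages: the downstream LLM sees dense, uniform JSON rather than raw conversation text.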
Ideally, this would include a middleware layer that handles chunking, structuring, and metadata enrichment before sending anything to the LLM, rather than just dumping raw text. That approach would enable scalability (e.g., MapReduce-style processing), maintain context fidelity, and avoid the need for expensive LLM training.
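The middleware idea above can be sketched as a simple MapReduce-style pass: chunk the message stream, extract per-chunk signals in the map step, and merge them in the reduce step. The substring-matching extractor here is a deliberate stand-in for a real NER model or LLM call:

```python
from collections import Counter
from typing import Iterable, Iterator

def chunk(messages: list[str], size: int = 500) -> Iterator[list[str]]:
    """Split the message stream into fixed-size chunks for independent processing."""
    for i in range(0, len(messages), size):
        yield messages[i:i + size]

def map_extract(chunk_msgs: list[str], known_names: set[str]) -> Counter:
    """Map step: count contact mentions in one chunk.
    A real pipeline would call an NER model or LLM here instead of substring matching."""
    counts: Counter = Counter()
    for msg in chunk_msgs:
        for name in known_names:
            if name in msg:
                counts[name] += 1
    return counts

def reduce_merge(partials: Iterable[Counter]) -> Counter:
    """Reduce step: merge per-chunk counts into one mention table."""
    total: Counter = Counter()
    for p in partials:
        total.update(p)
    return total

messages = [
    "Lunch with Ada tomorrow?",
    "Ada and Bo fixed the bug",
    "Bo is climbing this weekend",
]
mentions = reduce_merge(map_extract(c, {"Ada", "Bo"}) for c in chunk(messages, size=2))
print(mentions)  # per-contact mention counts across all chunks
```

Because the map step only ever sees one chunk, the same code parallelizes across workers, and metadata enrichment (timestamps, thread IDs) can be attached per chunk before anything reaches the LLM.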

This capability would be beneficial for anyone building tools in the relationship management, personal CRM, or social intelligence space, especially when working with high-volume, unstructured message data.

If others are tackling something similar, I would love to hear how you approach it.

Status: In Progress
Board: 💡 How I'd like to use Storytell
Date: 8 months ago
