Claim-Source Mapping in Synthesis
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourcedClaim:
    """A factual claim with its attribution maintained through synthesis."""
    claim: str
    source_id: str
    source_name: str
    source_type: str  # e.g. "peer_reviewed" | "industry_report" | "news_article" | "company_blog"
    publication_date: str
    confidence: str  # based on source quality + claim type
    # Populated only when another source reports a conflicting value.
    conflicting_sources: Optional[list["SourcedClaim"]] = None
@dataclass
class SynthesisResult:
    """Synthesized summary plus the sourced claims and conflicts behind it."""
    summary: str
    claims: list[SourcedClaim]
    conflicts: list[dict]

    def format_with_citations(self) -> str:
        """Return the summary followed by a conflicts section and a numbered source list."""
        result = self.summary + "\n\n"
        if self.conflicts:
            result += "## Conflicting Data Points\n"
            for conflict in self.conflicts:
                result += (
                    f"- **{conflict['topic']}**: "
                    f"{conflict['source_a']['name']} reports {conflict['source_a']['value']}, "
                    f"while {conflict['source_b']['name']} reports {conflict['source_b']['value']}. "
                    f"Discrepancy may reflect {conflict['possible_explanation']}.\n"
                )
        # The source list is always emitted, whether or not conflicts exist.
        # get_unique_sources() is assumed to be defined elsewhere on this class;
        # it must yield dicts with "name", "date", and "type" keys.
        result += "\n## Sources\n"
        for i, source in enumerate(self.get_unique_sources()):
            result += f"[{i+1}] {source['name']} ({source['date']}). {source['type']}.\n"
        return result
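The `format_with_citations` method relies on `get_unique_sources()`, which this section does not show. One possible way to derive that list from the claims is sketched below as a standalone function. The `Claim` class is a hypothetical stand-in that carries only the fields needed here, and the choice to deduplicate by `source_id` in first-seen order is an assumption, not something the original specifies.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """Hypothetical stand-in for SourcedClaim with only the fields used here."""
    claim: str
    source_id: str
    source_name: str
    source_type: str
    publication_date: str


def unique_sources(claims: list[Claim]) -> list[dict]:
    """Return one dict per distinct source_id, in first-seen order."""
    seen: dict[str, dict] = {}
    for c in claims:
        if c.source_id not in seen:
            seen[c.source_id] = {
                "name": c.source_name,
                "date": c.publication_date,
                "type": c.source_type,
            }
    return list(seen.values())


# Two claims share SOURCE_1, so only two sources should be listed.
claims = [
    Claim("Claim A", "SOURCE_1", "Report A", "industry_report", "2023-09-01"),
    Claim("Claim B", "SOURCE_2", "Journal B", "peer_reviewed", "2024-01-15"),
    Claim("Claim C", "SOURCE_1", "Report A", "industry_report", "2023-09-01"),
]
for i, source in enumerate(unique_sources(claims), start=1):
    print(f"[{i}] {source['name']} ({source['date']}). {source['type']}.")
```

Keying on `source_id` rather than `source_name` avoids listing the same source twice when its display name is spelled inconsistently across claims.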
Handling Conflicting Sources
async def synthesize_with_attribution(sources: list[dict]) -> SynthesisResult:
    """
    Build and run a synthesis prompt that maintains source attribution
    and surfaces conflicts.
    """
    # format_sources_with_ids, call_claude, and parse_synthesis_response are
    # helpers assumed to be defined elsewhere; they are not shown in this section.
    synthesis_prompt = f"""
Synthesize a research report from these {len(sources)} sources.

SOURCES:
{format_sources_with_ids(sources)}

INSTRUCTIONS FOR ATTRIBUTION:
1. For every factual claim, include the source ID in brackets: [SOURCE_1]
2. If sources AGREE on a fact: cite all agreeing sources
3. If sources DISAGREE on a fact: present BOTH values and BOTH sources, note the conflict
4. NEVER choose between conflicting sources; present the conflict instead
5. Include publication date for all data points that could change over time
6. Rate source quality: peer_reviewed > government_data > industry_report > news_article > company_blog > social_media

OUTPUT FORMAT:
- Main synthesis with inline citations [SOURCE_X]
- Conflicts section: explicitly list each data point where sources disagree
- Data gaps: topics where no source had information
- Source quality notes: flag if claiming peer-reviewed status

EXAMPLE FORMAT:
Enterprise AI adoption rates vary significantly by survey methodology:
34% of enterprises have deployed AI in production [SOURCE_2, Gartner Q3 2023],
though other estimates reach 67% when including pilots and POCs [SOURCE_4, IDC 2024].
The discrepancy likely reflects different definitions of "deployed."
"""
    response = await call_claude(synthesis_prompt)
    return parse_synthesis_response(response, sources)
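The helper `format_sources_with_ids` is referenced above but not defined in this section. A minimal sketch of what it might look like follows, together with a hypothetical `find_unknown_citations` check that catches citations pointing at sources that were never supplied. The source dict keys (`name`, `date`, `type`, `text`) and both function bodies are assumptions made for illustration.

```python
import re


def format_sources_with_ids(sources: list[dict]) -> str:
    """Label each source SOURCE_1, SOURCE_2, ... so the model can cite it."""
    blocks = []
    for i, src in enumerate(sources, start=1):
        # Assumed keys: "name", "date", "type", "text".
        header = f"[SOURCE_{i}] {src['name']} ({src['date']}, {src['type']})"
        blocks.append(f"{header}\n{src['text']}")
    return "\n\n".join(blocks)


# Matches the numeric part of IDs such as [SOURCE_2] or [SOURCE_4, IDC 2024].
CITATION_RE = re.compile(r"SOURCE_(\d+)")


def find_unknown_citations(response: str, num_sources: int) -> set[int]:
    """Return cited source numbers that do not match any supplied source."""
    cited = {int(n) for n in CITATION_RE.findall(response)}
    return {n for n in cited if not 1 <= n <= num_sources}


sources = [
    {"name": "Report A", "date": "2023-09-01", "type": "industry_report",
     "text": "Placeholder body of the first source."},
    {"name": "Journal B", "date": "2024-01-15", "type": "peer_reviewed",
     "text": "Placeholder body of the second source."},
]
print(format_sources_with_ids(sources))

# A response citing SOURCE_4 when only two sources were supplied is suspect.
draft = "Adoption is rising [SOURCE_2], though one estimate differs [SOURCE_4]."
print(find_unknown_citations(draft, len(sources)))
```

Running a check like this before parsing gives an early signal that the model cited a source ID it was never given.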
Source Quality Weighting
SOURCE_QUALITY_WEIGHTS = {
    "peer_reviewed": 1.0,    # Highest: peer review, methodology section
    "industry_report": 0.8,  # Good: named methodology, large sample
    "government_data": 0.9,  # Very good: official statistics
    "news_article": 0.5,     # Moderate: journalist interpretation
    "company_blog": 0.3,     # Lower: marketing bias possible
    "social_media": 0.1,     # Lowest: no editorial review
}


def calculate_claim_confidence(
    claim: SourcedClaim,
    source_type: str
) -> str:
    """Map a source type to a coarse confidence label: "high", "medium", or "low"."""
    # `claim` is not used yet; only the source type drives the label.
    # Unrecognized source types fall back to a weight of 0.5, i.e. "medium".
    weight = SOURCE_QUALITY_WEIGHTS.get(source_type, 0.5)
    if weight >= 0.8:
        return "high"
    elif weight >= 0.5:
        return "medium"
    return "low"
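To make the thresholds concrete, the sketch below applies the same weight table and cutoffs to each source type. It also shows one consequence of the 0.5 fallback: a label missing from the table, such as a bare `"blog"`, is rated "medium" and so outranks `"company_blog"`. The `confidence_label` function and its `default_weight` parameter are hypothetical names introduced for illustration; they are a variant of the function above, not a replacement for it.

```python
SOURCE_QUALITY_WEIGHTS = {
    "peer_reviewed": 1.0,
    "government_data": 0.9,
    "industry_report": 0.8,
    "news_article": 0.5,
    "company_blog": 0.3,
    "social_media": 0.1,
}


def confidence_label(source_type: str, default_weight: float = 0.5) -> str:
    """Apply the same cutoffs as above: >= 0.8 high, >= 0.5 medium, else low."""
    weight = SOURCE_QUALITY_WEIGHTS.get(source_type, default_weight)
    if weight >= 0.8:
        return "high"
    if weight >= 0.5:
        return "medium"
    return "low"


# Known types, plus one label ("blog") that is absent from the table.
for source_type in [*SOURCE_QUALITY_WEIGHTS, "blog"]:
    print(f"{source_type:16} -> {confidence_label(source_type)}")

# Assumption: a stricter caller may prefer to distrust unrecognized labels.
print(confidence_label("blog", default_weight=0.0))
```

Whether unknown source types should default to "medium" or "low" is a policy choice. Passing a lower `default_weight` is one way to make unlabeled sources earn their confidence.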
Key Takeaways
- Every claim needs attribution: not “research shows” but “Smith et al. 2023 shows”
- Conflicts must be surfaced: present both values and both sources
- Temporal data needs dates: a 2019 statistic presented as current is misinformation
- Source quality affects confidence: a peer-reviewed study and a blog post do not carry equal weight
- Attribution must survive synthesis: it should not disappear when you combine sources
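The point about temporal data can be enforced mechanically. The sketch below flags a data point as stale when its publication date is older than a chosen cutoff. It assumes `publication_date` is an ISO `YYYY-MM-DD` string, which this section does not specify, and the two-year default is an arbitrary illustration rather than a recommended value. The `is_stale` and `dated_label` helpers are hypothetical.

```python
from datetime import date


def is_stale(publication_date: str, as_of: date, max_age_days: int = 730) -> bool:
    """Return True if the source is older than max_age_days as of the given date."""
    published = date.fromisoformat(publication_date)  # assumes YYYY-MM-DD
    return (as_of - published).days > max_age_days


def dated_label(source_name: str, publication_date: str, as_of: date) -> str:
    """Build a citation label that always carries the date and marks stale data."""
    label = f"{source_name}, {publication_date}"
    if is_stale(publication_date, as_of):
        label += ", may be outdated"
    return f"[{label}]"


today = date(2024, 1, 1)
print(dated_label("Report A", "2019-06-01", today))
print(dated_label("Journal B", "2023-09-01", today))
```

Passing `as_of` explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic and easy to test.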