BRIGHT DATA
Architecture
evidence-first · traceable · Bright Data docs
How the live product actually runs
SSE streaming + DB persistence at every stage
Keyword Strategy → Social + Web Search → Evidence Build → Prioritize Posts → AI Post Analysis → Build Categories → Executive Brief → Persist + Replay
User question
User asks what people think about a company/topic.
Layer-1 keyword planner (AI)
Generates focused keywords (default 7) used as search seeds.
Social collection (Bright Data)
Reddit keyword dataset + SERP top-up for Reddit/X/LinkedIn mentions.
Unstructured discovery
SERP expands beyond social into blogs, reviews, and forums.
Evidence build
Normalizes links into evidence items. Optional markdown extraction runs with strict budget/timeouts.
Prioritization + ranking
Deterministic scoring by recency, relevance, engagement, and text richness.
AI post analysis (batched)
Classifies each post's sentiment and category, and writes takeaways, risk/opportunity tags, and confidence.
Narratives + executive brief
Synthesizes categories and final brief with actions and coverage/confidence notes.
Persistence + replay
Session events and result artifacts are stored for session history and reload.
Terminal UI
Streams progress/events live, then renders explorer cards, narrative map, and trace/replay.
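The evidence build stage above normalizes links into evidence items. A minimal sketch of what that normalization could look like follows, assuming each collected link arrives as a dict with `url`, `title`, and `source` keys. The `EvidenceItem` type, the tracking-parameter list, and the "first occurrence wins" deduplication rule are all hypothetical and not taken from the product.

```python
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters treated as tracking noise.
TRACKING_KEYS = {"fbclid", "gclid"}


@dataclass(frozen=True)
class EvidenceItem:
    """Hypothetical normalized record for one collected link."""
    url: str
    title: str
    source: str


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop 'www.', tracking params, fragment, trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.lower().startswith("utm_") and k.lower() not in TRACKING_KEYS
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


def build_evidence(links: list[dict]) -> list[EvidenceItem]:
    """Normalize links and keep the first item seen for each normalized URL."""
    seen: set[str] = set()
    items: list[EvidenceItem] = []
    for link in links:
        url = normalize_url(link["url"])
        if url in seen:
            continue
        seen.add(url)
        items.append(EvidenceItem(url=url, title=link["title"].strip(), source=link["source"]))
    return items
```

Deduplicating on the normalized URL keeps the same page from being counted twice when the dataset and the SERP top-up both return it with different tracking parameters.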
filter official self-posts on X/LinkedIn · provider: OpenAI / OpenRouter / NVIDIA (NIM) · session-first storage model
Architecture Notes
What the current implementation does in production
COLLECTION LAYER
Bright Data is the acquisition layer: Reddit keyword dataset + SERP top-up for Reddit/X/LinkedIn + unstructured web discovery. X/LinkedIn posts that are likely self-posts from the company's official account are filtered out.
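The self-post filter could be sketched as below. This is an illustrative heuristic only: the source does not say how official accounts are detected, so the rule used here (the account handle in the URL matches the company name once punctuation and case are stripped) is an assumption, as are the `Mention` type and the URL shapes handled.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Mention:
    """Hypothetical record for one X or LinkedIn search hit."""
    url: str
    title: str


def _slug(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def author_handle(url: str) -> Optional[str]:
    """Extract the account handle from an X or LinkedIn URL (assumed URL shapes)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]
    if host in {"x.com", "twitter.com"} and parts:
        return parts[0]  # x.com/<handle>/status/<id>
    if host == "linkedin.com" and len(parts) >= 2:
        if parts[0] == "company":
            return parts[1]  # linkedin.com/company/<handle>
        if parts[0] == "posts":
            return parts[1].split("_")[0]  # linkedin.com/posts/<handle>_<slug>
    return None


def is_likely_official_self_post(mention: Mention, company: str) -> bool:
    """Assumed heuristic: the handle equals the company name after slugging."""
    handle = author_handle(mention.url)
    return handle is not None and _slug(handle) == _slug(company)


def filter_self_posts(mentions: list[Mention], company: str) -> list[Mention]:
    """Keep only mentions that do not look like the company's own posts."""
    return [m for m in mentions if not is_likely_official_self_post(m, company)]
```

A handle match is a deliberately conservative signal: it misses official accounts with unrelated handles, but it is unlikely to drop third-party posts.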
ANALYSIS LAYER
After collection, evidence is normalized and posts are ranked; the AI then classifies posts in batches (sentiment/category/takeaway/tags). Narratives and the executive brief are synthesized from these classifications.
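The ranking and batching steps might look roughly like the sketch below. The four signals (recency, relevance, engagement, text richness) come from the pipeline description; the weights, the decay constant, the saturation points, and the `Post` fields are hypothetical, since the source does not state how the signals are combined.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    """Hypothetical post record; field names are assumptions."""
    url: str
    text: str
    created_at: datetime
    upvotes: int
    comments: int


# Hypothetical weights (sum to 1.0); not taken from the product.
WEIGHTS = {"recency": 0.3, "relevance": 0.3, "engagement": 0.25, "richness": 0.15}


def score_post(post: Post, keywords: list[str], now: datetime) -> float:
    """Combine four signals, each scaled to [0, 1], into one deterministic score."""
    age_days = max((now - post.created_at).total_seconds() / 86400, 0.0)
    recency = math.exp(-age_days / 30)  # assumed 30-day decay constant
    text = post.text.lower()
    hits = sum(1 for k in keywords if k.lower() in text)
    relevance = hits / len(keywords) if keywords else 0.0
    interactions = max(post.upvotes + post.comments, 0)
    engagement = min(math.log1p(interactions) / math.log1p(1000), 1.0)
    richness = min(len(post.text.split()) / 200, 1.0)  # saturates at 200 words
    return (
        WEIGHTS["recency"] * recency
        + WEIGHTS["relevance"] * relevance
        + WEIGHTS["engagement"] * engagement
        + WEIGHTS["richness"] * richness
    )


def rank_posts(posts: list[Post], keywords: list[str], now: datetime) -> list[Post]:
    """Sort by descending score; tie-break on URL so the order is reproducible."""
    return sorted(posts, key=lambda p: (-score_post(p, keywords, now), p.url))


def batches(items: list, size: int) -> list[list]:
    """Split ranked items into fixed-size batches for the AI classification calls."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Passing `now` in explicitly, rather than reading the clock inside `score_post`, is what keeps the score deterministic: the same inputs always produce the same ranking, which matters when a session is replayed.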
DELIVERY + REPLAY LAYER
SSE streams stage progress and artifacts in real time. Sessions/events/results are persisted so users can reopen past runs without rerunning.
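One way to get replay without rerunning is to persist every event in the same step that emits it, then rebuild the stream from storage. The sketch below illustrates that idea with the standard `event:`/`data:` SSE framing and an in-memory SQLite table. The `SessionStore` class, the table layout, and the event names are assumptions for illustration, not the product's actual schema.

```python
import json
import sqlite3
from typing import Iterator


def sse_frame(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame; a blank line terminates the event."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class SessionStore:
    """Hypothetical append-only event log keyed by session id."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "session_id TEXT, seq INTEGER, event TEXT, data TEXT, "
            "PRIMARY KEY (session_id, seq))"
        )

    def append(self, session_id: str, event: str, data: dict) -> str:
        """Persist the event, then return the SSE frame to send to the client."""
        (seq,) = self.db.execute(
            "SELECT COALESCE(MAX(seq), 0) + 1 FROM events WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        self.db.execute(
            "INSERT INTO events VALUES (?, ?, ?, ?)",
            (session_id, seq, event, json.dumps(data)),
        )
        self.db.commit()
        return sse_frame(event, data)

    def replay(self, session_id: str) -> Iterator[str]:
        """Yield the stored events as SSE frames, in their original order."""
        rows = self.db.execute(
            "SELECT event, data FROM events WHERE session_id = ? ORDER BY seq",
            (session_id,),
        ).fetchall()
        for event, data in rows:
            yield sse_frame(event, json.loads(data))
```

Because the live stream and the replay both go through `sse_frame`, a client can render a reopened session with the same code path it uses for a run in progress.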