AI Neo4j & VectorDB Schema · ERD · Query Guide

Mind-Log v5.0 | Last updated: 2026-04-17 11:00

1. Dual Data Store Architecture Overview

Mind-Log v5.0 adopts a 4-database stack so that each kind of data is kept in the store best suited to its characteristics.

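Database	Type	Responsible agents	Core role
Neo4j	Graph DB	Reasoning, Knowledge, Visualization	Emotion relationship graph, mind ontology, knowledge graph
Pinecone	Vector DB	Memory, Knowledge, Episode Memory	Semantic similarity search, embedding storage
MySQL	Relational DB (no direct access from the AI server; requests go through `src/api/backend_client.py`)	Backend server (the AI server calls it indirectly)	Message logs, sessions, metadata, user profiles
Redis	Cache	Intent Classifier, Safety	Real-time session cache, response latency optimization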

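Why a Neo4j + VectorDB dual store?

A Vector DB alone cannot explain why an emotion arose (causality), and a Graph DB alone cannot find the stories of people who went through similar experiences (semantic similarity). Mind-Log combines the two databases to implement the GraphRAG pattern.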

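[User input] "My team members resist change"
       │
       ├──→ VectorDB: search past records and expert knowledge related to "resistance to change" (similarity-based)
       │
       ├──→ Neo4j:    "resistance to change" → LEADS_TO → "strengthening empathy" (causal relationship)
       │
       └──→ Result: not just a record, but a linked "root cause + solution"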

2. Neo4j Graph Schema Design

2.1 Node Type Definitions

The Mind-Log Neo4j graph is built from six core node types.

Node type	Description	Key properties	Used by
Emotion	Emotion node (GoT output)	name, user_id, intensity, duration, trigger, confidence	Reasoning (GoT)
Technique	Therapy/coaching technique	name, domain, description, evidence_level	Knowledge
Condition	Psychological state/condition	name, category, severity	Knowledge
Concept	Leadership/psychology concept	name, definition, domain	Knowledge
Memory	User memory node	id, user_id, content, created_at, importance	Memory
Person	Person related to the user	name, role, relationship_to_user	Memory

Additional node types (mind ontology only):

Node type	Description	Key properties	Layer
LeadershipConcept	Leadership concept	name, description, framework	Organization layer
PsychologicalPattern	Psychological pattern	name, type, frequency	Individual layer
UserStory	Anonymized user story	id, themes, anonymized_at	Team layer
ExpertResource	Expert resource	title, source, evidence_level	Organization layer

2.2 Relationship Type Definitions

Current implementation (as in the code):

Relationships between emotion nodes use a single type, LEADS_TO. The following Cypher is issued directly at src/agents/podcast/podcast_reasoning.py:701.

MERGE (a)-[:LEADS_TO {relationship: $rel}]->(b)

The relationship property (a string) distinguishes the meaning (amplifies, triggers, conflict_with, exacerbates, and so on). No separate enum of relationship types is used.

Relationship	Start node	End node	Properties	Meaning
LEADS_TO	Emotion	Emotion	relationship (str), strength	Single relationship for emotion flow (MERGE at `podcast_reasoning.py:701`)
TREATS	Technique	Condition	effectiveness, evidence_strength	Technique treats the condition
RELATED_TO	Technique	Technique	similarity_score	Similarity between techniques
INCLUDES	Technique	Concept	-	Technique includes the concept
MENTIONS	Memory	Person	context	Memory mentions the person
ASSOCIATED_WITH	Memory	Emotion	intensity	Emotion associated with the memory
HAS_ROOT_CAUSE	PsychologicalPattern	Concept	confidence	Root cause of the psychological pattern
RELATED_TO_CONCEPT	UserStory	LeadershipConcept	relevance	User story linked to a leadership concept
SIMILAR_TO_PATTERN	PsychologicalPattern	PsychologicalPattern	similarity_score	Links similar psychological patterns
SUPPORTED_BY_RESEARCH	LeadershipConcept	ExpertResource	evidence_level	Research evidence for the leadership concept

2.3 Neo4j ERD (Entity-Relationship Diagram)

                  ┌──────────────────────────────────────────────┐
                  │         Mind-Log Neo4j ERD (v5.0)            │
                  └──────────────────────────────────────────────┘

┌─────────┐  LEADS_TO {relationship: "amplifies"}   ┌─────────┐
│ Emotion │────────────────────────────────────────►│ Emotion │
│         │  LEADS_TO {relationship: "triggers"}    │         │
│ name    │  LEADS_TO {relationship: "conflict_with"}│ name   │
│intensity│  LEADS_TO {relationship: "exacerbates"} │intensity│
│duration │  (one type, split by relationship prop) │trigger  │
└────┬────┘                                          └─────────┘
     ▲ ASSOCIATED_WITH (Memory → Emotion)
     │
┌─────────┐  MENTIONS   ┌─────────┐
│ Memory  │────────────►│ Person  │
│         │             │         │
│ id      │             │ name    │
│ content │             │ role    │
│importance│            └─────────┘
└─────────┘

┌───────────┐  TREATS    ┌───────────┐
│ Technique │───────────►│ Condition │
│           │            │           │
│ name      │  INCLUDES  │ name      │
│ domain    │─────┐      │ category  │
│ evidence  │     │      └───────────┘
└─────┬─────┘     ▼
      │      ┌─────────┐
RELATED_TO   │ Concept │
      │      │         │
      ▼      │ name    │
┌───────────┐│definition│
│ Technique ││         │
└───────────┘└────┬────┘
                  ▲ HAS_ROOT_CAUSE (PsychologicalPattern → Concept)
                  │
┌────────────────────┐  SIMILAR_TO_PATTERN  ┌────────────────────┐
│PsychologicalPattern│◄────────────────────►│PsychologicalPattern│
│ name               │                      │ name               │
│ type               │                      └────────────────────┘
└────────┬───────────┘
         │
┌────────┴───────────┐ RELATED_TO_CONCEPT  ┌──────────────────┐
│    UserStory       │────────────────────►│LeadershipConcept │
│ id, themes         │                     │ name, framework  │
└────────────────────┘                     └────────┬─────────┘
                                                    │ SUPPORTED_BY_RESEARCH
                                                    ▼
                                           ┌──────────────────┐
                                           │ ExpertResource   │
                                           │ title, source    │
                                           └──────────────────┘

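GoT → Neo4j Automatic Integration Mechanism

Core principle: nodes are not created one by one in advance. Only the schema (the "frame") is defined up front; the concrete emotion nodes and relationships are generated automatically each session by the LLM (the Reasoning Agent) as it analyzes the user's input.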

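① Defined in advance vs generated automatically

Item	Defined in advance (at system design time)	Generated automatically (at each session run)
Nodes	Labels (types) only: `:Emotion`, `:Technique`, `:Condition`, etc.	Concrete instances: `{name: '고립감', intensity: 0.8}` (고립감 = isolation), created anew by the LLM each time
Relationships	The `LEADS_TO` relationship type and the list of `relationship` property values	Concrete links: `고립감 -[:LEADS_TO {relationship: "amplifies"}]-> 책임감` (isolation amplifies responsibility)
Property values	Property field names only: `intensity`, `confidence`, `strength`, etc.	Concrete numbers: `intensity: 0.8`, `strength: 0.7`, computed by the LLM from context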

② End-to-end pipeline: from user input to Neo4j storage

[1] User input (greenroom entry ticket)
    │  Situation: "My opinion was ignored in a team meeting"
    │  Thought: "I am not being respected as a leader"
    ▼
[2] Intent Classifier (TIER 0)
    │  complexity_score = 0.72 → decides to activate GoT
    ▼
[3] Reasoning Agent (TIER 1): the LLM performs GoT
    │  Prompt passed to the LLM:
    │  "Analyze the user's emotions and generate emotion nodes
    │   and relationships that conform to the following JSON Schema."
    │
    │  LLM structured output (conforming to the JSON Schema):
    │  {
    │    "emotion_nodes": [
    │      {"label": "고립감", ...},      (isolation)
    │      {"label": "책임감", ...},      (responsibility)
    │      {"label": "자기의심", ...}     (self-doubt)
    │    ],
    │    "edges": [
    │      {"from": "자기의심",
    │       "to": "고립감",
    │       "relation": "triggers"},
    │      ...
    │    ]
    │  }
    │  ★ The LLM fills in this structure and returns it automatically.
    ▼
[4] GoT → Cypher issued directly (podcast_reasoning.py:701)
    │  No separate graph_transformer
    │  MERGE (a:Emotion {name: '고립감', user_id: $user_id})
    │  MERGE (b:Emotion {name: '책임감', user_id: $user_id})
    │  MERGE (a)-[:LEADS_TO {relationship: 'amplifies', strength: 0.7}]->(b)
    ▼
[5] Stored in Neo4j
    │  The graph accumulates, so changes in the user's emotions can be tracked
    ▼
[6] Used in the next session
       Query the past graph → detect changes in emotion patterns
       "Isolation was 0.8 last month → improved to 0.5 this month"

③ Structured Output Schema passed to the LLM

This is the JSON Schema that the Reasoning Agent passes to the LLM. Within this frame, the LLM is free to generate emotions and relationships.

GOT_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "core_pattern": {
            "type": "string",
            "description": "Name of the core pattern in the emotion structure (e.g., Authority Paradox)"
        },
        "emotion_nodes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "label": {"type": "string", "description": "Emotion name (in Korean)"},
                    "intensity": {"type": "number", "minimum": 0, "maximum": 1},
                    # The enum values are Korean: short / medium / long term
                    "duration": {"type": "string", "enum": ["단기", "중기", "장기"]},
                    "trigger": {"type": "string", "description": "What triggered the emotion"},
                    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
                },
                "required": ["id", "label", "intensity", "confidence"]
            }
        },
        "edges": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "from": {"type": "string"},
                    "to": {"type": "string"},
                    "relation": {
                        "type": "string",
                        "description": "Relationship meaning (amplifies / triggers / conflict_with / exacerbates, etc.); stored as the relationship property of the LEADS_TO relationship"
                    },
                    "strength": {"type": "number", "minimum": 0, "maximum": 1}
                },
                "required": ["from", "to", "relation"]
            }
        }
    },
    "required": ["core_pattern", "emotion_nodes", "edges"]
}
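
Before anything is written to Neo4j, it can help to check the LLM's output against the constraints in GOT_OUTPUT_SCHEMA. The following is a minimal sketch that uses only the standard library. `validate_got_output` is a hypothetical helper, not part of the codebase described here, and the check that edge endpoints match a node `id` or `label` is an assumption, since the schema does not state which of the two `from`/`to` refer to.

```python
from typing import Any

# Enum values copied from GOT_OUTPUT_SCHEMA (short / medium / long term).
ALLOWED_DURATIONS = {"단기", "중기", "장기"}


def _in_unit_range(value: Any) -> bool:
    """True when value is a number between 0 and 1 inclusive."""
    return isinstance(value, (int, float)) and 0 <= value <= 1


def validate_got_output(got: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: list[str] = []
    for key in ("core_pattern", "emotion_nodes", "edges"):
        if key not in got:
            problems.append(f"missing top-level field: {key}")

    known: set[str] = set()  # ids and labels that edges may point at (assumption)
    for i, node in enumerate(got.get("emotion_nodes", [])):
        for key in ("id", "label", "intensity", "confidence"):
            if key not in node:
                problems.append(f"emotion_nodes[{i}] missing {key}")
        for key in ("intensity", "confidence"):
            if key in node and not _in_unit_range(node[key]):
                problems.append(f"emotion_nodes[{i}].{key} out of range")
        if "duration" in node and node["duration"] not in ALLOWED_DURATIONS:
            problems.append(f"emotion_nodes[{i}].duration not in enum")
        known.update(node[k] for k in ("id", "label") if k in node)

    for i, edge in enumerate(got.get("edges", [])):
        for key in ("from", "to", "relation"):
            if key not in edge:
                problems.append(f"edges[{i}] missing {key}")
        for end in ("from", "to"):
            if end in edge and edge[end] not in known:
                problems.append(f"edges[{i}].{end} references unknown node")
        if "strength" in edge and not _in_unit_range(edge["strength"]):
            problems.append(f"edges[{i}].strength out of range")
    return problems
```

A caller could skip the Neo4j write, or retry the LLM call, whenever the returned list is not empty.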

④ GoT → Cypher direct issuance code (actual implementation)

There is no separate graph_transformer utility. The LEADS_TO Cypher is issued directly inside src/agents/podcast/podcast_reasoning.py:701.

# Core MERGE pattern at podcast_reasoning.py:701
async def _save_got_to_neo4j(
    self, got_output: dict, user_id: str, session_id: str
) -> None:
    """Save the GoT emotion graph to Neo4j.

    The relationship type is fixed to the single type LEADS_TO, and
    the relationship property (a string) distinguishes the meaning.
    """
    for node in got_output["emotion_nodes"]:
        await self.neo4j.run(
            """
            MERGE (e:Emotion {name: $name, user_id: $user_id})
            ON CREATE SET
                e.intensity = $intensity,
                e.confidence = $confidence,
                e.created_at = datetime(),
                e.session_id = $session_id
            ON MATCH SET
                e.intensity = $intensity,
                e.confidence = $confidence,
                e.last_updated = datetime()
            """,
            name=node["label"], user_id=user_id,
            intensity=node["intensity"], confidence=node.get("confidence", 0.8),
            session_id=session_id,
        )

    for edge in got_output["edges"]:
        await self.neo4j.run(
            """
            MATCH (a:Emotion {name: $from_name, user_id: $user_id})
            MATCH (b:Emotion {name: $to_name, user_id: $user_id})
            MERGE (a)-[r:LEADS_TO {relationship: $rel}]->(b)
            SET r.strength = $strength, r.session_id = $session_id
            """,
            from_name=edge["from"], to_name=edge["to"],
            user_id=user_id, rel=edge["relation"],
            strength=edge.get("strength", 0.5), session_id=session_id,
        )
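
The function above sends one query per node and one per edge. If round trips become a concern, the same writes could be batched with UNWIND. The sketch below only builds the parameter rows and the query text. `NODE_QUERY`, `EDGE_QUERY`, `build_node_rows`, and `build_edge_rows` are hypothetical names, and the defaults (0.8 confidence, 0.5 strength) mirror the ones used in `_save_got_to_neo4j`.

```python
from typing import Any

# Hypothetical batched equivalents of the per-item queries shown above.
NODE_QUERY = """
UNWIND $rows AS row
MERGE (e:Emotion {name: row.name, user_id: $user_id})
ON CREATE SET e.created_at = datetime(), e.session_id = $session_id
ON MATCH SET e.last_updated = datetime()
SET e.intensity = row.intensity, e.confidence = row.confidence
"""

EDGE_QUERY = """
UNWIND $rows AS row
MATCH (a:Emotion {name: row.from_name, user_id: $user_id})
MATCH (b:Emotion {name: row.to_name, user_id: $user_id})
MERGE (a)-[r:LEADS_TO {relationship: row.rel}]->(b)
SET r.strength = row.strength, r.session_id = $session_id
"""


def build_node_rows(got_output: dict[str, Any]) -> list[dict[str, Any]]:
    """One row per emotion node, keyed the way NODE_QUERY expects."""
    return [
        {
            "name": node["label"],
            "intensity": node["intensity"],
            "confidence": node.get("confidence", 0.8),
        }
        for node in got_output["emotion_nodes"]
    ]


def build_edge_rows(got_output: dict[str, Any]) -> list[dict[str, Any]]:
    """One row per edge, keyed the way EDGE_QUERY expects."""
    return [
        {
            "from_name": edge["from"],
            "to_name": edge["to"],
            "rel": edge["relation"],
            "strength": edge.get("strength", 0.5),
        }
        for edge in got_output["edges"]
    ]
```

Each query would then run once per session, for example `await self.neo4j.run(EDGE_QUERY, rows=build_edge_rows(got_output), user_id=user_id, session_id=session_id)`, assuming `run` accepts keyword parameters as it does in the code above.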

⑤ Accumulation and use: the graph grows over time

Session 1 (January):
  isolation(0.8) -[:LEADS_TO {relationship:"amplifies"}]→ responsibility(0.9)

Session 2 (February):
  isolation(0.7) -[:LEADS_TO {relationship:"amplifies"}]→ responsibility(0.85)
  injustice(0.7) -[:LEADS_TO {relationship:"conflict_with"}]→ responsibility(0.85)  ← new node added

Session 3 (March):
  isolation(0.5) -[:LEADS_TO {relationship:"amplifies"}]→ responsibility(0.8)   ← decrease in isolation detected
  self-efficacy(0.6) -[:LEADS_TO {relationship:"triggers"}]→ trust(0.5)         ← positive pattern emerges

→ Reasoning Agent: "Over three months, isolation improved from 0.8 to 0.5.
   A new positive pattern, self-efficacy, is emerging."
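
The kind of comparison the Reasoning Agent makes across sessions can be sketched as follows. This assumes per-session intensity snapshots are available as plain dictionaries, ordered oldest to newest; how such snapshots are stored or queried is not specified in this guide, and `intensity_trend` is a hypothetical helper.

```python
def intensity_trend(snapshots: list[dict[str, float]]) -> dict[str, float]:
    """Change in intensity from each emotion's first appearance to its latest one."""
    first: dict[str, float] = {}
    last: dict[str, float] = {}
    for session in snapshots:  # ordered oldest to newest
        for name, intensity in session.items():
            first.setdefault(name, intensity)
            last[name] = intensity
    return {name: round(last[name] - first[name], 2) for name in first}


# The three sessions from the example above
# (isolation = 고립감, responsibility = 책임감, injustice = 억울함,
#  self-efficacy = 자기효능감, trust = 신뢰감).
sessions = [
    {"isolation": 0.8, "responsibility": 0.9},
    {"isolation": 0.7, "responsibility": 0.85, "injustice": 0.7},
    {"isolation": 0.5, "responsibility": 0.8, "self-efficacy": 0.6, "trust": 0.5},
]
trend = intensity_trend(sessions)
# Emotions whose intensity went down over the period.
decreased = sorted(name for name, delta in trend.items() if delta < 0)
```

Emotions that appear only in the latest session have a delta of 0 and would need separate handling to be reported as newly emerging patterns.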

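Differences from the Knowledge Agent's knowledge graph:

Item	GoT emotion graph (Reasoning Agent)	Knowledge graph (Knowledge Agent)
Creation time	Generated dynamically by the LLM each session	Built in advance by experts at system setup and updated periodically
Node content	The user's concrete emotions (isolation, responsibility, etc.)	Psychology concepts and therapy techniques (CBT, DBT, etc.)
Data nature	Personalized data (isolated by user_id)	Shared knowledge base (common to all users)
Change frequency	Nodes added and values updated each session	Updated only when new papers or techniques are added
Main relationships	LEADS_TO (meaning distinguished by the relationship property)	TREATS, RELATED_TO, INCLUDES, SUPPORTED_BY_RESEARCH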

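3. Neo4j Cypher Query Examples

3.1 Schema Creation (Constraints & Indexes)

// Node uniqueness constraints
// Emotion nodes are merged per user (name + user_id), so uniqueness is scoped to that pair
CREATE CONSTRAINT emotion_name_unique FOR (e:Emotion) REQUIRE (e.name, e.user_id) IS UNIQUE;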
CREATE CONSTRAINT technique_name_unique FOR (t:Technique) REQUIRE t.name IS UNIQUE;
CREATE CONSTRAINT condition_name_unique FOR (c:Condition) REQUIRE c.name IS UNIQUE;
CREATE CONSTRAINT memory_id_unique FOR (m:Memory) REQUIRE m.id IS UNIQUE;
CREATE CONSTRAINT person_name_unique FOR (p:Person) REQUIRE p.name IS UNIQUE;

// Indexes (search performance optimization)
CREATE INDEX emotion_intensity FOR (e:Emotion) ON (e.intensity);
CREATE INDEX technique_domain FOR (t:Technique) ON (t.domain);
CREATE INDEX memory_importance FOR (m:Memory) ON (m.importance);
CREATE INDEX memory_created FOR (m:Memory) ON (m.created_at);

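3.2 Creating the GoT Emotion Graph (Reasoning Agent, single LEADS_TO type)

// Build a CEO's compound emotion structure
// (고립감 = isolation, 억울함 = sense of injustice, 책임감 = responsibility, 결정불안감 = decision anxiety)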
MERGE (isolation:Emotion {name: '고립감', user_id: $user_id})
SET isolation.intensity = 0.8, isolation.duration = '장기', isolation.confidence = 0.92

MERGE (injustice:Emotion {name: '억울함', user_id: $user_id})
SET injustice.intensity = 0.7, injustice.trigger = '팀원_이해_부족', injustice.confidence = 0.85

MERGE (responsibility:Emotion {name: '책임감', user_id: $user_id})
SET responsibility.intensity = 0.9, responsibility.confidence = 0.95

MERGE (decision_anxiety:Emotion {name: '결정불안감', user_id: $user_id})
SET decision_anxiety.intensity = 0.6, decision_anxiety.confidence = 0.78

// Relationships between emotions (single LEADS_TO type; the relationship property carries the meaning)
MERGE (isolation)-[r1:LEADS_TO {relationship: 'amplifies'}]->(responsibility)
SET r1.strength = 0.7

MERGE (injustice)-[r2:LEADS_TO {relationship: 'conflict_with'}]->(responsibility)
SET r2.strength = 0.8

MERGE (responsibility)-[r3:LEADS_TO {relationship: 'triggers'}]->(decision_anxiety)
SET r3.strength = 0.65

MERGE (isolation)-[r4:LEADS_TO {relationship: 'exacerbates'}]->(injustice)
SET r4.strength = 0.6

3.3 Knowledge Agent Knowledge Graph Queries

// Find therapy techniques that are effective for a given condition ('불안' = anxiety)
MATCH (t:Technique)-[r:TREATS]->(c:Condition {name: '불안'})
WHERE r.effectiveness >= 0.7
RETURN t.name, t.domain, r.effectiveness, r.evidence_strength
ORDER BY r.effectiveness DESC
LIMIT 5;

// Explore the network of related techniques, up to 2 hops ('CBT 인지재구성' = CBT cognitive restructuring)
MATCH path = (t1:Technique {name: 'CBT 인지재구성'})
  -[:RELATED_TO*1..2]-(t2:Technique)
RETURN t1.name, t2.name,
  [r IN relationships(path) | r.similarity_score] AS scores;

// Trace the path technique → concept → root cause
MATCH path = (t:Technique)-[:INCLUDES]->(c:Concept)
  <-[:HAS_ROOT_CAUSE]-(p:PsychologicalPattern)
WHERE t.domain = 'cbt'
RETURN t.name AS technique, c.name AS concept,
  p.name AS pattern, p.type AS pattern_type;

3.4 Memory Agent Memory Relationship Queries

// Find the people who appear in a given user's memories
MATCH (m:Memory)-[:MENTIONS]->(p:Person)
WHERE m.user_id = $user_id
RETURN p.name, COUNT(m) AS mention_count,
  AVG(m.importance) AS avg_importance
ORDER BY mention_count DESC;

// Analyze emotion-memory association patterns
MATCH (m:Memory)-[r:ASSOCIATED_WITH]->(e:Emotion)
WHERE m.user_id = $user_id AND e.intensity >= 0.7
RETURN e.name, COUNT(m) AS memory_count,
  AVG(r.intensity) AS avg_emotion_intensity,
  COLLECT(DISTINCT m.id) AS memory_ids
ORDER BY avg_emotion_intensity DESC;

3.5 Mind Ontology Visualization Queries (Visualization Agent)

// Extract the organization-wide emotion graph (for the B2B dashboard)
MATCH (e1:Emotion)-[r:LEADS_TO]->(e2:Emotion)
WHERE e1.intensity >= 0.5 OR e2.intensity >= 0.5
RETURN e1.name AS source, r.relationship AS relationship,
  e2.name AS target, e1.intensity AS source_intensity,
  e2.intensity AS target_intensity
ORDER BY e1.intensity DESC;

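// Extract the 3-layer ontology network
// Note: the PsychologicalPattern match shares no variable with lc, so every concept row
// receives the same list of patterns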
MATCH (lc:LeadershipConcept)-[:SUPPORTED_BY_RESEARCH]->(er:ExpertResource)
OPTIONAL MATCH (us:UserStory)-[:RELATED_TO_CONCEPT]->(lc)
OPTIONAL MATCH (pp:PsychologicalPattern)-[:SIMILAR_TO_PATTERN]-(pp2:PsychologicalPattern)
RETURN lc.name AS concept, er.title AS research,
  COLLECT(DISTINCT us.id) AS related_stories,
  COLLECT(DISTINCT pp.name) AS related_patterns;

4. Pinecone VectorDB Schema Design

4.1 Index Structure (currently in operation)

Pinecone runs a single unified index, rag-suite-knowledge (PR #153). Personal data and shared expert knowledge are isolated by user_id namespace filtering.

Index name: rag-suite-knowledge
Embedding model: KT Cloud RAG Suite Embedding (PASSAGE)
Query similarity threshold: 0.25 (PR #155; reflects measured KT Cloud Query↔Passage scores of 0.20 to 0.35)
Isolation method: user_id namespace filter
Responsible agents: Knowledge, Memory, and Episode Memory (shared)
Related code: src/services/knowledge/pinecone_client.py, src/agents/podcast/knowledge_agent.py

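4.2 Metadata Schema

Personal memory data (Memory Agent)

Metadata field	Type	Description	Filter use
user_id	string	User identifier	Personal data isolation (required)
memory_type	string	"short" / "medium" / "long"	Memory type filter
importance	float	Importance score (0.0–1.0)	Importance threshold filter
created_at	string (ISO)	Memory creation time	Time range filter
topic_tag	string	Topic tag	Topic-based filter
sentiment	string	Emotional tone	Emotion-based reranking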

Expert knowledge data (Knowledge Agent)

Metadata field	Type	Description	Filter use
domain	string	"cbt" / "dbt" / "psychodynamic" / "crisis_intervention" / "general_psychology"	Domain filter
source	string	Academic source	Source filter
evidence_level	string	"high" / "moderate" / "low"	Evidence level filter
is_shared	boolean	Whether sharing is permitted	Shared data filter
is_anonymized	boolean	Whether the data is anonymized	Privacy filter

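Episode archive data (Episode Memory)

Metadata field	Type	Description	Filter use
user_id	string	User identifier	Personal episode filter
themes	list[string]	Episode themes	Theme filter
created_at	string (ISO)	Creation time	Time filter
episode_id	string	Episode identifier	Lookup of a specific episode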

4.3 Episode-Level Chunking Strategy

Data type	Chunking unit	Token range	Metadata
Personal greenroom data	1 session = 1 chunk	300–800 tokens	user_id, session_date, topic, emotion, risk_level
Expert knowledge / shared stories	1 case or concept = 1 chunk	400–1000 tokens	source_type, domain, themes, difficulty_level
Podcast episodes	Entire episode = 1 chunk	2000–3000 tokens	episode_id, user_id, themes, publication_date
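
A chunk could be checked against these token ranges before it is embedded. The sketch below copies the ranges from the table; the dictionary keys and `check_chunk_size` are hypothetical, and counting tokens (which depends on the tokenizer in use) is left to the caller.

```python
# Token ranges copied from the chunking table; the keys are hypothetical labels.
TOKEN_RANGES: dict[str, tuple[int, int]] = {
    "greenroom_session": (300, 800),
    "expert_knowledge": (400, 1000),
    "podcast_episode": (2000, 3000),
}


def check_chunk_size(data_type: str, token_count: int) -> str:
    """Classify a chunk as 'too_short', 'ok', or 'too_long' for its data type."""
    low, high = TOKEN_RANGES[data_type]
    if token_count < low:
        return "too_short"
    if token_count > high:
        return "too_long"
    return "ok"
```

What to do with an out-of-range chunk (merge, split, or store as-is) is a policy decision that this guide does not specify.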

5. Pinecone Python Query Examples

These follow the actual implementation. See src/services/knowledge/pinecone_client.py and src/agents/podcast/knowledge_agent.py.

5.1 Knowledge Agent: Expert Knowledge Search

from src.services.knowledge.pinecone_client import PineconeClient

client = PineconeClient()

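# Single rag-suite-knowledge index; expert knowledge is read from the shared namespace
results = await client.search(
    query="팀 갈등 해소 코칭 기법",   # "coaching techniques for resolving team conflict"
    top_k=5,
    score_threshold=0.25,   # threshold from PR #155
    filter={"domain": {"$in": ["cbt", "crisis_intervention"]}},
    namespace="shared",      # shared namespace, as in the agent matrix and section 6.1
)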

for match in results:
    if match["score"] >= 0.25:
        print(f"ID: {match['id']}, Score: {match['score']:.4f}")
        print(f"  Metadata: {match['metadata']}")

5.2 Memory Agent: Storing a Memory

from datetime import datetime

# Generate the embedding with KT Cloud RAG Suite Embedding (PASSAGE), then upsert
await client.upsert(
    vectors=[{
        "id": "mem_abc123",
        "values": embedding,      # KT Cloud PASSAGE embedding
        "metadata": {
            "user_id": user_id,
            "memory_type": "long",
            "importance": 0.85,
            "created_at": datetime.now().isoformat(),
            "topic_tag": "team_communication",
            "sentiment": "negative",
        }
    }],
    namespace=user_id,
)

5.3 Memory Agent: Memory Search

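results = await client.search(
    query="팀원과의 갈등 경험",   # "experience of conflict with team members"
    top_k=10,
    score_threshold=0.25,
    filter={
        "memory_type": {"$in": ["long", "medium"]},
        "importance": {"$gte": 0.5},
        # Caution: if this filter reaches Pinecone unchanged, note that its range operators
        # ($gte etc.) compare numbers only; created_at would need to be a numeric timestamp.
        "created_at": {"$gte": "2025-01-01"},
    },
    namespace=user_id,   # the user_id namespace isolates personal data
)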

5.4 Episode Memory: Similar Episode Search (Podcast)

results = await client.search(
    query="리더십 역할 전환 신뢰 구축",   # "leadership role transition, trust building"
    top_k=3,
    score_threshold=0.25,
    filter={"episode_id": {"$exists": True}},
    namespace=user_id,
)

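# Note: these cutoffs sit far above the measured 0.20 to 0.35 score range behind the
# 0.25 threshold (PR #155), so they likely need recalibration for this index.
for match in results:
    if match["score"] > 0.85:
        continuity = "build_on"       # extend the previous theme
    elif match["score"] > 0.7:
        continuity = "contrast_with"  # present a contrasting perspective
    else:
        continuity = "none"           # new topic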
    print(f"Episode: {match['id']}, Score: {match['score']:.4f}, Action: {continuity}")

6. Dual Store Integration Pattern (GraphRAG)

6.1 GraphRAG Workflow

VectorDB similarity search is combined with Neo4j relationship traversal to produce search results that include causal relationships.

async def graph_rag_search(user_input: str, user_id: str) -> dict:
    """
    GraphRAG unified search pipeline
    1. Search for similar documents in the VectorDB
    2. Traverse the relationship graph in Neo4j
    3. Merge the results
    """
    # Step 1: VectorDB search (semantic similarity)
    # Single rag-suite-knowledge index, user_id namespace
    personal_docs = await pinecone_client.search(
        query=user_input,
        score_threshold=0.25,
        filter={"memory_type": {"$exists": True}},
        namespace=user_id,
        top_k=5,
    )

    expert_docs = await pinecone_client.search(
        query=user_input,
        score_threshold=0.25,
        filter={"is_shared": True, "is_anonymized": True},
        namespace="shared",
        top_k=5,
    )

    # Step 2: Neo4j graph traversal (LEADS_TO relationships)
    graph_results = await neo4j.run(
        """
        MATCH (e:Emotion {user_id: $user_id})-[r:LEADS_TO]->(e2:Emotion)
        RETURN e.name, r.relationship, e2.name, r.strength
        ORDER BY r.strength DESC LIMIT 10
        """,
        user_id=user_id,
    )

    return {
        "personal_context": personal_docs,
        "expert_context": expert_docs,
        "graph_context": graph_results,
    }

6.2 Emotion-Based Reranking

def rerank_by_emotion(documents: list, emotion_vector: dict, user_id: str) -> list:
    """
    Rerank documents by emotion
    Final Score = (Vector Similarity × 0.4)
               + (Emotion Match × 0.3)
               + (Recency Weight × 0.2)
               - (Trauma Risk × 0.1)
    """
    reranked = []
    for doc in documents:
        vector_score = doc['similarity_score']
        emotion_score = cosine_similarity(doc['emotion_vector'], emotion_vector)
        recency_weight = calculate_recency(doc['timestamp'])
        trauma_risk = safety_agent.check_trauma_trigger(doc, user_id)

        final_score = (
            vector_score * 0.4 +
            emotion_score * 0.3 +
            recency_weight * 0.2 -
            trauma_risk * 0.1
        )
        reranked.append((doc, final_score))

    return sorted(reranked, key=lambda x: x[1], reverse=True)
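
`rerank_by_emotion` relies on `cosine_similarity` and `calculate_recency`, which this guide does not define. The sketch below shows one possible shape for them together with the weighted sum from the docstring. Representing emotion vectors as name-to-weight dictionaries, using an exponential decay with a 30-day half-life, and the `final_score` helper are all assumptions made for illustration, not the project's actual implementation.

```python
import math
from datetime import datetime, timedelta, timezone


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse emotion vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(name, 0.0) for name, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def calculate_recency(timestamp: datetime, now: datetime, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for a document from now, 0.5 after one half-life."""
    age_days = max((now - timestamp).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def final_score(vector: float, emotion: float, recency: float, trauma_risk: float) -> float:
    """Weighted sum from the rerank_by_emotion docstring."""
    return vector * 0.4 + emotion * 0.3 + recency * 0.2 - trauma_risk * 0.1


# Example: a 30-day-old document whose emotion profile matches the query exactly.
now = datetime(2026, 4, 17, tzinfo=timezone.utc)
score = final_score(
    vector=0.3,
    emotion=cosine_similarity({"isolation": 0.8}, {"isolation": 0.5}),
    recency=calculate_recency(now - timedelta(days=30), now),
    trauma_risk=0.0,
)
```

Because the vector similarity term carries a weight of 0.4 and measured scores are low (around 0.20 to 0.35), the emotion and recency terms can dominate the ranking; whether that is desirable depends on how the weights were tuned.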

7. DB Usage Matrix by Agent

Agent	TIER	Neo4j	Pinecone (rag-suite-knowledge)	MySQL (via Backend Client)	Main task
Memory	TIER 1	Memory↔Person relationships	Search/store in the user_id namespace	memories table (needs Backend agreement)	Personal memory R/W
Knowledge	TIER 1	Technique→Condition traversal	Search in the shared namespace	knowledge_base table (needs Backend agreement)	Expert knowledge search
Reasoning	TIER 1	Create/traverse the GoT emotion graph (LEADS_TO)	-	-	Structuring compound emotions
Episode Memory	TIER 1 (podcast)	-	Search/store in the user_id namespace	podcast_episodes table (needs Backend agreement)	Episode continuity
Visualization	Async	Ontology network extraction	-	-	B2B dashboard visualization
Learning	Async	-	-	user_profiles, learning_log	Learning user preferences

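8. Practical Usage Scenarios

Scenario 1: Greenroom entry ticket analysis

User input: "As team lead, I feel distant from my team"

① Memory Agent (Pinecone rag-suite-knowledge, user_id namespace):
   → Search past records related to "distance from the team" (threshold 0.25)
   → Result: a similar concern recorded three months ago

② Knowledge Agent (Pinecone rag-suite-knowledge, shared namespace):
   → Search expert material on "leadership distance" (threshold 0.25)
   → Neo4j: Technique(servant leadership) -[:TREATS]→ Condition(team alienation)

③ Reasoning Agent (Neo4j):
   → Build the GoT emotion graph:
      isolation -[:LEADS_TO {relationship:"amplifies"}]→ responsibility
      responsibility -[:LEADS_TO {relationship:"triggers"}]→ decision anxiety

④ Result:
   → Personal experience + expert knowledge + emotion structure = tailored coaching response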

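Scenario 2: Podcast content planning

User theme: "I have to lead my team through a time of change"

① Episode Memory (Pinecone rag-suite-knowledge, user_id namespace):
   → Search similar episodes (threshold 0.25)
   → score=0.82 → continuity="contrast_with" (contrasting perspective)

② Memory Agent (Pinecone):
   → Extract change-related experiences from the user's past greenroom data

③ Knowledge Agent (Neo4j):
   → change management theory -[:SUPPORTED_BY_RESEARCH]→ Kotter's 8-step model
   → PsychologicalPattern(resistance to change) -[:SIMILAR_TO_PATTERN]→ uncertainty avoidance

④ Podcast Reasoning:
   → Personal experience + expert knowledge + relationship structure = episode script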

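9. Privacy and Security Considerations

Pinecone data isolation is based on the single rag-suite-knowledge index plus user_id namespace filtering (PR #153).

Principle	Neo4j	Pinecone
Data isolation	Subgraphs separated by user_id	user_id namespace required on every query
Anonymization	Identifying information removed from UserStory nodes	is_anonymized=true metadata filter
Consent management	consent property on shared nodes	is_shared=true filter returns only consented data
Right to erasure	DETACH DELETE removes nodes and their relationships	delete API removes vectors immediately
Audit log	Query logs recorded in MySQL via Backend Client	Search logs recorded in MySQL via Backend Client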

// Neo4j: delete user data (right to be forgotten)
// DETACH DELETE already removes attached relationships, so no separate relationship match is needed
MATCH (m:Memory {user_id: $user_id})
DETACH DELETE m;

// Neo4j: anonymize a shared story
MATCH (us:UserStory {id: $story_id})
SET us.anonymized_at = datetime(),
    us.original_user_id = null;

# Pinecone: delete the user's vectors
await pinecone_client.delete(
    filter={"user_id": {"$eq": user_id}},
    namespace=user_id,
)

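Related Documents

AI CoT/ToT/GoT material: GoT emotion graph structure, Reasoning Agent reasoning framework
AI RAG material: dual RAG pipeline, hybrid search, reranking
AI visualization material: mind ontology network visualization
AI part architecture: v5.0 TIER structure, agent roles
AI agents and MCP material: message protocol v2.0, MCP tools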

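Future Plans

Neo4j relationship type expansion (GoT refinement)

The current structure, a single LEADS_TO relationship distinguished by the relationship property, could later be split into dedicated relationship types as GoT reasoning is refined.

Relationship	Meaning	Additional property
AMPLIFIES	Emotion A amplifies emotion B	strength
TRIGGERS	Emotion A triggers emotion B	probability
CONFLICT_WITH	Conflict between emotions	tension_level
EXACERBATES	Emotion A worsens emotion B	severity

If the relationship types are split, the allowed values of edges.relation in the GoT Structured Output Schema must be updated as well.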
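
If that split is ever carried out, existing LEADS_TO edges would need to be migrated. The sketch below generates one Cypher statement per relationship value, so the new relationship type appears as a literal in the query text. `migration_statements` is a hypothetical helper, the statements have not been run against a real graph, and only `strength` is copied; the other planned properties (probability, tension_level, severity) have no source value in the current schema.

```python
# Current relationship property values, as listed in this guide.
RELATION_VALUES = ["amplifies", "triggers", "conflict_with", "exacerbates"]


def migration_statements(values: list[str]) -> list[str]:
    """Build one Cypher statement per value, mapping it to an upper-case relationship type."""
    statements = []
    for value in values:
        # Values are interpolated into query text, so allow only letters and underscores.
        if not value.replace("_", "").isalpha():
            raise ValueError(f"unsafe relationship value: {value!r}")
        rel_type = value.upper()
        statements.append(
            f"MATCH (a:Emotion)-[o:LEADS_TO {{relationship: '{value}'}}]->(b:Emotion)\n"
            f"MERGE (a)-[n:{rel_type}]->(b)\n"
            f"SET n.strength = o.strength"
        )
    return statements
```

The old LEADS_TO edges are left in place here, so readers that still query LEADS_TO keep working until they are switched over.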

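Pinecone BM25 hybrid search

Only dense (semantic) search is in operation today. Adding BM25 (keyword) search later could improve exact matching of proper nouns such as therapy technique names.

Hybrid search: BM25 (keyword) 30% + dense (semantic) 70%

Whether to use Elasticsearch or Pinecone's built-in hybrid search capability still needs to be evaluated.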
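
Independently of which engine is chosen, the 30/70 weighting can be illustrated as a simple score fusion. The sketch below is one possible approach, not a decided design: it min-max normalizes each score set first (an assumption, since BM25 scores are unbounded while dense scores are not), and `min_max` and `hybrid_rank` are hypothetical names.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to the 0..1 range; all-equal inputs map to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def hybrid_rank(
    bm25: dict[str, float], dense: dict[str, float], bm25_weight: float = 0.3
) -> list[tuple[str, float]]:
    """Fuse keyword and semantic scores; documents missing from one side score 0 there."""
    bm25_n, dense_n = min_max(bm25), min_max(dense)
    fused = {
        doc_id: bm25_weight * bm25_n.get(doc_id, 0.0)
        + (1.0 - bm25_weight) * dense_n.get(doc_id, 0.0)
        for doc_id in set(bm25) | set(dense)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# A document that matches a technique name exactly ("a") versus one that is
# only semantically close ("b").
ranked = hybrid_rank({"a": 10.0, "b": 0.0}, {"a": 0.2, "b": 0.3})
```

With these weights the semantically closer document still ranks first, but the exact keyword match keeps a non-zero score instead of being dropped.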

Re-verifying the Pinecone user_id filter path

The actual code needs to be rechecked to confirm whether the user_id filter is applied through the namespace or through a filter={user_id: ...} metadata filter.

To check: src/services/knowledge/pinecone_client.py
Episode Memory query path: src/agents/podcast/episode_memory.py
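
The two candidate paths differ only in which query argument carries the user identifier. The sketch below makes that difference explicit; `isolation_kwargs` is a hypothetical helper, and which mode the real client uses is exactly what remains to be verified.

```python
from typing import Any, Literal


def isolation_kwargs(user_id: str, mode: Literal["namespace", "metadata"]) -> dict[str, Any]:
    """Return the query arguments that scope a search to a single user."""
    if not user_id:
        raise ValueError("user_id is required for personal data queries")
    if mode == "namespace":
        # The user's vectors live in their own namespace.
        return {"namespace": user_id}
    if mode == "metadata":
        # All vectors share a namespace; isolation relies on the metadata filter.
        return {"filter": {"user_id": {"$eq": user_id}}}
    raise ValueError(f"unknown isolation mode: {mode}")
```

With the namespace path, a query that omits the argument falls outside the user's data entirely; with the metadata path, an omitted filter would search across all users, so the missing-user_id check matters more there.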

MySQL table schema agreement with Backend

The tables below will be formalized as TYPE_* constants (src/api/backend_resources.py) once the schema is agreed with the Backend team.

memories table (Memory Agent)
knowledge_base table (Knowledge Agent)
podcast_episodes table (Episode Memory)

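Change Log

Date	Change
2026-04-17	Neo4j emotion relationships documented as the single LEADS_TO type (the four types such as AMPLIFIES moved to Future Plans); Pinecone documented as the single rag-suite-knowledge index (the three-separate-index design removed); MySQL access via Backend Client stated explicitly (PostgreSQL removed); absence of a GoT → Cypher graph_transformer stated explicitly (replaced with the code that issues LEADS_TO directly); all warning boxes removed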
