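Hierarchical Intent Classification for E-commerce Search - Auto-Campaign Search-Term Classification for Cross-Border Mother-and-Baby Ads
Skill-Hierarchical-Search-Intent-Classification · 13-Ad Analytics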
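1. Problem addressed
The core of WF-B ad optimization is the quality of harvested auto-campaign search terms. Mother-and-baby search terms carry mixed intent (age-in-months sensitivity, informational lookup, purchase intent), and a misclassified term breaks the whole downstream ad pipeline. The paper uses hierarchy at two levels plus a sampling step: ① Label Hierarchy (label-graph GCN + attention) makes fine-grained child classes aware of their parent-class constraints; ② Instance Hierarchy (contrastive-learning negative pairs) separates queries that share a parent class but differ in child class; ③ Neighborhood-aware Sampling (self-training) addresses cold start for minority classes (sensitive terms, 0.05%-0.15%). On real Amazon search data it outperforms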
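2. Core algorithm logic
The paper's classifier combines three components:
- ① Label Hierarchy (label-graph GCN + attention): fine-grained child classes are made aware of their parent-class constraints.
- ② Instance Hierarchy (contrastive-learning negative pairs): queries that share a parent class but belong to different child classes are pushed apart.
- ③ Neighborhood-aware Sampling (self-training): addresses cold start for minority classes (sensitive terms, 0.05%-0.15%).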
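To make the Label Hierarchy idea concrete, here is a minimal sketch of one generic GCN propagation step over a label tree, using the symmetric normalization D^{-1/2}(A + I)D^{-1/2}. It is not the paper's architecture: the learned weight matrix, nonlinearity, and attention are left out, the one-hot initial features are an assumption, and the tree simply mirrors the `LABEL_TREE` from the code template.

```python
from math import sqrt
from typing import Dict, List, Tuple

# Illustrative label tree (same shape as LABEL_TREE in the code template).
LABEL_TREE: Dict[str, List[str]] = {
    "informational": ["how_to", "comparison", "safety_concern"],
    "transactional": ["specific_product", "browse_category"],
    "age_specific": ["0_3m", "4_6m", "7_12m", "1_3y"],
}


def build_label_graph(tree: Dict[str, List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return node names and an adjacency matrix with self-loops (A + I)."""
    nodes = list(tree) + [c for children in tree.values() for c in children]
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for parent, children in tree.items():
        for child in children:
            # Undirected parent-child edge.
            adj[index[parent]][index[child]] = 1.0
            adj[index[child]][index[parent]] = 1.0
    return nodes, adj


def gcn_propagate(adj: List[List[float]], feats: List[List[float]]) -> List[List[float]]:
    """One step of D^{-1/2} (A + I) D^{-1/2} X (no weights, no nonlinearity)."""
    deg = [sum(row) for row in adj]
    n, dim = len(adj), len(feats[0])
    out = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                w = adj[i][j] / sqrt(deg[i] * deg[j])
                for k in range(dim):
                    out[i][k] += w * feats[j][k]
    return out


nodes, adj = build_label_graph(LABEL_TREE)
# One-hot label features (assumption; a real model would use learned embeddings).
feats = [[1.0 if i == j else 0.0 for j in range(len(nodes))] for i in range(len(nodes))]
smoothed = gcn_propagate(adj, feats)
```

After one step, the representation of a child such as `0_3m` mixes in its parent `age_specific` but nothing from unrelated parents, which is the sense in which a child label "sees" its parent.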
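3. Business application scenarios
- Business problem: when Momcozy runs Amazon SP ads, the auto-campaign search terms include "baby bottle 0-3 months" and "baby bottle 4-6 months". A conventional classifier collapses them into the same aspect, so 0-3 month ads are served to 4-6 month users (age mismatch → high CTR but very low conversion, with ACOS climbing to 60%+).
- Data requirements: historical search terms + conversion labels + age-in-months annotations (training on Amazon ESCI data plus fine-tuning on the mother-and-baby vertical is an option).
- Hierarchy configuration:
  - Contrastive learning: 0-3M and 4-6M under the same Feeding parent are strong negatives for each other, so the model learns the age boundary on its own.
- Business value:
  - Age-mismatched ads reduced by 70-80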
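The "strong negative" configuration in this scenario can be sketched as a pair-mining step that runs before any contrastive loss is computed. The label names (`feeding_0_3m`, `diapering_0_3m`, and so on) and the three-way bucketing are hypothetical illustrations, not the paper's exact sampling scheme.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical child -> parent map for illustration.
CHILD2PARENT: Dict[str, str] = {
    "feeding_0_3m": "feeding",
    "feeding_4_6m": "feeding",
    "diapering_0_3m": "diapering",
}


def mine_pairs(labeled: List[Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Bucket every query pair as positive, hard negative, or easy negative."""
    pairs: Dict[str, List[Tuple[str, str]]] = {
        "positive": [], "hard_negative": [], "easy_negative": [],
    }
    for (q1, c1), (q2, c2) in combinations(labeled, 2):
        if c1 == c2:
            kind = "positive"        # same child class
        elif CHILD2PARENT[c1] == CHILD2PARENT[c2]:
            kind = "hard_negative"   # same parent, different child (e.g. 0-3M vs 4-6M)
        else:
            kind = "easy_negative"   # different parent
        pairs[kind].append((q1, q2))
    return pairs


labeled_queries = [
    ("baby bottle 0-3 months", "feeding_0_3m"),
    ("newborn bottle slow flow", "feeding_0_3m"),
    ("baby bottle 4-6 months", "feeding_4_6m"),
    ("newborn diapers size 1", "diapering_0_3m"),
]
pairs = mine_pairs(labeled_queries)
```

In a contrastive objective the hard negatives would typically receive more weight than the easy ones, since they are the pairs a flat classifier tends to confuse.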
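- Business problem: Momcozy does not distinguish "when to introduce solid food" (informational, low purchase intent) from "buy Hipp organic stage 1" (purchase intent); the former burns ad budget without converting. Mother-and-baby shoppers have long decision cycles, and informational queries make up 60-70% of queries.
- Data requirements: same as above + intent-class annotations.
- Hierarchy configuration:
  - Informational queries trigger educational-content ads (low CPC, brand exposure).
  - Transactional queries trigger brand-keyword bidding (high bid, direct conversion).
- Business value:
  - ROAS uplift of 3-10% (range implied by the paper)
  - Low-CPC ads on informational queries feed back into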
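The routing rule described in this scenario could look like the following sketch. The campaign names and bid multipliers are placeholders chosen for illustration and would need to be set from real account data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AdAction:
    campaign: str          # hypothetical campaign type
    bid_multiplier: float  # hypothetical multiplier applied to the base bid


# Parent intent -> action; the values are illustrative assumptions.
ROUTES: Dict[str, AdAction] = {
    "informational": AdAction("educational_content", 0.5),    # low CPC, brand exposure
    "transactional": AdAction("brand_keyword_bidding", 1.5),  # high bid, direct conversion
}
# Anything without an explicit route is held back rather than bid on.
DEFAULT_ACTION = AdAction("hold_for_review", 0.0)


def route(parent_intent: str) -> AdAction:
    """Map a predicted parent intent to an ad action."""
    return ROUTES.get(parent_intent, DEFAULT_ACTION)


info_action = route("informational")
buy_action = route("transactional")
```

Keeping the mapping in a table rather than in branching code makes it easier for the ad team to adjust bids without touching the classifier.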
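4. Input data requirements
See the original code template for the input specification.
5. Output
See the original code template for the output specification.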
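6. Business value / ROI
- Easy: the Amazon ESCI dataset (130K queries + 2.6M labeled pairs) is fully public
- Easy: bert-base / esci-products-v3 can serve as the backbone model
- Hard: the Amazon paper's code is not open source, so the Label Hierarchy GCN + contrastive learning must be implemented in-house
- Hard: the mother-and-baby label tree needs to be initialized by domain experts
- Hard: Neighborhood-aware Sampling needs a large pool of unlabeled queries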
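As a rough picture of why the unlabeled pool matters, the sketch below pseudo-labels an unlabeled query only when its nearest labeled neighbors agree unanimously. This is an illustrative stand-in, not the paper's procedure: token-level Jaccard similarity replaces learned embeddings, and `k`, `min_sim`, and the unanimity rule are assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two queries."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pseudo_label(
    query: str,
    labeled: List[Tuple[str, str]],
    k: int = 3,
    min_sim: float = 0.3,
) -> Optional[str]:
    """Return a pseudo-label only if all sufficiently similar neighbors agree."""
    scored = sorted(((jaccard(query, q), lab) for q, lab in labeled), reverse=True)
    neighbors = [lab for sim, lab in scored[:k] if sim >= min_sim]
    if not neighbors:
        return None  # nothing similar enough: leave the query unlabeled
    label, votes = Counter(neighbors).most_common(1)[0]
    return label if votes == len(neighbors) else None


seed = [
    ("baby bottle 0-3 months", "0_3m"),
    ("newborn baby bottle", "0_3m"),
    ("baby bottle 4-6 months", "4_6m"),
    ("how to sterilize bottle", "how_to"),
]
confident = pseudo_label("newborn bottle", seed)
ambiguous = pseudo_label("baby bottle", seed)
unrelated = pseudo_label("stroller rain cover", seed)
```

Here "newborn bottle" inherits `0_3m`, while "baby bottle" is left unlabeled because its neighbors span two age bands, and the unrelated query has no neighbor above the threshold.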
7. Code template
"""
Hierarchical e-commerce search intent classification - skeleton for cross-border mother-and-baby ads
Paper: arXiv:2403.06021 (Amazon, WWW 2024)
Amazon ESCI dataset (open source): github.com/amazon-science/esci-data
Dependencies: pip install torch transformers
"""
from __future__ import annotations

from typing import Dict, List, Tuple

# Two-level label tree: parent intent -> fine-grained child intents.
LABEL_TREE = {
    "informational": ["how_to", "comparison", "safety_concern"],
    "transactional": ["specific_product", "browse_category"],
    "age_specific": ["0_3m", "4_6m", "7_12m", "1_3y"],
}
# Flat list of child labels and a child -> parent lookup derived from the tree.
ALL_CHILDREN: List[str] = [c for children in LABEL_TREE.values() for c in children]
CHILD2PARENT: Dict[str, str] = {c: p for p, children in LABEL_TREE.items() for c in children}
def rule_based_classify(query: str) -> Tuple[str, str]:
    """Rule-based classifier (replace with BERT + Label Hierarchy GCN in production)."""
    q = query.lower()
    tokens = q.split()
    # Age keywords are checked first, so an age-specific query wins over other intents.
    if any(kw in q for kw in ["newborn", "0-3 month", "0 to 3 month"]):
        child = "0_3m"
    elif any(kw in q for kw in ["4-6 month", "stage 1", "first solid"]):
        child = "4_6m"
    elif any(kw in q for kw in ["7-12 month", "stage 2"]):
        child = "7_12m"
    elif any(kw in q for kw in ["1-3 year", "toddler", "stage 3"]):
        child = "1_3y"
    elif any(kw in q for kw in ["how to", "when to", "how long"]):
        child = "how_to"
    # "vs" is matched as a whole token so it is not picked up inside longer words.
    elif any(tok in ("vs", "vs.", "versus") for tok in tokens) or "compare" in q:
        child = "comparison"
    elif any(kw in q for kw in ["safe", "safety", "allergic"]):
        child = "safety_concern"
    elif any(kw in q for kw in ["buy ", "purchase ", "order "]):
        child = "specific_product"
    else:
        # Fallback when no keyword matches.
        child = "browse_category"
    parent = CHILD2PARENT[child]
    return parent, child
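def hierarchical_loss_components(
    child_pred_correct: int,
    parent_pred_correct: int,
    total_samples: int,
    lam: float = 1.0,
) -> Dict[str, float]:
    """Hierarchical classification loss components (simplified; runs without torch)."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    child_acc = child_pred_correct / total_samples
    parent_acc = parent_pred_correct / total_samples
    # Negative accuracy stands in for a real per-level loss such as cross-entropy.
    loss_child = -child_acc
    loss_parent = -parent_acc
    # lam weights the parent-level term against the child-level term.
    total_loss = loss_child + lam * loss_parent
    return {"child_acc": child_acc, "parent_acc": parent_acc, "total_loss": total_loss}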
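The counts that `hierarchical_loss_components` expects can be derived from predicted and gold child labels. The following standalone sketch shows one way to do that; the helper `count_correct`, the trimmed label map, and the sample predictions are made up for illustration, and the last line repeats the template's formula inline so the block runs on its own.

```python
from typing import Dict, List, Tuple

# Trimmed child -> parent map, mirroring the template's CHILD2PARENT.
CHILD2PARENT: Dict[str, str] = {
    "how_to": "informational",
    "comparison": "informational",
    "specific_product": "transactional",
    "browse_category": "transactional",
    "0_3m": "age_specific",
    "4_6m": "age_specific",
}


def count_correct(pred: List[str], gold: List[str]) -> Tuple[int, int, int]:
    """Return (child-level hits, parent-level hits, sample count)."""
    if len(pred) != len(gold):
        raise ValueError("pred and gold must have the same length")
    child_ok = sum(p == g for p, g in zip(pred, gold))
    parent_ok = sum(CHILD2PARENT[p] == CHILD2PARENT[g] for p, g in zip(pred, gold))
    return child_ok, parent_ok, len(gold)


# Made-up predictions: every parent is right, only one child is right.
preds = ["0_3m", "4_6m", "how_to", "browse_category"]
golds = ["0_3m", "0_3m", "comparison", "specific_product"]
child_ok, parent_ok, total = count_correct(preds, golds)

lam = 1.0
# Same formula as the template: -child_acc + lam * (-parent_acc).
total_loss = -(child_ok / total) + lam * -(parent_ok / total)
```

A large gap between parent and child accuracy, as in this toy sample, points at confusion between siblings such as 0-3M and 4-6M rather than at the coarse intent split.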
8. Paper source
- arXiv:2403.06021