Monitoring LangChain (including LangGraph) Applications with OpenTelemetry
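This article describes how to monitor LangChain applications with OpenTelemetry. Setting a few environment variables is enough to send monitoring data to an OpenTelemetry server. It walks through deploying Grafana LGTM with Docker as the monitoring backend, covers the Python-side configuration, and shows how to view an application's throughput, latency, and traces in a dashboard. It also demonstrates how to analyze the complex call relationships of a LangGraph agent.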
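1. Introduction
LangChain (including LangGraph) is one of the most widely used frameworks for AI application development, and applications built on it are gradually becoming an important part of enterprise systems. Since OpenTelemetry serves as an overall monitoring solution for enterprise application systems, these key applications should be brought under it as well. Doing so is quite simple, and I will briefly describe the basic approach here.
2. Using LangSmith's mechanism to monitor LangChain applications with OpenTelemetry
I previously wrote an article on monitoring Python applications with OpenTelemetry. That approach works in principle, but unfortunately its LangChain support has a bug that has not been fixed yet. In fact, much like Spring Framework in Java, the LangChain framework in Python has good OpenTelemetry support of its own, provided through LangSmith. We only need to enable the LangSmith mechanism; we do not need a LangSmith server, and the data can be sent directly to an OpenTelemetry server.
The steps are as follows:
3. Starting the OpenTelemetry backend
The OpenTelemetry backend is the set of databases and UI tools that support OpenTelemetry metrics/traces/logs/profiles. The most common approach is to use the OpenTelemetry Collector to connect the different backend tools. If you are a beginner, or your system is small, you can use the Docker-based Grafana LGTM directly; installation is almost a single step and very easy. The only prerequisite is an environment that supports Docker.
Assuming you want to install LGTM in the /opt/lgtm directory (any directory will do), the commands are as follows (assuming a Linux system):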
docker pull grafana/otel-lgtm
mkdir /opt/lgtm
cd /opt/lgtm
wget https://raw.githubusercontent.com/grafana/docker-otel-lgtm/main/run-lgtm.sh
chmod +x run-lgtm.sh
sed -i 's/3000:3000/3100:3000/' run-lgtm.sh
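Note the last command: the default external port of LGTM's Grafana is 3000, which often conflicts with other applications, so I changed it to 3100.
To start LGTM, just run the script: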
cd /opt/lgtm
./run-lgtm.sh
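LGTM listens on the following ports:
- 4317/4318 are the OTLP ports (4317 for gRPC, 4318 for HTTP), which receive metrics/traces/logs data
- 3100 is the Grafana UI port
- 9090 is the Prometheus port, used for debugging
- 4040 is the port on which Pyroscope receives profiles; this will be merged into 4317/4318 in the future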
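Before pointing an application at the backend, it can help to confirm that the ports listed above are actually reachable. The following is a minimal sketch using only the Python standard library. The helper names `is_port_open` and `check_lgtm_ports`, the default host, and the one-second timeout are my own assumptions for illustration, and a successful TCP connection only shows that something is listening on the port, not that it is LGTM.

```python
import socket

# Ports taken from the list above (3100 is the remapped Grafana port).
LGTM_PORTS = {
    4317: "OTLP gRPC",
    4318: "OTLP HTTP",
    3100: "Grafana UI",
    9090: "Prometheus",
    4040: "Pyroscope",
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def check_lgtm_ports(host: str = "localhost") -> dict:
    """Map each expected port to whether it currently accepts connections."""
    return {port: is_port_open(host, port) for port in LGTM_PORTS}


if __name__ == "__main__":
    for port, reachable in check_lgtm_ports().items():
        state = "open" if reachable else "closed"
        print(f"{port:>5}  {LGTM_PORTS[port]:<12} {state}")
```

If LGTM runs on another machine, pass its host name or IP address to `check_lgtm_ports`.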
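4. Enabling a LangChain application to send data to OpenTelemetry
1) Add the Python package: langsmith
Usually you first need to activate the virtual environment the application uses (venv, Conda, and so on).
Then you can simply install the package with the following command: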
pip install -U langsmith
A more proper way, of course, is to add the package to the application's requirements.txt:
langsmith
and then run again:
pip install -r requirements.txt
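To confirm that the package really landed in the environment the application runs in, a quick check from Python can be useful. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is my own and not part of langsmith.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("langsmith")
    if version is None:
        print("langsmith is not installed in this environment")
    else:
        print(f"langsmith {version} is installed")
```

Run it with the same interpreter that starts the application, otherwise it may inspect a different environment.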
2) Set the required environment variables before starting the LangChain application
Suppose the command that starts your application is:
python app.py
It can then be changed to the following to start the application:
export LANGSMITH_TRACING=true
export LANGSMITH_OTEL_ENABLED=true
export LANGSMITH_OTEL_ONLY=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export LS_APM_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_SERVICE_NAME=MY_SERVICE1
python app.py
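Notes:
- Change the OTEL_SERVICE_NAME environment variable to suit your own application
- If the OpenTelemetry backend (LGTM in this example) is not on the local machine but, say, at 1.2.3.4, set OTEL_EXPORTER_OTLP_ENDPOINT accordingly: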
export OTEL_EXPORTER_OTLP_ENDPOINT=http://1.2.3.4:4318
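When exporting variables in the shell is inconvenient (for example when the application is started from an IDE), the same variables can in principle be set at the very top of app.py. The sketch below only uses `os.environ`; the helper names `otel_env` and `apply_otel_env` are hypothetical, and it rests on the assumption that LangChain and LangSmith read these variables after this code has run, so it should come before any LangChain import. I have not verified this for every version.

```python
import os


def otel_env(service_name: str, endpoint: str = "http://localhost:4318") -> dict:
    """Build the environment variables used in this article as a dict."""
    return {
        "LANGSMITH_TRACING": "true",
        "LANGSMITH_OTEL_ENABLED": "true",
        "LANGSMITH_OTEL_ONLY": "true",
        "LS_APM_OTEL_ENABLED": "true",
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_SERVICE_NAME": service_name,
    }


def apply_otel_env(service_name: str, endpoint: str = "http://localhost:4318") -> None:
    """Set the variables, keeping any value already exported in the shell."""
    for key, value in otel_env(service_name, endpoint).items():
        os.environ.setdefault(key, value)


if __name__ == "__main__":
    apply_otel_env("MY_SERVICE1")
    # Import LangChain / LangGraph only after this point (assumption: the
    # libraries read these variables when they are imported or first used).
    for key in sorted(otel_env("MY_SERVICE1")):
        print(f"{key}={os.environ[key]}")
```

Using `setdefault` means that values exported in the shell still take precedence, so the shell-based setup shown above keeps working unchanged.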
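5. A dashboard for monitoring LangChain applications
Open "http://localhost:3100" to access LGTM's Grafana UI (if it is remote, replace localhost with the host name or IP address) and log in with admin/admin. Although LGTM supports all kinds of data (metrics/traces/logs/profiles), the instrumentation used here currently provides only traces. Since some metrics can be derived from traces, our Grafana dashboard is limited to these two kinds of data: traces and metrics.
Below is a screenshot of the dashboard I built: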

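A brief tour of this dashboard:
Both charts in the first row show throughput: the left one in requests per second, the right one in bytes per second.
The PromQL used by the left chart is: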
sum by (service, span_name, status_code) (
rate(traces_spanmetrics_calls_total[2m])
)
The PromQL used by the right chart is:
sum by (service, span_name, status_code) (rate(traces_spanmetrics_size_total[2m]))
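Since Prometheus is exposed on port 9090 for debugging, the same expressions can also be evaluated outside Grafana through the Prometheus HTTP API (`/api/v1/query`). Here is a minimal sketch with the standard library. The helper names `build_query_url` and `run_query` and the default base URL are assumptions of mine; the metric name is the one used in the dashboard.

```python
import json
import urllib.parse
import urllib.request

# Same expression as the left throughput chart, written on one line.
THROUGHPUT_REQS = (
    "sum by (service, span_name, status_code) "
    "(rate(traces_spanmetrics_calls_total[2m]))"
)


def build_query_url(expr: str, base_url: str = "http://localhost:9090") -> str:
    """Return the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def run_query(expr: str, base_url: str = "http://localhost:9090") -> list:
    """Run an instant query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=5) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


if __name__ == "__main__":
    # Only prints the URL; call run_query(THROUGHPUT_REQS) once LGTM is running.
    print(build_query_url(THROUGHPUT_REQS))
```

An empty result list usually just means that no spans arrived within the last two minutes, so it is worth sending a few requests to the application first.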
The second row shows latency, here the 90th percentile. The PromQL used is:
histogram_quantile(0.90, sum by (le, service, span_name, status_code) (rate(traces_spanmetrics_latency_bucket[2m])))
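To make the latency query less opaque: `histogram_quantile` estimates a quantile from cumulative bucket values by locating the bucket that contains the target rank and interpolating linearly inside it. The sketch below is a simplified illustration of that idea, not Prometheus's actual implementation (edge cases such as NaN inputs and native histograms are left out). It works on plain counts, whereas the query above feeds in per-second rates, which gives the same result. The bucket values in the usage example are made up.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    The last bucket must have upper_bound == math.inf, as in Prometheus.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations at all

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Invented data: 100 spans, cumulative counts per latency bound in seconds.
    example = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
    print("p50:", histogram_quantile(0.5, example))
    print("p90:", histogram_quantile(0.9, example))
```

Because of the interpolation, the result is only as precise as the bucket boundaries allow, which is worth keeping in mind when reading the latency chart.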
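The third row shows the list of traces. Traces can be analyzed in many ways; here is one example. An agent built on LangGraph has fairly complex internal calls, many of them decided automatically at run time. When analyzing performance we need a clear view of the logical order of the calls and the time each step consumes, and this is where traces are especially valuable.
First we query the traces sent by the agents among our applications. Click the three dots in the upper-right corner of the Traces panel, choose "Edit" to open the settings, and then pick a suitable query method. If the service name of one kind of agent is "agent1", select or type "agent1" in the service name field. Click the "Refresh" button to get all traces of "agent1" within the selected time range, as shown in the screenshot below: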

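Click one of the traces, as in the screenshot below, to see the internal call relationships and timing. This agent is fairly simple: the root span agent1 produces two child spans ("retrieval" and "chatbot"), corresponding to two LangGraph nodes. The "retrieval" node in turn has three child spans (VectorStoreRetriever, ChatPromptTemplate, and route_after_retrieval), which the code shows to be three calls. The "chatbot" node has only one child span, named "ChatOllama".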

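We click the leaf span that takes the longest, "ChatOllama". This shows the span's details, as in the screenshot below, including the model name, tokens, the input and the answer, and other information.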
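What Grafana does visually here, rebuilding the span tree from parent links and spotting the slowest leaf, can be sketched in a few lines of Python. The `Span` class and helper functions are hypothetical, the span names follow the trace described above, and all durations are invented for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    duration_ms: float


def children_by_parent(spans: List[Span]) -> Dict[Optional[str], List[Span]]:
    """Group spans by the id of their parent."""
    tree: Dict[Optional[str], List[Span]] = defaultdict(list)
    for span in spans:
        tree[span.parent_id].append(span)
    return tree


def render(spans: List[Span]) -> List[str]:
    """Return the trace as indented lines, depth first from the root."""
    tree = children_by_parent(spans)
    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in tree.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms:.0f} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


def slowest_leaf(spans: List[Span]) -> Span:
    """Return the leaf span (one without children) with the longest duration."""
    parents = {span.parent_id for span in spans}
    leaves = [span for span in spans if span.span_id not in parents]
    return max(leaves, key=lambda span: span.duration_ms)


# Span names follow the trace described above; all durations are invented.
EXAMPLE_TRACE = [
    Span("1", None, "agent1", 5200),
    Span("2", "1", "retrieval", 400),
    Span("3", "2", "VectorStoreRetriever", 350),
    Span("4", "2", "ChatPromptTemplate", 5),
    Span("5", "2", "route_after_retrieval", 2),
    Span("6", "1", "chatbot", 4700),
    Span("7", "6", "ChatOllama", 4650),
]

if __name__ == "__main__":
    print("\n".join(render(EXAMPLE_TRACE)))
    print("slowest leaf:", slowest_leaf(EXAMPLE_TRACE).name)
```

Looking at leaves rather than at all spans matters because a parent's duration includes its children, so the root would otherwise always come out as the slowest.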

Below is the source of the dashboard, for reference only:
{
"apiVersion": "dashboard.grafana.app/v2",
"kind": "Dashboard",
"metadata": {
"name": "adkqq5s",
"namespace": "default",
"uid": "8e305963-7a7d-4d83-b256-e40d9b932f12",
"resourceVersion": "1777469615660990",
"generation": 11,
"creationTimestamp": "2026-04-18T15:09:59Z",
"labels": {
"grafana.app/deprecatedInternalID": "885991689437184"
},
"annotations": {
"grafana.app/createdBy": "user:eferfp17srfnkb",
"grafana.app/folder": "",
"grafana.app/saved-from-ui": "Grafana v13.0.1 (a100054f)",
"grafana.app/updatedBy": "user:eferfp17srfnkb",
"grafana.app/updatedTimestamp": "2026-04-29T13:33:35Z"
}
},
"spec": {
"annotations": [
{
"kind": "AnnotationQuery",
"spec": {
"query": {
"kind": "DataQuery",
"group": "grafana",
"version": "v0",
"datasource": {
"name": "-- Grafana --"
},
"spec": {}
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"builtIn": true
}
}
],
"cursorSync": "Off",
"editable": true,
"elements": {
"panel-1": {
"kind": "Panel",
"spec": {
"id": 1,
"title": "吞吐量(req/s)",
"description": "",
"links": [],
"data": {
"kind": "QueryGroup",
"spec": {
"queries": [
{
"kind": "PanelQuery",
"spec": {
"query": {
"kind": "DataQuery",
"group": "prometheus",
"version": "v0",
"datasource": {
"name": "prometheus"
},
"spec": {
"editorMode": "code",
"expr": "sum by (service, span_name, status_code) (\r\n rate(traces_spanmetrics_calls_total[2m])\r\n)",
"instant": true,
"legendFormat": "__auto",
"range": true
}
},
"refId": "A",
"hidden": false
}
}
],
"transformations": [],
"queryOptions": {}
}
},
"vizConfig": {
"kind": "VizConfig",
"group": "timeseries",
"version": "13.0.1",
"spec": {
"options": {
"annotations": {
"clustering": -1,
"multiLane": false
},
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"hideZeros": false,
"mode": "single",
"sort": "none"
}
},
"fieldConfig": {
"defaults": {
"unit": "reqps",
"thresholds": {
"mode": "absolute",
"steps": [
{
"value": 0,
"color": "green"
},
{
"value": 80,
"color": "red"
}
]
},
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"showValues": false,
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
}
},
"overrides": []
}
}
}
}
},
"panel-2": {
"kind": "Panel",
"spec": {
"id": 2,
"title": "延迟时间(s)",
"description": "",
"links": [],
"data": {
"kind": "QueryGroup",
"spec": {
"queries": [
{
"kind": "PanelQuery",
"spec": {
"query": {
"kind": "DataQuery",
"group": "prometheus",
"version": "v0",
"datasource": {
"name": "prometheus"
},
"spec": {
"editorMode": "code",
"expr": "histogram_quantile(0.90, sum by (le, service, span_name, status_code) (rate(traces_spanmetrics_latency_bucket[2m])))",
"instant": true,
"legendFormat": "__auto",
"range": true
}
},
"refId": "A",
"hidden": false
}
}
],
"transformations": [],
"queryOptions": {}
}
},
"vizConfig": {
"kind": "VizConfig",
"group": "timeseries",
"version": "13.0.1",
"spec": {
"options": {
"annotations": {
"clustering": -1,
"multiLane": false
},
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"hideZeros": false,
"mode": "single",
"sort": "none"
}
},
"fieldConfig": {
"defaults": {
"unit": "s",
"thresholds": {
"mode": "absolute",
"steps": [
{
"value": 0,
"color": "green"
},
{
"value": 80,
"color": "red"
}
]
},
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"showValues": false,
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
}
},
"overrides": []
}
}
}
}
},
"panel-3": {
"kind": "Panel",
"spec": {
"id": 3,
"title": "吞吐量(bytes/sec)",
"description": "",
"links": [],
"data": {
"kind": "QueryGroup",
"spec": {
"queries": [
{
"kind": "PanelQuery",
"spec": {
"query": {
"kind": "DataQuery",
"group": "prometheus",
"version": "v0",
"datasource": {
"name": "prometheus"
},
"spec": {
"editorMode": "code",
"expr": "sum by (service, span_name, status_code) (rate(traces_spanmetrics_size_total[2m]))",
"instant": true,
"legendFormat": "__auto",
"range": true
}
},
"refId": "A",
"hidden": false
}
}
],
"transformations": [],
"queryOptions": {}
}
},
"vizConfig": {
"kind": "VizConfig",
"group": "timeseries",
"version": "13.0.1",
"spec": {
"options": {
"annotations": {
"clustering": -1,
"multiLane": false
},
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"hideZeros": false,
"mode": "single",
"sort": "none"
}
},
"fieldConfig": {
"defaults": {
"unit": "Bps",
"thresholds": {
"mode": "absolute",
"steps": [
{
"value": 0,
"color": "green"
},
{
"value": 80,
"color": "red"
}
]
},
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"showValues": false,
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
}
},
"overrides": []
}
}
}
}
},
"panel-4": {
"kind": "Panel",
"spec": {
"id": 4,
"title": "Traces",
"description": "",
"links": [],
"data": {
"kind": "QueryGroup",
"spec": {
"queries": [
{
"kind": "PanelQuery",
"spec": {
"query": {
"kind": "DataQuery",
"group": "tempo",
"version": "v0",
"datasource": {
"name": "tempo"
},
"spec": {
"filters": [
{
"id": "b664759c",
"operator": "=",
"scope": "span"
}
],
"key": "Q-9f6990d1-f1a3-4fe9-a46f-63846a4c475e-0",
"limit": 30,
"metricsQueryType": "range",
"queryType": "traceqlSearch",
"serviceMapUseNativeHistograms": false,
"spss": 3,
"tableType": "traces"
}
},
"refId": "A",
"hidden": false
}
}
],
"transformations": [],
"queryOptions": {}
}
},
"vizConfig": {
"kind": "VizConfig",
"group": "table",
"version": "13.0.1",
"spec": {
"options": {
"cellHeight": "sm",
"showHeader": true,
"sortBy": [
{
"desc": true,
"displayName": "Service"
}
]
},
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{
"value": 0,
"color": "green"
},
{
"value": 80,
"color": "red"
}
]
},
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"footer": {
"reducers": []
},
"inspect": false
}
},
"overrides": []
}
}
}
}
}
},
"layout": {
"kind": "GridLayout",
"spec": {
"items": [
{
"kind": "GridLayoutItem",
"spec": {
"x": 0,
"y": 0,
"width": 12,
"height": 8,
"element": {
"kind": "ElementReference",
"name": "panel-1"
}
}
},
{
"kind": "GridLayoutItem",
"spec": {
"x": 12,
"y": 0,
"width": 12,
"height": 8,
"element": {
"kind": "ElementReference",
"name": "panel-3"
}
}
},
{
"kind": "GridLayoutItem",
"spec": {
"x": 0,
"y": 8,
"width": 24,
"height": 8,
"element": {
"kind": "ElementReference",
"name": "panel-2"
}
}
},
{
"kind": "GridLayoutItem",
"spec": {
"x": 0,
"y": 16,
"width": 24,
"height": 7,
"element": {
"kind": "ElementReference",
"name": "panel-4"
}
}
}
]
}
},
"links": [],
"liveNow": false,
"preload": false,
"tags": [],
"timeSettings": {
"timezone": "browser",
"from": "now-30m",
"to": "now",
"autoRefresh": "",
"autoRefreshIntervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"hideTimepicker": false,
"fiscalYearStartMonth": 0
},
"title": "LangChain1",
"variables": []
}
}
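The JSON above is long, so a small script can help when you only want to see which query sits behind which panel. The sketch below walks the structure exactly as it appears in this dashboard (spec, elements, then each panel's data, spec, queries); the function names and the file name `dashboard.json` are assumptions, and other dashboard schema versions may be laid out differently.

```python
import json
from typing import Dict, List


def panel_queries(dashboard: dict) -> Dict[str, List[str]]:
    """Map each panel title to the query expressions it contains."""
    result: Dict[str, List[str]] = {}
    elements = dashboard.get("spec", {}).get("elements", {})
    for element in elements.values():
        panel = element.get("spec", {})
        queries = panel.get("data", {}).get("spec", {}).get("queries", [])
        exprs = []
        for query in queries:
            data_query = query.get("spec", {}).get("query", {})
            group = data_query.get("group", "")
            # Prometheus panels carry "expr"; the Tempo panel has none.
            expr = data_query.get("spec", {}).get("expr")
            exprs.append(expr if expr is not None else f"<{group} query>")
        result[panel.get("title", "")] = exprs
    return result


def load_dashboard(path: str) -> dict:
    """Read a dashboard JSON file saved from Grafana."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Tiny inline sample with the same nesting as the dashboard above.
    sample = {
        "spec": {
            "elements": {
                "panel-1": {
                    "spec": {
                        "title": "demo",
                        "data": {"spec": {"queries": [
                            {"spec": {"query": {
                                "group": "prometheus",
                                "spec": {"expr": "up"},
                            }}}
                        ]}},
                    }
                }
            }
        }
    }
    for title, exprs in panel_queries(sample).items():
        print(title, "->", exprs)
```

With the JSON saved to a file, `panel_queries(load_dashboard("dashboard.json"))` lists the three PromQL expressions discussed earlier plus a placeholder for the Tempo trace search.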
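6. Summary
This article described how to monitor LangChain applications with OpenTelemetry. Data collection relies on the LangSmith mechanism, and setting a few environment variables is enough to send the monitoring data to an OpenTelemetry server. It covered deploying Grafana LGTM with Docker as the monitoring backend, including port configuration and startup, as well as the Python-side configuration, and showed how to view an application's throughput, latency, and traces in a dashboard. It also demonstrated how to analyze the complex call relationships of a LangGraph agent: traces make the execution time of each internal node and the logical order of the calls clearly visible, giving a direct basis for performance tuning. Together, this gives LangChain applications monitoring that spans from metrics to traces, all visualized in one place.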