1. Introduction

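LangChain (including LangGraph) is one of the most widely used frameworks for building AI applications, and such applications are gradually becoming an important part of enterprise systems. Since OpenTelemetry serves as an end-to-end monitoring solution for enterprise applications, it should cover these critical applications too. Doing so turns out to be straightforward, and this article outlines the basic approach.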

2. Using the LangSmith mechanism to monitor LangChain applications with OpenTelemetry

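I previously wrote an article on monitoring Python applications with OpenTelemetry. In principle that approach applies here, but unfortunately its LangChain instrumentation has a bug that has not been fixed yet. In fact, much like the Spring Framework in Java, the LangChain framework in Python has good OpenTelemetry support of its own, and that support is provided through LangSmith. We only need to enable the LangSmith mechanism; a LangSmith server is not required, and the data can be sent directly to an OpenTelemetry backend.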

The steps are as follows:

3. Starting the OpenTelemetry backend

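The OpenTelemetry backend is the set of databases and UI tools that store and display OpenTelemetry metrics, traces, logs, and profiles. The most common setup uses the OpenTelemetry Collector to connect to the different backend tools. If you are a beginner, or your system is small, you can use the Docker-based Grafana LGTM directly: installation is almost a single step. The only prerequisite is an environment that supports Docker.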

Suppose you want to install LGTM in /opt/lgtm (any directory works). The commands are as follows (assuming a Linux system):

docker pull grafana/otel-lgtm

mkdir -p /opt/lgtm
cd /opt/lgtm
wget https://raw.githubusercontent.com/grafana/docker-otel-lgtm/main/run-lgtm.sh
chmod +x run-lgtm.sh
sed -i 's/3000:3000/3100:3000/' run-lgtm.sh

Note the last command: Grafana in LGTM is exposed on port 3000 by default, which often conflicts with other applications, so I changed the external port to 3100.

Starting LGTM then takes a single script:

cd /opt/lgtm
./run-lgtm.sh

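LGTM listens on the following ports:

  • 4317/4318 are the OTLP ports, which receive metrics, traces, and logs
  • 3100 is the Grafana UI port
  • 9090 is the Prometheus port, used for debugging
  • 4040 is the port on which Pyroscope receives profiles; this will later be merged into 4317/4318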
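
Once the container is up, it can be handy to confirm that these ports actually accept connections before wiring up the application. The following is a minimal sketch using only the Python standard library. The port labels (gRPC for 4317, HTTP for 4318) follow the usual OTLP convention, 3100 assumes the Grafana port was remapped as shown earlier, and the function names are made up for this example.

```python
import socket

# Ports from the list above; 3100 assumes the Grafana remap done with sed earlier.
LGTM_PORTS = {
    4317: "OTLP gRPC",
    4318: "OTLP HTTP",
    3100: "Grafana UI",
    9090: "Prometheus",
    4040: "Pyroscope",
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_lgtm(host: str = "localhost") -> dict:
    """Map each expected LGTM port to whether it currently accepts connections."""
    return {port: is_port_open(host, port) for port in LGTM_PORTS}


if __name__ == "__main__":
    for port, ok in check_lgtm().items():
        print(f"{port:>5}  {LGTM_PORTS[port]:<12} {'open' if ok else 'closed'}")
```

A port reported as closed usually means the container is still starting or the port mapping in run-lgtm.sh differs from what is assumed here.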

4. Enabling a LangChain application to send data to OpenTelemetry

1) Add the Python package: langsmith

First, activate the virtual environment used by the application (venv, Conda, and so on).

Then install the package with:

pip install -U langsmith

A more disciplined approach is to add the following line to the application's requirements.txt:

langsmith

Then run again:

pip install -r requirements.txt
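
To confirm that the package landed in the environment the application actually runs in, a quick check from Python can help. This is a small sketch using the standard library's importlib.metadata; the helper name installed_version is made up for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("langsmith")
    if version is None:
        print("langsmith is not installed in this environment")
    else:
        print(f"langsmith {version} is installed")
```

Run it with the same interpreter that starts app.py, otherwise it may report on a different environment.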

2) Set the required environment variables before starting the LangChain application

Suppose the application is started with:

python app.py

Then start it with the following commands instead:

export LANGSMITH_TRACING=true
export LANGSMITH_OTEL_ENABLED=true
export LANGSMITH_OTEL_ONLY=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export LS_APM_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

export OTEL_SERVICE_NAME=MY_SERVICE1
python app.py

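Notes:

  • Change the OTEL_SERVICE_NAME environment variable to suit your own application.
  • If the OpenTelemetry backend (LGTM in this example) is not on the local machine, for example if it runs at 1.2.3.4, set OTEL_EXPORTER_OTLP_ENDPOINT accordingly: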
export OTEL_EXPORTER_OTLP_ENDPOINT=http://1.2.3.4:4318
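
When the application is launched by something that makes exporting shell variables awkward (an IDE, a scheduler), the same settings can be applied from Python. The sketch below mirrors the variables from the shell example above; the helper names otel_env and apply_otel_env are hypothetical, and it assumes the variables must be in place before LangChain or LangGraph is imported, which is the safest ordering but has not been verified against every version.

```python
import os


def otel_env(service_name: str, host: str = "localhost", port: int = 4318) -> dict:
    """Build the environment variables used in the shell example above."""
    return {
        "LANGSMITH_TRACING": "true",
        "LANGSMITH_OTEL_ENABLED": "true",
        "LANGSMITH_OTEL_ONLY": "true",
        "LS_APM_OTEL_ENABLED": "true",
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"http://{host}:{port}",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_SERVICE_NAME": service_name,
    }


def apply_otel_env(service_name: str, host: str = "localhost") -> None:
    """Set the variables in this process without overriding values already exported."""
    for key, value in otel_env(service_name, host).items():
        os.environ.setdefault(key, value)


# Call this at the very top of app.py, before importing langchain or langgraph
# (assumption: the settings are read when those libraries start tracing).
apply_otel_env("MY_SERVICE1")
```

Using setdefault means values exported in the shell still win, so the shell commands above and this helper can coexist.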

5. A dashboard for monitoring LangChain applications

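Open “http://localhost:3100” to reach LGTM's Grafana UI (for a remote host, replace localhost with its hostname or IP) and log in with admin/admin. LGTM supports every signal type (metrics, traces, logs, and profiles), but the automatic instrumentation used in this setup currently produces only traces, from which some metrics can be derived. The Grafana dashboard therefore contains only two kinds of data: traces and metrics.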

Below is a screenshot of the dashboard I built:

A brief walkthrough of the dashboard:

The two panels in the first row both show throughput: the left one in requests per second, the right one in bytes per second.

The left panel uses this PromQL:

sum by (service, span_name, status_code) (
  rate(traces_spanmetrics_calls_total[2m])
)

The right panel uses this PromQL:

sum by (service, span_name, status_code) (rate(traces_spanmetrics_size_total[2m]))
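
Before building panels, it can be useful to run such a query directly against Prometheus and see whether any series come back. The sketch below uses the instant-query endpoint of the Prometheus HTTP API (/api/v1/query) with only the standard library. The base URL assumes the port 9090 listed earlier, and the function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# The throughput query from the left panel, written on one line.
THROUGHPUT_QUERY = (
    "sum by (service, span_name, status_code) "
    "(rate(traces_spanmetrics_calls_total[2m]))"
)


def build_query_url(promql: str, base_url: str = "http://localhost:9090") -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def run_query(promql: str, base_url: str = "http://localhost:9090") -> list:
    """Run the query and return the result series (needs a running LGTM)."""
    with urlopen(build_query_url(promql, base_url), timeout=5) as response:
        payload = json.load(response)
    return payload["data"]["result"]
```

Calling run_query(THROUGHPUT_QUERY) while the application is handling requests should return one entry per label combination; an empty list suggests that no spans have arrived yet or that the metric name differs in your LGTM version.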

The second row shows latency (the 90th percentile), using this PromQL:

histogram_quantile(0.90, sum by (le, service, span_name, status_code) (rate(traces_spanmetrics_latency_bucket[2m])))
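
To make the latency number less of a black box, here is a simplified sketch of the idea behind histogram_quantile for classic bucketed histograms: locate the bucket that contains the target rank, then interpolate linearly inside it. It is an illustration, not Prometheus's implementation (it ignores several edge cases), and the bucket values in the usage note are invented.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    The last bucket is expected to have an upper bound of +Inf, as in
    Prometheus classic histograms.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Target falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            fraction = (rank - prev_count) / in_bucket if in_bucket else 0.0
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with the hypothetical buckets [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)], the median falls in the 0.1 to 0.5 second bucket and is interpolated to about 0.42 seconds. The result is only as precise as the bucket boundaries, which is worth remembering when reading the panel.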

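The third row shows the list of traces. Traces can be analyzed in many ways; here is one example. An agent built on LangGraph has fairly complex internal calls, many of which are decided automatically at run time. For performance analysis we need to see the logical order of the calls and the time spent in each step, and this is where traces are especially valuable.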

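First, we query the traces sent by the agent among our applications. Click the three dots in the upper-right corner of the Traces panel, choose “Edit” to open the settings, and pick a suitable query method. If the service name of the agent is “agent1”, select or type “agent1” in the service name field. Click the “Refresh” button to get all traces of “agent1” within the selected time range, as shown below: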

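Click one of the traces and, as shown below, the internal call relationships and timing become visible. This agent is fairly simple: the root span agent1 has two child spans (“retrieval” and “chatbot”), which correspond to two LangGraph nodes. The “retrieval” node in turn has three child spans (VectorStoreRetriever, ChatPromptTemplate, and route_after_retrieval), which the code confirms are three calls. The “chatbot” node has only one child span, named “ChatOllama”.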

Next, click the leaf span that takes the longest, “ChatOllama”. Its details are shown below, including the model name, token counts, the input, and the response.
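
The same "find the slowest leaf" reasoning can be expressed in a few lines of code, which helps when there are too many traces to inspect by hand. The sketch below models the span tree described above. The span names come from the example trace, while the IDs, the durations, and the Span class itself are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: float  # illustrative numbers only, not measurements


# Span names follow the example trace above; IDs and durations are hypothetical.
SPANS = [
    Span("1", None, "agent1", 2300.0),
    Span("2", "1", "retrieval", 180.0),
    Span("3", "1", "chatbot", 2100.0),
    Span("4", "2", "VectorStoreRetriever", 150.0),
    Span("5", "2", "ChatPromptTemplate", 5.0),
    Span("6", "2", "route_after_retrieval", 2.0),
    Span("7", "3", "ChatOllama", 2080.0),
]


def leaf_spans(spans):
    """Return the spans that no other span names as its parent."""
    parents = {s.parent_id for s in spans if s.parent_id is not None}
    return [s for s in spans if s.span_id not in parents]


def slowest_leaf(spans):
    """Return the leaf span with the largest duration."""
    return max(leaf_spans(spans), key=lambda s: s.duration_ms)


def print_tree(spans, parent_id=None, depth=0):
    """Print the span hierarchy with indentation, children under their parent."""
    for span in spans:
        if span.parent_id == parent_id:
            print(f"{'  ' * depth}{span.name} ({span.duration_ms:.0f} ms)")
            print_tree(spans, span.span_id, depth + 1)
```

Calling print_tree(SPANS) prints the hierarchy, and slowest_leaf(SPANS) picks out the span worth investigating first. Leaves are the useful unit here because a parent's duration mostly reflects the time spent in its children.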

The dashboard source follows, for reference only:

{
  "apiVersion": "dashboard.grafana.app/v2",
  "kind": "Dashboard",
  "metadata": {
    "name": "adkqq5s",
    "namespace": "default",
    "uid": "8e305963-7a7d-4d83-b256-e40d9b932f12",
    "resourceVersion": "1777469615660990",
    "generation": 11,
    "creationTimestamp": "2026-04-18T15:09:59Z",
    "labels": {
      "grafana.app/deprecatedInternalID": "885991689437184"
    },
    "annotations": {
      "grafana.app/createdBy": "user:eferfp17srfnkb",
      "grafana.app/folder": "",
      "grafana.app/saved-from-ui": "Grafana v13.0.1 (a100054f)",
      "grafana.app/updatedBy": "user:eferfp17srfnkb",
      "grafana.app/updatedTimestamp": "2026-04-29T13:33:35Z"
    }
  },
  "spec": {
    "annotations": [
      {
        "kind": "AnnotationQuery",
        "spec": {
          "query": {
            "kind": "DataQuery",
            "group": "grafana",
            "version": "v0",
            "datasource": {
              "name": "-- Grafana --"
            },
            "spec": {}
          },
          "enable": true,
          "hide": true,
          "iconColor": "rgba(0, 211, 255, 1)",
          "name": "Annotations & Alerts",
          "builtIn": true
        }
      }
    ],
    "cursorSync": "Off",
    "editable": true,
    "elements": {
      "panel-1": {
        "kind": "Panel",
        "spec": {
          "id": 1,
          "title": "吞吐量(req/s)",
          "description": "",
          "links": [],
          "data": {
            "kind": "QueryGroup",
            "spec": {
              "queries": [
                {
                  "kind": "PanelQuery",
                  "spec": {
                    "query": {
                      "kind": "DataQuery",
                      "group": "prometheus",
                      "version": "v0",
                      "datasource": {
                        "name": "prometheus"
                      },
                      "spec": {
                        "editorMode": "code",
                        "expr": "sum by (service, span_name, status_code) (\r\n  rate(traces_spanmetrics_calls_total[2m])\r\n)",
                        "instant": true,
                        "legendFormat": "__auto",
                        "range": true
                      }
                    },
                    "refId": "A",
                    "hidden": false
                  }
                }
              ],
              "transformations": [],
              "queryOptions": {}
            }
          },
          "vizConfig": {
            "kind": "VizConfig",
            "group": "timeseries",
            "version": "13.0.1",
            "spec": {
              "options": {
                "annotations": {
                  "clustering": -1,
                  "multiLane": false
                },
                "legend": {
                  "calcs": [],
                  "displayMode": "list",
                  "placement": "bottom",
                  "showLegend": true
                },
                "tooltip": {
                  "hideZeros": false,
                  "mode": "single",
                  "sort": "none"
                }
              },
              "fieldConfig": {
                "defaults": {
                  "unit": "reqps",
                  "thresholds": {
                    "mode": "absolute",
                    "steps": [
                      {
                        "value": 0,
                        "color": "green"
                      },
                      {
                        "value": 80,
                        "color": "red"
                      }
                    ]
                  },
                  "color": {
                    "mode": "palette-classic"
                  },
                  "custom": {
                    "axisBorderShow": false,
                    "axisCenteredZero": false,
                    "axisColorMode": "text",
                    "axisLabel": "",
                    "axisPlacement": "auto",
                    "barAlignment": 0,
                    "barWidthFactor": 0.6,
                    "drawStyle": "line",
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "hideFrom": {
                      "legend": false,
                      "tooltip": false,
                      "viz": false
                    },
                    "insertNulls": false,
                    "lineInterpolation": "linear",
                    "lineWidth": 1,
                    "pointSize": 5,
                    "scaleDistribution": {
                      "type": "linear"
                    },
                    "showPoints": "never",
                    "showValues": false,
                    "spanNulls": false,
                    "stacking": {
                      "group": "A",
                      "mode": "none"
                    },
                    "thresholdsStyle": {
                      "mode": "off"
                    }
                  }
                },
                "overrides": []
              }
            }
          }
        }
      },
      "panel-2": {
        "kind": "Panel",
        "spec": {
          "id": 2,
          "title": "延迟时间(s)",
          "description": "",
          "links": [],
          "data": {
            "kind": "QueryGroup",
            "spec": {
              "queries": [
                {
                  "kind": "PanelQuery",
                  "spec": {
                    "query": {
                      "kind": "DataQuery",
                      "group": "prometheus",
                      "version": "v0",
                      "datasource": {
                        "name": "prometheus"
                      },
                      "spec": {
                        "editorMode": "code",
                        "expr": "histogram_quantile(0.90, sum by (le, service, span_name, status_code) (rate(traces_spanmetrics_latency_bucket[2m])))",
                        "instant": true,
                        "legendFormat": "__auto",
                        "range": true
                      }
                    },
                    "refId": "A",
                    "hidden": false
                  }
                }
              ],
              "transformations": [],
              "queryOptions": {}
            }
          },
          "vizConfig": {
            "kind": "VizConfig",
            "group": "timeseries",
            "version": "13.0.1",
            "spec": {
              "options": {
                "annotations": {
                  "clustering": -1,
                  "multiLane": false
                },
                "legend": {
                  "calcs": [],
                  "displayMode": "list",
                  "placement": "bottom",
                  "showLegend": true
                },
                "tooltip": {
                  "hideZeros": false,
                  "mode": "single",
                  "sort": "none"
                }
              },
              "fieldConfig": {
                "defaults": {
                  "unit": "s",
                  "thresholds": {
                    "mode": "absolute",
                    "steps": [
                      {
                        "value": 0,
                        "color": "green"
                      },
                      {
                        "value": 80,
                        "color": "red"
                      }
                    ]
                  },
                  "color": {
                    "mode": "palette-classic"
                  },
                  "custom": {
                    "axisBorderShow": false,
                    "axisCenteredZero": false,
                    "axisColorMode": "text",
                    "axisLabel": "",
                    "axisPlacement": "auto",
                    "barAlignment": 0,
                    "barWidthFactor": 0.6,
                    "drawStyle": "line",
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "hideFrom": {
                      "legend": false,
                      "tooltip": false,
                      "viz": false
                    },
                    "insertNulls": false,
                    "lineInterpolation": "linear",
                    "lineWidth": 1,
                    "pointSize": 5,
                    "scaleDistribution": {
                      "type": "linear"
                    },
                    "showPoints": "never",
                    "showValues": false,
                    "spanNulls": false,
                    "stacking": {
                      "group": "A",
                      "mode": "none"
                    },
                    "thresholdsStyle": {
                      "mode": "off"
                    }
                  }
                },
                "overrides": []
              }
            }
          }
        }
      },
      "panel-3": {
        "kind": "Panel",
        "spec": {
          "id": 3,
          "title": "吞吐量(bytes/sec)",
          "description": "",
          "links": [],
          "data": {
            "kind": "QueryGroup",
            "spec": {
              "queries": [
                {
                  "kind": "PanelQuery",
                  "spec": {
                    "query": {
                      "kind": "DataQuery",
                      "group": "prometheus",
                      "version": "v0",
                      "datasource": {
                        "name": "prometheus"
                      },
                      "spec": {
                        "editorMode": "code",
                        "expr": "sum by (service, span_name, status_code) (rate(traces_spanmetrics_size_total[2m]))",
                        "instant": true,
                        "legendFormat": "__auto",
                        "range": true
                      }
                    },
                    "refId": "A",
                    "hidden": false
                  }
                }
              ],
              "transformations": [],
              "queryOptions": {}
            }
          },
          "vizConfig": {
            "kind": "VizConfig",
            "group": "timeseries",
            "version": "13.0.1",
            "spec": {
              "options": {
                "annotations": {
                  "clustering": -1,
                  "multiLane": false
                },
                "legend": {
                  "calcs": [],
                  "displayMode": "list",
                  "placement": "bottom",
                  "showLegend": true
                },
                "tooltip": {
                  "hideZeros": false,
                  "mode": "single",
                  "sort": "none"
                }
              },
              "fieldConfig": {
                "defaults": {
                  "unit": "Bps",
                  "thresholds": {
                    "mode": "absolute",
                    "steps": [
                      {
                        "value": 0,
                        "color": "green"
                      },
                      {
                        "value": 80,
                        "color": "red"
                      }
                    ]
                  },
                  "color": {
                    "mode": "palette-classic"
                  },
                  "custom": {
                    "axisBorderShow": false,
                    "axisCenteredZero": false,
                    "axisColorMode": "text",
                    "axisLabel": "",
                    "axisPlacement": "auto",
                    "barAlignment": 0,
                    "barWidthFactor": 0.6,
                    "drawStyle": "line",
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "hideFrom": {
                      "legend": false,
                      "tooltip": false,
                      "viz": false
                    },
                    "insertNulls": false,
                    "lineInterpolation": "linear",
                    "lineWidth": 1,
                    "pointSize": 5,
                    "scaleDistribution": {
                      "type": "linear"
                    },
                    "showPoints": "never",
                    "showValues": false,
                    "spanNulls": false,
                    "stacking": {
                      "group": "A",
                      "mode": "none"
                    },
                    "thresholdsStyle": {
                      "mode": "off"
                    }
                  }
                },
                "overrides": []
              }
            }
          }
        }
      },
      "panel-4": {
        "kind": "Panel",
        "spec": {
          "id": 4,
          "title": "Traces",
          "description": "",
          "links": [],
          "data": {
            "kind": "QueryGroup",
            "spec": {
              "queries": [
                {
                  "kind": "PanelQuery",
                  "spec": {
                    "query": {
                      "kind": "DataQuery",
                      "group": "tempo",
                      "version": "v0",
                      "datasource": {
                        "name": "tempo"
                      },
                      "spec": {
                        "filters": [
                          {
                            "id": "b664759c",
                            "operator": "=",
                            "scope": "span"
                          }
                        ],
                        "key": "Q-9f6990d1-f1a3-4fe9-a46f-63846a4c475e-0",
                        "limit": 30,
                        "metricsQueryType": "range",
                        "queryType": "traceqlSearch",
                        "serviceMapUseNativeHistograms": false,
                        "spss": 3,
                        "tableType": "traces"
                      }
                    },
                    "refId": "A",
                    "hidden": false
                  }
                }
              ],
              "transformations": [],
              "queryOptions": {}
            }
          },
          "vizConfig": {
            "kind": "VizConfig",
            "group": "table",
            "version": "13.0.1",
            "spec": {
              "options": {
                "cellHeight": "sm",
                "showHeader": true,
                "sortBy": [
                  {
                    "desc": true,
                    "displayName": "Service"
                  }
                ]
              },
              "fieldConfig": {
                "defaults": {
                  "thresholds": {
                    "mode": "absolute",
                    "steps": [
                      {
                        "value": 0,
                        "color": "green"
                      },
                      {
                        "value": 80,
                        "color": "red"
                      }
                    ]
                  },
                  "custom": {
                    "align": "auto",
                    "cellOptions": {
                      "type": "auto"
                    },
                    "footer": {
                      "reducers": []
                    },
                    "inspect": false
                  }
                },
                "overrides": []
              }
            }
          }
        }
      }
    },
    "layout": {
      "kind": "GridLayout",
      "spec": {
        "items": [
          {
            "kind": "GridLayoutItem",
            "spec": {
              "x": 0,
              "y": 0,
              "width": 12,
              "height": 8,
              "element": {
                "kind": "ElementReference",
                "name": "panel-1"
              }
            }
          },
          {
            "kind": "GridLayoutItem",
            "spec": {
              "x": 12,
              "y": 0,
              "width": 12,
              "height": 8,
              "element": {
                "kind": "ElementReference",
                "name": "panel-3"
              }
            }
          },
          {
            "kind": "GridLayoutItem",
            "spec": {
              "x": 0,
              "y": 8,
              "width": 24,
              "height": 8,
              "element": {
                "kind": "ElementReference",
                "name": "panel-2"
              }
            }
          },
          {
            "kind": "GridLayoutItem",
            "spec": {
              "x": 0,
              "y": 16,
              "width": 24,
              "height": 7,
              "element": {
                "kind": "ElementReference",
                "name": "panel-4"
              }
            }
          }
        ]
      }
    },
    "links": [],
    "liveNow": false,
    "preload": false,
    "tags": [],
    "timeSettings": {
      "timezone": "browser",
      "from": "now-30m",
      "to": "now",
      "autoRefresh": "",
      "autoRefreshIntervals": [
        "5s",
        "10s",
        "30s",
        "1m",
        "5m",
        "15m",
        "30m",
        "1h",
        "2h",
        "1d"
      ],
      "hideTimepicker": false,
      "fiscalYearStartMonth": 0
    },
    "title": "LangChain1",
    "variables": []
  }
}
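
The dashboard JSON is long, so a quick way to see what it contains is to walk spec.elements and list each panel's title, datasource, and query expression. The sketch below does that on a trimmed-down stand-in with the same nesting as the source above; the sample content and the function name summarize_panels are made up for this example.

```python
import json


def summarize_panels(dashboard: dict) -> list:
    """Return (title, datasource group, expr or None) for every panel query."""
    rows = []
    for element in dashboard["spec"]["elements"].values():
        panel = element["spec"]
        for panel_query in panel["data"]["spec"]["queries"]:
            query = panel_query["spec"]["query"]
            rows.append((panel["title"], query["group"], query["spec"].get("expr")))
    return rows


# A trimmed-down stand-in that keeps only the fields read above.
SAMPLE = json.loads("""
{
  "spec": {
    "elements": {
      "panel-1": {
        "spec": {
          "title": "Throughput",
          "data": {"spec": {"queries": [{"spec": {"query": {
            "group": "prometheus",
            "spec": {"expr": "sum(rate(traces_spanmetrics_calls_total[2m]))"}
          }}}]}}
        }
      },
      "panel-4": {
        "spec": {
          "title": "Traces",
          "data": {"spec": {"queries": [{"spec": {"query": {
            "group": "tempo",
            "spec": {"queryType": "traceqlSearch"}
          }}}]}}
        }
      }
    }
  }
}
""")

for title, group, expr in summarize_panels(SAMPLE):
    print(f"{title:<12} {group:<11} {expr}")
```

To run it on the full dashboard, save the source above to a file and pass the result of json.load to summarize_panels. The Tempo panel has no expr field, which is why that column can be None.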

6. Summary

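This article described how to monitor LangChain applications with OpenTelemetry. Data collection goes through the LangSmith mechanism, and setting a few environment variables is enough to send the monitoring data to an OpenTelemetry backend. The article covered deploying Grafana LGTM with Docker as the monitoring backend, including port configuration and startup, along with the Python-side setup, and showed how to view throughput, latency, and traces on a dashboard. It also demonstrated how to analyze the call structure of a LangGraph agent: traces reveal the execution time and logical order of each internal node, which gives a direct basis for performance tuning. Together these provide visibility into a LangChain application from derived metrics through to individual traces.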
