上一篇文章中用到 Jmeter 对http接口进行压力测试，虽然有一些优化但是效果不是很好，所以打算用go 的 pprof 做一个动态追踪，看看到底哪块代码比较耗性能和内存以便于进一步优化。

大杀器 PProf

pprof 是用于可视化和分析性能分析数据的工具

pprof 以 profile.proto 读取分析样本的集合，并生成报告以可视化并帮助分析数据（支持文本和图形报告）

profile.proto 是一个 Protocol Buffer v3 的描述文件，它描述了一组 callstack 和 symbolization 信息，作用是表示统计分析的一组采样的调用栈，是很常见的 stacktrace 配置文件格式

支持的使用模式：

Report generation：报告生成
Interactive terminal use：交互式终端使用
Web interface：Web 界面

本文主要是命令行配合web界面便于观看。

监控项目：

CPU Profiling：CPU 分析，按照一定的频率采集所监听的应用程序 CPU（含寄存器）的使用情况，可确定应用程序在主动消耗 CPU 周期时花费时间的位置
Memory Profiling：内存分析，在应用程序进行堆分配时记录堆栈跟踪，用于监视当前和历史内存使用情况，以及检查内存泄漏
Block Profiling：阻塞分析，记录 goroutine 阻塞等待同步（包括定时器通道）的位置
Mutex Profiling：互斥锁分析，报告互斥锁的竞争情况

pprof 实例

通常我们采用http api来将pprof信息暴露出来以供分析，我们可以采用net/http/pprof 这个package。下面是一个简单的示例：

// pprof 的init函数会将pprof里的一些handler注册到http.DefaultServeMux上
// 当不使用http.DefaultServeMux来提供http api时，可以查阅其init函数，自己注册handler
import _ "net/http/pprof"

go func() {
    http.ListenAndServe("0.0.0.0:8080", nil)
}()

其实就只在你的引入 net/http 地方再引入一个 “net/http/pprof” 。

然后请打开你的浏览器访问地址：http://localhost:8080/debug/pprof/

可以看见如下如的信息页面显示：

分析：

一、通过 Web 界面

访问 http://localhost:8080/debug/pprof/

这个页面中有许多子页面，来看看都有些什么？

cpu（CPU Profiling）: $HOST/debug/pprof/profile，默认进行 30s 的 CPU Profiling，得到一个分析用的 profile 文件，这是个不方便直接读的二进制文件。需要用go tool 工具进行分析，下文会有提到。
block（Block Profiling）：$HOST/debug/pprof/block，查看导致阻塞同步的堆栈跟踪
goroutine：$HOST/debug/pprof/goroutine，查看当前所有运行的 goroutines 堆栈跟踪
heap（Memory Profiling）: $HOST/debug/pprof/heap，查看活动对象的内存分配情况
mutex（Mutex Profiling）：$HOST/debug/pprof/mutex，查看导致互斥锁的竞争持有者的堆栈跟踪
threadcreate：$HOST/debug/pprof/threadcreate，查看创建新OS线程的堆栈跟踪

二、通过交互式终端使用

对于cpu的追踪： CPU Profiling

（1）go tool pprof http://localhost:8080/debug/pprof/profile?seconds=60

$ go tool pprof http://localhost:8080/debug/pprof/profile\?seconds\=60

Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=60
Saved profile in /Users/eddycjy/pprof/pprof.samples.cpu.007.pb.gz
Type: cpu
Duration: 1mins, Total samples = 26.55s (44.15%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

执行该命令后，需等待 60 秒（可调整 seconds 的值），pprof 会进行 CPU Profiling。结束后将默认进入 pprof 的交互式命令模式，可以对分析的结果进行查看或导出。具体可执行 pprof help 查看命令说明

你可以输入top10 查看：

(pprof) top10
Showing nodes accounting for 25.92s, 97.63% of 26.55s total
Dropped 85 nodes (cum <= 0.13s)
Showing top 10 nodes out of 21
      flat  flat%   sum%        cum   cum%
    23.28s 87.68% 87.68%     23.29s 87.72%  syscall.Syscall
     0.77s  2.90% 90.58%      0.77s  2.90%  runtime.memmove
     0.58s  2.18% 92.77%      0.58s  2.18%  runtime.freedefer
     0.53s  2.00% 94.76%      1.42s  5.35%  runtime.scanobject
     0.36s  1.36% 96.12%      0.39s  1.47%  runtime.heapBitsForObject
     0.35s  1.32% 97.44%      0.45s  1.69%  runtime.greyobject
     0.02s 0.075% 97.51%     24.96s 94.01%  main.main.func1
     0.01s 0.038% 97.55%     23.91s 90.06%  os.(*File).Write
     0.01s 0.038% 97.59%      0.19s  0.72%  runtime.mallocgc
     0.01s 0.038% 97.63%     23.30s 87.76%  syscall.Write

flat：给定函数上运行耗时
flat%：同上的 CPU 运行耗时总比例
sum%：给定函数累积使用 CPU 总比例
cum：当前函数加上它之上的调用运行总耗时
cum%：同上的 CPU 运行耗时总比例

对于heap内存追踪：

（2）go tool pprof http://localhost:6060/debug/pprof/heap

$ go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /Users/eddycjy/pprof/pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.008.pb.gz
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 837.48MB, 100% of 837.48MB total
      flat  flat%   sum%        cum   cum%
  837.48MB   100%   100%   837.48MB   100%  main.main.func1

接下来的两个自行实验吧：

（3） go tool pprof http://localhost:6060/debug/pprof/block

（4） go tool pprof http://localhost:6060/debug/pprof/mutex

（本次主要测试）针对某些接口使用 web 界面动态追踪利用包 github.com/google/pprof （包括火焰图）

以上都是直接对启动 main.go 主程序监控端口启动服务的追踪，所以并没有调用什么接口中的业务代码，那如果想把追踪粒度控到某一个接口，就应该写一个 benchmark 导出 .out 文件进行分析。

代码如下：

func BenchmarkDevice_GetByApp(b *testing.B) {
	r := Device{
		Keyspace: app.Config.Database.Keyspace,
		Session:  session,
	}
	id := "dd4ac8da-552e-4f8c-8d0e-18cd874a1185"
	for i := 0; i < b.N; i++ {
		_, _ = r.GetByApp(id, true, "", 1, 2, []string{"devicePath,asc", "deviceName,asc", "name,asc"})
	}
}

主要是对函数 GetByApp 进行基准测试。

终端输入命令如下：

 go test -bench=GetByApp -benchtime=10s  -memprofile mem.out -cpuprofile cpu.out

因为该接口较为耗时所以把时间延迟到10s的标准进行较为精准的测试。

内存分析导入了mem.out文件
cpu分析导入了cpu.out文件中

最后导入包：

$ go get -u github.com/google/pprof

启动追踪页面：

pprof -http=:8080 mem.out

或

 pprof -http=:8080 cpu.out

你就会看见如下的界面：

这个是流程线路图

也包括火焰图。颜色越深表示越耗资源。

这里特别说一下这四个参数选择对比：

go tool pprof 可以选择显示分配计数（allocation counts ）或使用内存（in use memory）。

如果你想关心内存量的使用情况，那么你应该选择的是用 inuse metrics
如果你担心垃圾回收gc 的时间（即堆上分配的内存需要gc花时间来回收），请查看 allocations

-inuse_space ： Display in-use memory size

-inuse_objects ： Display in-use object counts

-alloc_space ： Display allocated memory size

-alloc_objects ： Display allocated object counts

其中待 objects 表示的是分配次数，而 space 表示具体的内存量。

参考文献：

golang 性能优化