Book Description:
Chapter 1 Overview of Big Data Analysis for Software Development
1.1 Background
1.1.1 The Dual Pressure of Rapid Delivery and Trustworthiness Assurance
1.1.2 Software Complexity
1.1.3 Problems and Challenges
1.2 Digitalization and Intelligentization of Software Development
1.3 Fundamentals of Software Analysis
1.4 Code and Design Quality Analysis
1.4.1 Software Defect Analysis
1.4.2 Software Design Quality Analysis
1.5 Quality Traceability System for Software Development
1.6 Summary
References
Chapter 2 Fundamentals of Software Analysis Techniques
2.1 Static Program Analysis
2.1.1 Program Representation
2.1.2 Program Semantics
2.1.3 Introduction to Static Program Analysis
2.2 Dynamic Program Analysis
2.2.1 Tracing and Profiling
2.2.2 Dynamic Slicing
2.2.3 Dynamic Symbolic Execution
2.3 Code Clone Detection
2.3.1 Code Clone Concepts
2.3.2 Overall Process of Code Clone Detection
2.3.3 Code Clone Detection Methods
2.4 Code Difference Analysis
2.4.1 Text-Based Difference Analysis
2.4.2 Syntax-Tree-Based Difference Analysis
2.5 Code Evolution Analysis
2.5.1 Identifying Code Co-changes
2.5.2 Challenges and Solutions in Identifying Code Co-changes
2.5.3 Applications of Code Co-change Analysis
2.6 Summary
References
Chapter 3 Software Defect Analysis
3.1 Defect Analysis Based on Static Analysis
3.1.1 Program States and Semantics
3.1.2 Abstract Interpretation Theory
3.1.3 Program Analysis Based on Abstract Interpretation
3.1.4 Case Study
3.2 Defect Analysis Based on Deep Learning
3.2.1 Defect Detection Based on Code Snapshot Representation Learning
3.2.2 Defect Detection Based on Code Change Representation Learning
3.3 Defect Analysis Based on Large Models
3.3.1 Defect Localization Based on Large Models
3.3.2 Defect Detection Based on Large Models
3.3.3 Defect Repair Based on Large Models
3.4 Defect Case Mining and Analysis
3.4.1 Overview of Defect Databases
3.4.2 Defect Case Collection Based on Automated Testing
3.4.3 Classification and Use of Defect Cases
3.5 Summary
References
Chapter 4 Software Design Quality Analysis
4.1 Overview
4.1.1 Goals of Software Design Quality Analysis
4.1.2 Approaches to Software Design Quality Analysis
4.2 Module-Level Software Design Quality Analysis
4.2.1 Code Cohesion
4.2.2 Code Coupling
4.2.3 Code Complexity
4.2.4 Code Duplication
4.3 Architecture-Level Software Design Quality Analysis
4.3.1 Architectural Complexity Analysis
4.3.2 Modularity
4.3.3 Hierarchical Structure Analysis
4.3.4 Consistency Analysis Between Software Implementation and Architectural Design
4.3.5 Design Quality Analysis Based on Change Propagation
4.4 Software Design Smell Detection
4.4.1 Overview of Design Smells
4.4.2 Typical Design Smells
4.5 Design Quality Analysis of Microservice Architectures
4.5.1 Methods for Decomposing Monolithic Applications
4.5.2 Microservice Architecture Anti-patterns
4.6 Summary
References
Chapter 5 Code Clone Analysis and Management
5.1 Overview
5.2 Code Clone Analysis Techniques
5.2.1 Analysis of Code Clone Volume and Clone Ratio
5.2.2 Code Clone Genealogy Analysis
5.2.3 Analysis of Consistent Modifications to Code Clones
5.2.4 Analysis of the Relationship Between Code Clones and Defects
5.2.5 Code Clone Harmfulness Analysis
5.3 Code Clone Management Techniques
5.3.1 Code Clone Monitoring
5.3.2 Code Clone Management Based on Harmfulness Analysis
5.3.3 Recommendations for Consistent Maintenance of Code Clones
5.3.4 Code Clone Visualization for Software Maintenance
5.4 Summary
References
Chapter 6 Software Supply Chain Risk Analysis
6.1 Overview of Software Supply Chain Risks
6.2 Software Composition Analysis
6.2.1 Composition Analysis Based on Package Managers
6.2.2 Composition Analysis Based on Code Fingerprints
6.3 Security Risk Analysis
6.3.1 Vulnerability Database Construction
6.3.2 Vulnerability Propagation Impact Analysis
6.3.3 Malicious Package Detection
6.4 License Risk Analysis
6.4.1 License Conflict Detection
6.4.2 License Violation Detection
6.5 Maintenance Risk Analysis
6.5.1 Component Conflict Detection
6.5.2 Component Version Unification Analysis
6.5.3 Component Version Upgrade Recommendation
6.6 Summary
References
Chapter 7 Code Quality and Development Efficiency Analysis
7.1 Overview
7.1.1 Challenges in Analyzing Software Development Quality and Efficiency
7.1.2 Approaches to Code Big Data Analysis for Quality and Efficiency
7.2 Code Quality Analysis
7.2.1 Overview
7.2.2 Static Defects
7.2.3 Code Clones
7.2.4 Code Complexity
7.2.5 Third-Party Libraries
7.2.6 Visualization of Code Quality
7.3 Development Efficiency Analysis
7.3.1 Overview
7.3.2 Metric System for Development Efficiency Analysis
7.3.3 Code Contribution
7.3.4 Code Wastage
7.3.5 Correlation Analysis Between Development Tasks and Code
7.4 Summary
References
Chapter 8 Code Asset Mining and Recommendation
8.1 Overview
8.2 Code Asset Extraction Based on Clone Analysis
8.2.1 Extracting Implementations of Common Functionality
8.2.2 Extracting Software Design Templates
8.3 Search-Based Code Recommendation
8.3.1 Natural Language Code Search Based on Information Retrieval
8.3.2 Natural Language Code Search Based on Deep Learning
8.3.3 Similarity-Based Code-to-Code Search
8.4 Generative Code Recommendation
8.4.1 Technical Approach to Generative Code Recommendation
8.4.2 Advances in Generative Code Recommendation Based on Traditional Methods
8.4.3 Advances in Generative Code Recommendation Based on Large Models
8.5 Summary
References
Preface
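The trend of "software-defined everything" is increasingly becoming a reality. Against this backdrop, demand for software of every kind, at every level and in every form, is growing rapidly across all industries, and it places high demands on both the efficiency and the quality of software development. On the one hand, new application domains and software products keep emerging, while existing products change quickly as their environments and requirements evolve; rapid delivery and continuous updating have become a practical necessity for many software products. On the other hand, as software becomes a new kind of information infrastructure, the requirements for its trustworthiness keep rising. Software development organizations and teams must therefore balance both goals: delivering products and their updates through rapid iteration while ensuring that the software remains trustworthy.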
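However, the high complexity and invisibility of software and its development process mean that software development and evolution today face many problems and challenges. Individual ability, experience, and initiative still play a decisive role in software development, and enterprises lack effective management means to ensure development efficiency and quality. One feasible way to address these problems is to raise the level of digitalization and intelligentization of software development on the basis of software development big data analysis. The opportunity comes mainly from two sources: the pipelining and cloudification of software development, and advances in software analysis and artificial intelligence. Pipelined, cloud-based development lets us apply a variety of program analysis techniques to extract rich data from the development process and from software artifacts, which provides the data foundation for this transformation. Advances in artificial intelligence let us use data mining, machine learning, deep learning, and, more recently, large language models to efficiently extract and learn useful knowledge from software development data, which provides the technical foundation.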
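The Intelligent Software Engineering and Systems research group at Fudan University (the CodeWisdom team) has carried out extensive research on the digitalization and intelligentization of software development and has built long-term collaborations with a number of enterprises, accumulating rich technical and practical experience along the way. This book first analyzes the real-world background of the problem, sets out the goals for digitalizing and intelligentizing software development, and introduces commonly used program analysis techniques. It then presents techniques and practices for software development data analysis from three perspectives: analysis of software quality problems, the quality traceability system for software development, and code asset mining and code recommendation. We hope this book will help enterprises analyze and understand the current state of their overall software R&D technology and management, and then establish a standardized software R&D management system and an intelligent support platform.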
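This book was written by the CodeWisdom team at Fudan University. Peng Xin wrote Chapters 1, 2, and 8 and edited the book as a whole; Wu Yijian wrote Chapters 4, 5, and 7; Chen Bihuan wrote Chapters 3 and 6. In addition, faculty members and postdoctoral researchers of the team, including Li Yi, Dong Zhen, Lou Yiling, Liu Mingwei, and Huang Kaifeng, as well as doctoral students and research assistants, including Wang Chong, Zhou Jie, and Song Xuezhi, helped compile material for several chapters and contributed greatly to the publication of this book.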
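We thank Publishing House of Electronics Industry for its strong support and careful guidance throughout the writing of this book, and the leadership and members of the Technical Committee on Software Engineering of the China Computer Federation and the faculty and leadership of the School of Computer Science and Technology at Fudan University for their strong support. We also thank our partner enterprises: many of our ideas and results emerged and developed through exchanges and collaboration with them.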
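Given the limits of the authors' knowledge, errors and omissions in this book are inevitable, and we sincerely welcome criticism and corrections from readers.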
The Authors