Book description:
Preface
Part I Preliminaries
Chapter 1 Data Structures and Algorithms
1.1 A Philosophy of Data Structures
1.1.1 The Need for Data Structures
1.1.2 Costs and Benefits
1.2 Abstract Data Types and Data Structures
1.3 Design Patterns
1.3.1 Flyweight
1.3.2 Visitor
1.3.3 Composite
1.3.4 Strategy
1.4 Problems, Algorithms, and Programs
1.5 Further Reading
1.6 Exercises
Chapter 2 Mathematical Preliminaries
2.1 Sets and Relations
2.2 Miscellaneous Notation
2.3 Logarithms
2.4 Summations and Recurrences
2.5 Recursion
2.6 Mathematical Proof Techniques
2.6.1 Direct Proof
2.6.2 Proof by Contradiction
2.6.3 Proof by Mathematical Induction
2.7 Estimation
2.8 Further Reading
2.9 Exercises
Chapter 3 Algorithm Analysis
3.1 Introduction
3.2 Best, Worst, and Average Cases
3.3 A Faster Computer, or a Faster Algorithm?
3.4 Asymptotic Analysis
3.4.1 Upper Bounds
3.4.2 Lower Bounds
3.4.3 Notation
3.4.4 Simplifying Rules
3.4.5 Classifying Functions
3.5 Calculating the Running Time for a Program
3.6 Analyzing Problems
3.7 Common Misunderstandings
3.8 Multiple Parameters
3.9 Space Bounds
3.10 Speeding Up Your Programs
3.11 Empirical Analysis
3.12 Further Reading
3.13 Exercises
3.14 Projects
Part II Fundamental Data Structures
Chapter 4 Lists, Stacks, and Queues
4.1 Lists
4.1.1 Array-Based List Implementation
4.1.2 Linked Lists
4.1.3 Comparison of List Implementations
4.1.4 Element Implementations
4.1.5 Doubly Linked Lists
4.2 Stacks
4.2.1 Array-Based Stacks
4.2.2 Linked Stacks
4.2.3 Comparison of Array-Based and Linked Stacks
4.2.4 Implementing Recursion
4.3 Queues
4.3.1 Array-Based Queues
4.3.2 Linked Queues
4.3.3 Comparison of Array-Based and Linked Queues
4.4 Dictionaries
4.5 Further Reading
4.6 Exercises
4.7 Projects
Chapter 5 Binary Trees
5.1 Definitions and Properties
5.1.1 The Full Binary Tree Theorem
5.1.2 A Binary Tree Node ADT
5.2 Binary Tree Traversals
5.3 Binary Tree Node Implementations
5.3.1 Pointer-Based Node Implementations
5.3.2 Space Requirements
5.3.3 Array Implementation for Complete Binary Trees
5.4 Binary Search Trees
5.5 Heaps and Priority Queues
5.6 Huffman Coding Trees
5.6.1 Building Huffman Coding Trees
5.6.2 Assigning and Using Huffman Codes
5.6.3 Search in Huffman Trees
5.7 Further Reading
5.8 Exercises
5.9 Projects
Chapter 6 Non-Binary Trees
6.1 General Tree Definitions and Terminology
6.1.1 An ADT for General Tree Nodes
6.1.2 General Tree Traversals
6.2 The Parent Pointer Implementation
6.3 General Tree Implementations
6.3.1 List of Children
6.3.2 The Left-Child/Right-Sibling Implementation
6.3.3 Dynamic Node Implementations
6.3.4 Dynamic “Left-Child/Right-Sibling” Implementation
6.4 K-ary Trees
6.5 Sequential Tree Implementations
6.6 Further Reading
6.7 Exercises
6.8 Projects
Part III Sorting and Searching
Chapter 7 Internal Sorting
7.1 Sorting Terminology and Notation
7.2 Three Θ(n²) Sorting Algorithms
7.2.1 Insertion Sort
7.2.2 Bubble Sort
7.2.3 Selection Sort
7.2.4 The Cost of Exchange Sorting
7.3 Shellsort
7.4 Mergesort
7.5 Quicksort
7.6 Heapsort
7.7 Binsort and Radix Sort
7.8 An Empirical Comparison of Sorting Algorithms
7.9 Lower Bounds for Sorting
7.10 Further Reading
7.11 Exercises
7.12 Projects
Chapter 8 File Processing and External Sorting
8.1 Primary versus Secondary Storage
8.2 Disk Drives
8.2.1 Disk Drive Architecture
8.2.2 Disk Access Costs
8.3 Buffers and Buffer Pools
8.4 The Programmer’s View of Files
8.5 External Sorting
8.5.1 Simple Approaches to External Sorting
8.5.2 Replacement Selection
8.5.3 Multiway Merging
8.6 Further Reading
8.7 Exercises
8.8 Projects
Chapter 9 Searching
9.1 Searching Unsorted and Sorted Arrays
9.2 Self-Organizing Lists
9.3 Bit Vectors for Representing Sets
9.4 Hashing
9.4.1 Hash Functions
9.4.2 Open Hashing
9.4.3 Closed Hashing
9.4.4 Analysis of Closed Hashing
9.4.5 Deletion
9.5 Further Reading
9.6 Exercises
9.7 Projects
Chapter 10 Indexing
10.1 Linear Indexing
10.2 ISAM
10.3 Tree-based Indexing
10.4 2-3 Trees
10.5 B-Trees
10.5.1 B+-Trees
10.5.2 B-Tree Analysis
10.6 Further Reading
10.7 Exercises
10.8 Projects
Part IV Advanced Data Structures
Chapter 11 Graphs
11.1 Terminology and Representations
11.2 Graph Implementations
11.3 Graph Traversals
11.3.1 Depth-First Search
11.3.2 Breadth-First Search
11.3.3 Topological Sort
11.4 Shortest-Paths Problems
11.4.1 Single-Source Shortest Paths
11.5 Minimum-Cost Spanning Trees
11.5.1 Prim’s Algorithm
11.5.2 Kruskal’s Algorithm
11.6 Further Reading
11.7 Exercises
11.8 Projects
Chapter 12 Lists and Arrays Revisited
12.1 Multilists
12.2 Matrix Representations
12.3 Memory Management
12.3.1 Dynamic Storage Allocation
12.3.2 Failure Policies and Garbage Collection
12.4 Further Reading
12.5 Exercises
12.6 Projects
Chapter 13 Advanced Tree Structures
13.1 Tries
13.2 Balanced Trees
13.2.1 The AVL Tree
13.2.2 The Splay Tree
13.3 Spatial Data Structures
13.3.1 The K-D Tree
13.3.2 The PR quadtree
13.3.3 Other Point Data Structures
13.3.4 Other Spatial Data Structures
13.4 Further Reading
13.5 Exercises
13.6 Projects
Part V Theory of Algorithms
Chapter 14 Analysis Techniques
14.1 Summation Techniques
14.2 Recurrence Relations
14.2.1 Estimating Upper and Lower Bounds
14.2.2 Expanding Recurrences
14.2.3 Divide and Conquer Recurrences
14.2.4 Average-Case Analysis of Quicksort
14.3 Amortized Analysis
14.4 Further Reading
14.5 Exercises
14.6 Projects
Chapter 15 Lower Bounds
15.1 Introduction to Lower Bounds Proofs
15.2 Lower Bounds on Searching Lists
15.2.1 Searching in Unsorted Lists
15.2.2 Searching in Sorted Lists
15.3 Finding the Maximum Value
15.4 Adversarial Lower Bounds Proofs
15.5 State Space Lower Bounds Proofs
15.6 Finding the ith Best Element
15.7 Optimal Sorting
15.8 Further Reading
15.9 Exercises
15.10 Projects
Chapter 16 Patterns of Algorithms
16.1 Dynamic Programming
16.1.1 The Knapsack Problem
16.1.2 All-Pairs Shortest Paths
16.2 Randomized Algorithms
16.2.1 Randomized algorithms for finding large values
16.2.2 Skip Lists
16.3 Numerical Algorithms
16.3.1 Exponentiation
16.3.2 Largest Common Factor
16.3.3 Matrix Multiplication
16.3.4 Random Numbers
16.3.5 The Fast Fourier Transform
16.4 Further Reading
16.5 Exercises
16.6 Projects
Chapter 17 Limits to Computation
17.1 Reductions
17.2 Hard Problems
17.2.1 The Theory of NP-Completeness
17.2.2 NP-Completeness Proofs
17.2.3 Coping with NP-Complete Problems
17.3 Impossible Problems
17.3.1 Uncountability
17.3.2 The Halting Problem Is Unsolvable
17.4 Further Reading
17.5 Exercises
17.6 Projects
Part VI APPENDIX
A Utility Functions
Bibliography
Index
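Professor Clifford A. Shaffer received his Ph.D. in computer science from the University of Maryland and has taught in the Department of Computer Science at Virginia Tech for more than 25 years. He has extensive teaching experience and has taken part in interdisciplinary projects spanning genetics, bioinformatics, and computational biology. He is the author of several textbooks on data structures and algorithm analysis.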
Preface
We study data structures so that we can learn to write more efficient programs. But why must programs be efficient when new computers are faster every year? The reason is that our ambitions grow with our capabilities. Instead of rendering efficiency needs obsolete, the modern revolution in computing power and storage capability merely raises the efficiency stakes as we attempt more complex tasks.
The quest for program efficiency need not and should not conflict with sound design and clear coding. Creating efficient programs has little to do with “programming tricks” but rather is based on good organization of information and good algorithms. A programmer who has not mastered the basic principles of clear design is not likely to write efficient programs. Conversely, concerns related to development costs and maintainability should not be used as an excuse to justify inefficient performance. Generality in design can and should be achieved without sacrificing performance, but this can only be done if the designer understands how to measure performance and does so as an integral part of the design and implementation process. Most computer science curricula recognize that good programming skills begin with a strong emphasis on fundamental software engineering principles. Then, once a programmer has learned the principles of clear program design and implementation, the next step is to study the effects of data organization and algorithms on program efficiency.
Approach: This book describes many techniques for representing data. These techniques are presented within the context of the following principles:
1. Each data structure and each algorithm has costs and benefits. Practitioners need a thorough understanding of how to assess costs and benefits to be able to adapt to new design challenges. This requires an understanding of the principles of algorithm analysis, and also an appreciation for the significant effects of the physical medium employed (e.g., data stored on disk versus main memory).
2. Related to costs and benefits is the notion of tradeoffs. For example, it is quite common to reduce time requirements at the expense of an increase in space requirements, or vice versa. Programmers face tradeoff issues regularly in all phases of software design and implementation, so the concept must become deeply ingrained.
3. Programmers should know enough about common practice to avoid reinventing the wheel. Thus, programmers need to learn the commonly used data structures, their related algorithms, and the most frequently encountered design patterns found in programming.
4. Data structures follow needs. Programmers must learn to assess application needs first, then find a data structure with matching capabilities. To do this requires competence in Principles 1, 2, and 3.
As I have taught data structures through the years, I have found that design issues have played an ever greater role in my courses. This can be traced through the various editions of this textbook by the increasing coverage for design patterns and generic interfaces. The first edition had no mention of design patterns. The second edition had limited coverage of a few example patterns, and introduced the dictionary ADT and comparator classes. With the third edition, there is explicit coverage of some design patterns that are encountered when programming the basic data structures and algorithms covered in the book.
Using the Book in Class: Data structures and algorithms textbooks tend to fall into one of two categories: teaching texts or encyclopedias. Books that attempt to do both usually fail at both. This book is intended as a teaching text. I believe it is more ...
Within an undergraduate program, this textbook is designed for use in either an advanced lower division (sophomore or junior level) data structures course, or for a senior level algorithms course. New material has been added in the third edition to support its use in an algorithms course. Normally, this text would be used in a course beyond the standard freshman level “CS2” course that often serves as the initial introduction to data structures. Readers of this book should typically have two semesters of the equivalent of programming experience, including at least some exposure to C++. Readers who are already familiar with recursion will have an advantage. Students of data structures will also benefit from having first completed a good course in Discrete Mathematics. Nonetheless, Chapter 2 attempts to give a reasonably complete survey of the prerequisite mathematical topics at the level necessary to understand their use in this book. Readers may wish to refer back to the appropriate sections as needed when encountering unfamiliar mathematical material.
A sophomore-level class where students have only a little background in basic data structures or analysis (that is, background equivalent to what would be had from a traditional CS2 course) might cover Chapters 1-11 in detail, as well as selected topics from Chapter 13. That is how I use the book for my own sophomore-level class. Students with greater background might cover Chapter 1, skip most of Chapter 2 except for reference, briefly cover Chapters 3 and 4, and then cover Chapters 5-12 in detail. Again, only certain topics from Chapter 13 might be covered, depending on the programming assignments selected by the instructor. A senior-level algorithms course would focus on Chapters 11 and 14-17.
Chapter 13 is intended in part as a source for larger programming exercises. I recommend that all students taking a data structures course be required to implement some advanced tree structure, or another dynamic structure of comparable difficulty such as the skip list or sparse matrix representations of Chapter 12. None of these data structures are significantly more difficult to implement than the binary search tree, and any of them should be within a student’s ability after completing Chapter 5.
While I have attempted to arrange the presentation in an order that makes sense, instructors should feel free to rearrange the topics as they see fit. The book has been written so that once the reader has mastered Chapters 1-6, the remaining material has relatively few dependencies. Clearly, external sorting depends on understanding internal sorting and disk files. Section 6.2 on the UNION/FIND algorithm is used in Kruskal’s Minimum-Cost Spanning Tree algorithm. Section 9.2 on self-organizing lists mentions the buffer replacement schemes covered in Section 8.3.
Chapter 14 draws on examples from throughout the book. Section 17.2 relies on knowledge of graphs. Otherwise, most topics depend only on material presented earlier within the same chapter.
Most chapters end with a section entitled “Further Reading.” These sections are not comprehensive lists of references on the topics presented. Rather, I include books and articles that, in my opinion, may prove exceptionally informative or entertaining to the reader. In some cases I include references to works that should become familiar to any well-rounded computer scientist.
Use of C++: The programming examples are written in C++, but I do not wish to discourage those unfamiliar with C++ from reading this book. I have attempted to make the examples as clear as possible while maintaining the advantages of C++. C++ is used here strictly as a tool to illustrate data structures concepts. In particular, I make use of C++’s support for hiding implementation details, including features such as classes, private class members, constructors, and destructors. These features of the language support the crucial concept of separating logical design, as embodied in the abstract data type, from physical implementation as embodied in the data structure.
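The separation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not code from the book: the class name `IntStack` and its member names are hypothetical, and the fixed-capacity array is just one possible physical implementation. The public member functions stand in for the logical design (the ADT), while the private members, constructor, and destructor hide how the data is actually stored.

```cpp
#include <cstddef>

// Hypothetical example (not from the book): a fixed-capacity stack of ints.
// Public members describe WHAT the structure does; private members hide HOW.
class IntStack {
public:
  // Constructor acquires the storage used by this particular implementation.
  explicit IntStack(std::size_t capacity)
      : data_(new int[capacity]), capacity_(capacity), size_(0) {}

  // Destructor releases the storage; callers never manage it themselves.
  ~IntStack() { delete[] data_; }

  // Copying is disabled to keep the sketch short and avoid double deletion.
  IntStack(const IntStack&) = delete;
  IntStack& operator=(const IntStack&) = delete;

  // Returns false (and stores nothing) if the stack is already full.
  bool push(int value) {
    if (size_ == capacity_) return false;
    data_[size_++] = value;
    return true;
  }

  // Returns false if the stack is empty; otherwise writes the top into out.
  bool pop(int& out) {
    if (size_ == 0) return false;
    out = data_[--size_];
    return true;
  }

  std::size_t length() const { return size_; }

private:
  int* data_;             // Implementation detail: array-based storage
  std::size_t capacity_;  // Maximum number of elements
  std::size_t size_;      // Current number of elements
};
```

Because callers can reach only `push`, `pop`, and `length`, the array could later be swapped for a linked structure without changing any code that uses the class.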
To keep the presentation as clear as possible, some important features of C++ are avoided here. I deliberately minimize use of certain features commonly used by experienced C++ programmers such as class hierarchy, inheritance, and virtual functions. Operator and function overloading is used sparingly. C-like initialization syntax is preferred to some of the alternatives offered by C++.
While the C++ features mentioned above have valid design rationale in real programs, they tend to obscure rather than enlighten the principles espoused in this book. For example, inheritance is an important tool that helps programmers avoid duplication, and thus minimize bugs. From a pedagogical standpoint, however, inheritance often makes code examples harder to understand since it tends to spread the description for one logical unit among several classes. Thus, my class definitions only use inheritance where inheritance is explicitly relevant to the point illustrated (e.g., Section 5.3.1). This does not mean that a programmer should do likewise. Avoiding code duplication and minimizing errors are important goals. Treat the programming examples as illustrations of data structure principles, but do not copy them directly into your own programs.
One painful decision I had to make was whether to use templates in the code examples. In the first edition of this book, the decision was to leave templates out as it was felt that their syntax obscures the meaning of the code for those not familiar with C++. In the years following, the use of C++ in computer science curricula has greatly expanded. I now assume that readers of the text will be familiar with template syntax. Thus, templates are now used extensively in the code examples.
My implementations are meant to provide concrete illustrations of data structure principles, as an aid to the textual exposition. Code examples should not be read or used in isolation from the associated text because the bulk of each example’s documentation is contained in the text, not the code. The code complements the text, not the other way around. They are not meant to be a series of commercial quality class implementations. If you are looking for a complete implementation of a standard data structure for use in your own code, you would do well to do an Internet search.
For instance, the code examples provide less parameter checking than is sound programming practice, since including such checking would obscure rather than illuminate the text. Some parameter checking and testing for other constraints (e.g., whether a value is being removed from an empty container) is included in the form of a call to Assert. The inputs to Assert are a Boolean expression and a character string. If this expression evaluates to false, then a message is printed and the program terminates immediately. Terminating a program when a function receives a bad parameter is generally considered undesirable in real programs, but is quite adequate for understanding how a data structure is meant to operate. In real programming applications, C++’s exception handling features should be used to deal with input data errors. However, assertions provide a simpler mechanism for indicating required conditions in a way that is both adequate for clarifying how a data structure is meant to operate, and is easily modified into true exception handling. See the Appendix for the implementation of Assert.
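As a rough sketch of the behavior just described (the book's actual implementation is in its Appendix and may differ), an Assert-style helper could look like the following. The exact signature, the message format, the exit status, and the helper `removeLast` are assumptions made for illustration only.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Sketch of an Assert-style helper (assumed signature, not the book's code):
// if the condition is false, print the message and terminate the program.
void Assert(bool condition, const std::string& message) {
  if (!condition) {
    std::cerr << "Assertion failed: " << message << std::endl;
    std::exit(EXIT_FAILURE);
  }
}

// Hypothetical use: guard against removing from an empty container.
// items holds count valid elements; the last one is removed and returned.
int removeLast(int* items, int& count) {
  Assert(count > 0, "Cannot remove from an empty container");
  return items[--count];
}
```

One way to turn this into exception handling would be to replace the call to `std::exit` with `throw std::runtime_error(message)` (from `<stdexcept>`), which lets the caller decide how to recover.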
I make a distinction in the text between “C++ implementations” and “pseudocode.” Code labeled as a C++ implementation has actually been compiled and tested on one or more C++ compilers. Pseudocode examples often conform closely to C++ syntax, but typically contain one or more lines of higher-level description. Pseudocode is used where I perceived a greater pedagogical advantage to a simpler, but less precise, description.
Exercises and Projects: Proper implementation and analysis of data structures cannot be learned simply by reading a book. You must practice by implementing real programs, constantly comparing different techniques to see what really works best in a given situation.
One of the most important aspects of a course in data structures is that it is where students really learn to program using pointers and dynamic memory allocation, by implementing data structures such as linked lists and trees. It is often where students truly learn recursion. In our curriculum, this is the first course where students do significant design, because it often requires real data structures to motivate significant design exercises. Finally, the fundamental differences between memory-based and disk-based data access cannot be appreciated without practical programming experience. For all of these reasons, a data structures course cannot succeed without a significant programming component. In our department, the data structures course is one of the most difficult programming courses in the curriculum.
Students should also work problems to develop their analytical abilities. I provide over 450 exercises and suggestions for programming projects. I urge readers to take advantage of them.
Contacting the Author and Supplementary Materials: A book such as this is sure to contain errors and have room for improvement. I welcome bug reports and constructive criticism. I can be reached by electronic mail via the Internet at shaffer@vt.edu. Alternatively, comments can be mailed to
Cliff Shaffer
Department of Computer Science
Virginia Tech
Blacksburg, VA 24061
The electronic posting of this book, along with a set of lecture notes for use in class can be obtained at
http://www.cs.vt.edu/~shaffer/book.html.
The code examples used in the book are available at the same site. Online Web pages for Virginia Tech’s sophomore-level data structures class can be found at
http://courses.cs.vt.edu/~cs3114.
This book was typeset by the author using LaTeX. The bibliography was prepared using BibTeX. The index was prepared using makeindex. The figures were mostly drawn with Xfig. Figures 3.1 and 9.10 were partially created using Mathematica.
Acknowledgments: It takes a lot of help from a lot of people to make a book. I wish to acknowledge a few of those who helped to make this book possible. I apologize for the inevitable omissions.
Virginia Tech helped make this whole thing possible through sabbatical research leave during Fall 1994, enabling me to get the project off the ground. My department heads during the time I have written the various editions of this book, Dennis Kafura and Jack Carroll, provided unwavering moral support for this project. Mike Keenan, Lenny Heath, and Jeff Shaffer provided valuable input on early versions of the chapters. I also wish to thank Lenny Heath for many years of stimulating discussions about algorithms and analysis (and how to teach both to students). Steve Edwards deserves special thanks for spending so much time helping me on various redesigns of the C++ and Java code versions for the second and third editions, and many hours of discussion on the principles of program design. Thanks to Layne Watson for his help with Mathematica, and to Bo Begole, Philip Isenhour, Jeff Nielsen, and Craig Struble for much technical assistance. Thanks to Bill McQuain, Mark Abrams and Dennis Kafura for answering lots of silly questions about C++ and Java.
I am truly indebted to the many reviewers of the various editions of this manuscript. For the first edition these reviewers included J. David Bezek (University of Evansville), Douglas Campbell (Brigham Young University), Karen Davis (University of Cincinnati), Vijay Kumar Garg (University of Texas-Austin), Jim Miller (University of Kansas), Bruce Maxim (University of Michigan-Dearborn), Jeff Parker (Agile Networks/Harvard), Dana Richards (George Mason University), Jack Tan (University of Houston), and Lixin Tao (Concordia University). Without their help, this book would contain many more technical errors and many fewer insights.
For the second edition, I wish to thank these reviewers: Gurdip Singh (Kansas State University), Peter Allen (Columbia University), Robin Hill (University of Wyoming), Norman Jacobson (University of California–Irvine), Ben Keller (Eastern Michigan University), and Ken Bosworth (Idaho State University). In addition, I wish to thank Neil Stewart and Frank J. Thesen for their comments and ideas for improvement.
Third edition reviewers included Randall Lechlitner (University of Houston, Clear Lake) and Brian C. Hipp (York Technical College). I thank them for their comments.
Prentice Hall was the original print publisher for the first and second editions. Without the hard work of many people there, none of this would be possible. Authors simply do not create printer-ready books on their own. Foremost thanks go to Kate Hargett, Petra Rector, Laura Steele, and Alan Apt, my editors over the years. My production editors, Irwin Zucker for the second edition, Kathleen Caren for the original C++ version, and Ed DeFelippis for the Java version, kept everything moving smoothly during that horrible rush at the end. Thanks to Bill Zobrist and Bruce Gregory (I think) for getting me into this in the first place. Others at Prentice Hall who helped me along the way include Truly Donovan, Linda Behrens, and Phyllis Bregman. Thanks to Tracy Dunkelberger for her help in returning the copyright to me, thus enabling the electronic future of this work. I am sure I owe thanks to many others at Prentice Hall for their help in ways that I am not even aware of.
I am thankful to Shelley Kronzek at Dover Publications for her faith in taking on the print publication of this third edition. Much expanded, with both Java and C++ versions, and many inconsistencies corrected, I am confident that this is the best edition yet. But none of us really knows whether students will prefer a free online textbook or a low-cost, printed bound version. In the end, we believe that the two formats will be mutually supporting by offering more choices. Production editor James Miller and design manager Marie Zaczkiewicz have worked hard to ensure that the production is of the highest quality.
I wish to express my appreciation to Hanan Samet for teaching me about data structures. I learned much of the philosophy presented here from him as well, though he is not responsible for any problems with the result. Thanks to my wife Terry, for her love and support, and to my daughters Irena and Kate for pleasant diversions from working too hard. Finally, and most importantly, to all of the data structures students over the years who have taught me what is important and what should be skipped in a data structures course, and the many new insights they have provided. This book is dedicated to them.
Cliff Shaffer
Blacksburg, Virginia