• Introduction
  • 1. Data Science
    • 1.1 Introduction
    • 1.1.1 Computational Tools
    • 1.1.2 Statistical Techniques
    • 1.2 Why Data Science?
    • 1.3 Plotting the Classics
    • 1.3.1 Literary Characters
    • 1.3.2 Another Kind of Character
  • 2. Causality and Experiments
    • 2.1 John Snow and the Broad Street Pump
    • 2.2 Snow’s “Grand Experiment”
    • 2.3 Establishing Causality
    • 2.4 Randomization
    • 2.5 Endnote
  • 3. Programming in Python
    • 3.1 Expressions
    • 3.2 Names
    • 3.2.1 Example: Growth Rates
    • 3.3 Call Expressions
    • 3.4 Introduction to Tables
  • 4. Data Types
    • 4.1 Numbers
    • 4.2 Strings
    • 4.2.1 String Methods
    • 4.3 Comparisons
  • 5. Sequences
    • 5.1 Arrays
    • 5.2 Ranges
    • 5.3 More on Arrays
  • 6. Tables
    • 6.1 Sorting Rows
    • 6.2 Selecting Rows
    • 6.3 Example: Population Trends
    • 6.4 Example: Trends in Gender
  • 7. Visualization
    • 7.1 Categorical Distributions
    • 7.2 Numerical Distributions
    • 7.3 Overlaid Graphs
  • 8. Functions and Tables
    • 8.1 Applying Functions to Columns
    • 8.2 Classifying by One Variable
    • 8.3 Cross-Classifying
    • 8.4 Joining Tables by Columns
    • 8.5 Bike Sharing in the Bay Area
  • 9. Randomness
    • 9.1 Conditional Statements
    • 9.2 Iteration
    • 9.3 Simulation
    • 9.4 The Monty Hall Problem
    • 9.5 Finding Probabilities
  • 10. Sampling and Empirical Distributions
    • 10.1 Empirical Distributions
    • 10.2 Sampling from a Population
    • 10.3 Empirical Distribution of a Statistic
  • 11. Testing Hypotheses
    • 11.1 Assessing Models
    • 11.2 Multiple Categories
    • 11.3 Decisions and Uncertainty
    • 11.4 Error Probabilities
  • 12. Comparing Two Samples
    • 12.1 A/B Testing
    • 12.2 Deflategate
    • 12.3 Causality
  • 13. Estimation
    • 13.1 Percentiles
    • 13.2 The Bootstrap
    • 13.3 Confidence Intervals
    • 13.4 Using Confidence Intervals
  • 14. Why the Mean Matters
    • 14.1 Properties of the Mean
    • 14.2 Variability
    • 14.3 The SD and the Normal Curve
    • 14.4 The Central Limit Theorem
    • 14.5 The Variability of the Sample Mean
    • 14.6 Choosing a Sample Size
  • 15. Prediction
    • 15.1 Correlation
    • 15.2 The Regression Line
    • 15.3 The Method of Least Squares
    • 15.4 Least Squares Regression
    • 15.5 Visual Diagnostics
    • 15.6 Numerical Diagnostics
  • 16. Inference for Regression
    • 16.1 A Regression Model
    • 16.2 Inference for the True Slope
    • 16.3 Prediction Intervals
  • 17. Classification
    • 17.1 Nearest Neighbors
    • 17.2 Training and Testing
    • 17.3 Rows of Tables
    • 17.4 Implementing the Classifier
    • 17.5 The Accuracy of the Classifier
    • 17.6 Multiple Regression
  • 18. Updating Predictions
    • 18.1 A “More Likely Than Not” Binary Classifier
    • 18.2 Making Decisions
