Errata
ISL with Python
A note about the labs
A number of readers have reported issues when running the labs. Some of these are due to errors, which are listed below. Others are due to changes made to Python packages since the labs were developed.
Here we show how to create and operate a Conda environment, which we recommend using to run the labs. The Resources page contains updated versions of all of the labs. We will continue to update these as needed.
Please try out the Conda environment, and the updated versions of the labs available on the Resources page, before reporting a possible erratum.
Since the 1st printing (Summer 2023)
On page 44, “Out[22]:” should not be numbered. The authors.
On page 49, the input block after “In[43]:” should be numbered (this will affect the numbering of downstream input blocks as well). The authors.
On the bottom of page 50 of the Chapter 2 lab, the sentence “To fine-tune the output of the ax.contour() function, take a look at the help file by typing ?plt.contour” should instead say “To fine-tune the output of the ax.contour() function, take a look at the help file by typing ?ax.contour”. Thanks to Hargen Zheng.
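The `?ax.contour` syntax works only in IPython or Jupyter. A minimal sketch of the plain-Python equivalent, assuming matplotlib is installed; the non-interactive `Agg` backend is our choice so the snippet runs without a display.

```python
import inspect

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# ax.contour is the Axes method used in the lab; plt.contour is the
# pyplot wrapper that draws on the "current" axes instead.
doc = inspect.getdoc(ax.contour)
print(doc.splitlines()[0])  # first line of the docstring

# help(ax.contour) shows the same documentation interactively.
plt.close(fig)
```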
On page 54, last line above the third code cell: "TRUE" should be "True". Thanks to Pedro Zühlke.
On page 59, in the last line before the second code cell, there is a repeated “of” in “attribute of of the dataframe”. Thanks to Pedro Zühlke.
On page 61, block 103, there should be a semi-colon in the last line to indicate that the output should be suppressed. Also, the semi-colon in the first line is superfluous, and should be removed. Thanks to Julien Gomes.
On page 66, there is an error in the code in Exercise 2(f): the line
college['Elite'] = pd.cut(college['Top10perc'], [0,0.5,1], labels=['No', 'Yes'])
should be replaced with
college['Elite'] = pd.cut(college['Top10perc'] / 100, [0, 0.5, 1], labels=['No', 'Yes'])
Thanks to Dylan Owens.
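Why the division by 100 matters: `Top10perc` is a percentage, while the bins `[0, 0.5, 1]` are proportions. A small sketch on hypothetical values (not the real College data) shows the printed line producing only missing values.

```python
import pandas as pd

# Hypothetical Top10perc values (percentages, 0-100), not the real College data.
college = pd.DataFrame({"Top10perc": [23, 51, 88, 50]})

# Printed version: the bins (0, 0.5] and (0.5, 1] are applied to raw
# percentages, so every value above 1 falls outside them and becomes NaN.
buggy = pd.cut(college["Top10perc"], [0, 0.5, 1], labels=["No", "Yes"])

# Corrected version: rescale to proportions first, so "Yes" means more than 50%.
college["Elite"] = pd.cut(college["Top10perc"] / 100, [0, 0.5, 1], labels=["No", "Yes"])

print(buggy.isna().all())         # True
print(college["Elite"].tolist())  # ['No', 'Yes', 'Yes', 'No']
```

Because `pd.cut` uses right-closed bins by default, a college with exactly 50% is labeled "No".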
In the footnote on the bottom of page 76, the sentence "Details of how to compute the 95% confidence interval precisely in R will be provided later in this chapter" should mention Python instead of R. Thanks to Rush Kirubi.
On the bottom of page 81, the sentence “Any statistical software package can be used to compute these coefficient estimates, and later in this chapter we will show how this can be done in R.” should mention Python instead of R. Thanks to Jasmin Bogatinovski and Omar Mallick.
On pages 87, 236, and 601, “Mallow’s Cp” should be written as “Mallows’ Cp”. Thanks to James MacKinnon.
On the top of page 94: The sentence “It is estimated that those in the South will have $18.69 less debt than those in the East, and that those in the West will have $12.50 less debt than those in the East” should instead say “It is estimated that those in the West will have $18.69 less debt than those in the East, and that those in the South will have $12.50 less debt than those in the East.” Thanks to Yongjun Zhu and Felipe Provezano Coutinho.
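With East as the baseline, the region effect is encoded by two dummy variables. Assuming the coding $x_{i1} = 1$ for South and $x_{i2} = 1$ for West (both 0 for East), the fitted mean for each region is:

```latex
\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2}
\quad\Longrightarrow\quad
\begin{cases}
\text{East:}  & \hat\beta_0 \\
\text{South:} & \hat\beta_0 + \hat\beta_1 \\
\text{West:}  & \hat\beta_0 + \hat\beta_2
\end{cases}
```

So each dummy coefficient is a difference from East: under this coding, $\hat\beta_2 = -18.69$ is the West-minus-East difference and $\hat\beta_1 = -12.50$ the South-minus-East difference, as in the corrected sentence.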
On page 117, "python" should be "Python", and “rmvar” should be “rm”. Thanks to Pedro Zühlke.
On page 120, “Prediction intervals are computing” should say “Prediction intervals are computed.” Thanks to Pedro Zühlke.
On page 121, third line after the first code cell: "exisiting" should be "existing". Thanks to Pedro Zühlke.
On page 126, penultimate line before the first code cell: "why their are NaNs in the first row above" should be "why there are NaNs in the first row above". Thanks to Guilherme Roma and Pedro Zühlke.
On page 131, exercise 11d: "Show algebraically, and confirm numerically in R" should read "Show algebraically, and confirm numerically in Python". Thanks to Julien Gomes.
On page 141, second paragraph, 6th line: “using statistical software such as R” should say “using statistical software”. Thanks to Pedro Zühlke.
On page 158, fourth paragraph, 2nd and 3rd lines: "instead" is repeated in "Instead of assuming..., we instead make ...". Thanks to Pedro Zühlke.
On the bottom of page 184, the last sentence is missing two words. It should read: “In this case Purchase has only Yes and No values and the method returns how many values of each there are.” Thanks to Johannes Ruf.
On page 187, the printed text under “In[60]:” should not be in green. The authors.
On page 188, there is a series of typos, all due to an error in code block 61. In code block 61, the line
logit_labels = np.where(logit_pred[:,1] > 5, 'Yes', 'No')
should instead say
logit_labels = np.where(logit_pred[:,1] > 0.5, 'Yes', 'No')
With this typo corrected, a correction is also needed in code block 62: the first column of the contingency table should contain “931, 2” instead of “933, 0”.
Finally, in the text that follows, the sentence “If we use 0.5 as the predicted probability cut-off for the classifier, then we have a problem: none of the test observations are predicted to purchase insurance.” should be corrected as follows: “If we use 0.5 as the predicted probability cut-off for the classifier, then we have a problem: only two of the test observations are predicted to purchase insurance.”
Thanks to Lauren Chen.
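Why the printed threshold produced “933, 0”: predicted probabilities never exceed 1, so `> 5` is never true and every observation is labeled "No". A sketch on hypothetical probabilities, assuming (as the lab's indexing implies) that column 1 holds the probability of "Yes":

```python
import numpy as np

# Hypothetical predicted probabilities: column 0 = P(No), column 1 = P(Yes).
logit_pred = np.array([[0.90, 0.10],
                       [0.30, 0.70],
                       [0.45, 0.55],
                       [0.80, 0.20]])

# Printed code: a probability can never exceed 5, so every label is 'No'.
buggy = np.where(logit_pred[:, 1] > 5, 'Yes', 'No')

# Corrected code: threshold the probability of 'Yes' at 0.5.
logit_labels = np.where(logit_pred[:, 1] > 0.5, 'Yes', 'No')

print(buggy)         # ['No' 'No' 'No' 'No']
print(logit_labels)  # ['No' 'Yes' 'Yes' 'No']
```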
On page 196, exercise 12d, the last two estimates should have the subscript “apple” instead of “orange”. Thanks to Sundong Kim.
On page 212, line 7: “R” should be replaced with “Python”. Thanks to Salena Torres Ashton.
On page 214, Figure 5.10: it would be better for the histogram axis to be labeled $\hat\alpha$ rather than $\alpha$. Thanks to Salena Torres Ashton.
On page 216, line preceding the last code cell: "training and test set" should be "training and test sets". Thanks to Pedro Zühlke.
On page 218, 4th line below the first output (Out[9]): for consistency with the remainder of the chapter, the 'K' in "K results in K-fold ..." should be lowercase. The same applies on page 219 to the three occurrences of K in the paragraph above the second code cell, and to the single occurrence in each of the two paragraphs below that cell; this last occurrence should also be italicized. Thanks to Pedro Zühlke.
On page 219, penultimate line above the last code cell: "funtion to implement" should be "function to implement". Thanks to Pedro Zühlke.
On page 225, there’s an error in the code for performing the bootstrap. The line
store[i] = np.sum(rng.choice(100, replace=True) == 4) > 0
should be replaced with
store[i] = np.sum(rng.choice(100, size=100, replace=True) == 4) > 0
Thanks to Alistair Bertrand Sands Keiller.
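The code estimates the probability that observation 4 appears in a bootstrap sample of size 100. Without `size`, `rng.choice` returns a single integer, so the printed line only checks one draw. A sketch comparing both lines (the seed and number of repetitions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(10)  # seed chosen arbitrarily for this sketch
n_reps = 10000
buggy = np.empty(n_reps)
fixed = np.empty(n_reps)

for i in range(n_reps):
    # Printed line: no size argument, so a single integer is drawn and the
    # check is whether that one draw equals 4.
    buggy[i] = np.sum(rng.choice(100, replace=True) == 4) > 0
    # Corrected line: draw a full bootstrap sample of size 100 and check
    # whether observation 4 appears in it at least once.
    fixed[i] = np.sum(rng.choice(100, size=100, replace=True) == 4) > 0

print(buggy.mean())          # close to 1/100
print(fixed.mean())          # close to 1 - (1 - 1/100)**100
print(1 - (1 - 1/100)**100)  # about 0.634
```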
On page 219, near the bottom of the page, the word “function” is misspelled as “funtion”. Thanks to Titus Teodorescu.
On page 227, Exercise 8(f): data.frame() should be replaced by pd.DataFrame(). Thanks to Adrian Hayler.
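For readers coming from R, a minimal sketch of the pandas equivalent of `data.frame(x, y)`. The arrays `x` and `y` here are hypothetical random data standing in for the exercise's simulated variables.

```python
import numpy as np
import pandas as pd

# Hypothetical arrays standing in for the exercise's simulated x and y.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = rng.normal(size=100)

# pandas equivalent of R's data.frame(x, y): one column per variable.
df = pd.DataFrame({'x': x, 'y': y})
print(df.shape)  # (100, 2)
```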
On page 231, Algorithm 6.1, Step 3: delete the extra word “using”. Thanks to Mario Pepe.
On page 235, 2nd paragraph after Algorithm 6.3, 1st line: "requires that the number ... is larger" should be "requires that the number ... be larger". Thanks to Pedro Zühlke.
On page 316, the output of command "In[18]" should have "bs(age)" instead of "bs(age, knots)". Thanks to Marcin Łukasik.
On page 334, line preceding (8.3): "minimize the equation" should be "minimize the expression". Thanks to Pedro Zühlke.
Figure 8.3, bottom left: To be consistent with the text, the labels at the nodes should have the form "X < t" instead of "X <= t". Thanks to Pedro Zühlke.
On page 355, the output of cell [6] should be 0.79 instead of 0.7275. Thanks to Karlo Delic.
On page 358, there is an error in the confusion table. Instead of [108 61, 10 21] it should say [94 32, 24 50]. Thanks to Lauren Chen.
On page 363, Exercise 3 should mention Python, not R. Thanks to Marcin Łukasik.
On page 387, in the first paragraph of Section 9.6.1, “When the cost argument is small” should say “When the C argument is small”. Thanks to Ameer Dharamshi.
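In scikit-learn the tuning parameter is `C`, a cost on margin violations (unlike the budget C in the chapter's optimization problem), so a small `C` gives a wide margin with many support vectors. A sketch on hypothetical overlapping two-class data, assuming scikit-learn is installed:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical overlapping two-class data.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
y = np.array([-1] * 25 + [1] * 25)
X[y == 1] += 1  # shift one class so the classes overlap only partly

n_sv = {}
for C in [0.01, 100]:
    svm_linear = SVC(C=C, kernel='linear').fit(X, y)
    n_sv[C] = int(svm_linear.n_support_.sum())  # support vectors, both classes

print(n_sv)  # the smaller C typically yields more support vectors
```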
On page 438, we define standard_lasso in cell [14] and never use it. We have changed the lab slightly, modifying cells [15] and [16]. Pick up the modified lab from the GitHub site. Thanks to Martin Storath.
On page 486, the x-axis of Figure 11.7 is missing a vertical line in the denominator (i.e. a single vertical line should be replaced with a double vertical line in the norm symbol).
On the bottom of page 511: “we can use (12.11) to see that the PVE defined in (12.10) equals . . . ” should be replaced with “we can use (12.11) to see that the PVE defined in (12.10), summed over the first $M$ principal components, equals . . .”. Thanks to Zhuyun Yin.
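A quick numerical check of the identity we take the corrected sentence to refer to: the PVE of (12.10), summed over the first $M$ components, equals 1 - RSS/TSS for the rank-$M$ approximation built from those components. The data are random and hypothetical, and the loadings and scores are computed with an SVD rather than a PCA library.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X = X - X.mean(axis=0)  # PCA assumes column-centered data

# Loadings phi (one column per component) and scores Z from the SVD of X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
phi = Vt.T
Z = X @ phi

tss = np.sum(X**2)
pve = np.sum(Z**2, axis=0) / tss  # PVE of each component

M = 2
recon = Z[:, :M] @ phi[:, :M].T   # rank-M approximation of X
rss = np.sum((X - recon)**2)

print(pve[:M].sum(), 1 - rss / tss)  # the two numbers agree
```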
On page 561, the sentence “Typically, the R function that is used to compute a test statistic will make…” should mention Python, not R. Thanks to Yongjun Zhu.