Errata

ISL with Python

A note about the labs

A number of readers have reported issues when running the labs. Some of these are due to errors, which are listed below. Others are due to changes made to Python packages since the labs were developed.

Here we show how to create and operate a Conda environment, which we recommend using to run the labs. The Resources page contains updated versions of all of the labs. We will update these as needed.

Please try out the Conda environment, and the updated versions of the labs available on the Resources page, before reporting a possible erratum.

Since the 1st printing (Summer 2023)

On page 44, “Out[22]:” should not be numbered. The authors.

On page 49, the input block after “In[43]:” should be numbered (this will affect the numbering of downstream input blocks as well). The authors.

On the bottom of page 50 of the Chapter 2 lab, the sentence “To fine-tune the output of the ax.contour() function, take a look at the help file by typing ?plt.contour” should instead say “To fine-tune the output of the ax.contour() function, take a look at the help file by typing ?ax.contour”. Thanks to Hargen Zheng.

On page 61, block 103, there should be a semi-colon in the last line to indicate that the output should be suppressed. Also, the semi-colon in the first line is superfluous, and should be removed. Thanks to Julien Gomes.

On page 66, there is an error in the code in Exercise 2(f): the line
college['Elite'] = pd.cut(college['Top10perc'], [0,0.5,1], labels=['No', 'Yes'])
should be replaced with
college['Elite'] = pd.cut(college['Top10perc']/100, [0, 0.5, 1], labels=['No', 'Yes'])
Thanks to Dylan Owens.
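
The effect of this correction can be checked on a small example. The sketch below assumes that Top10perc is recorded as a percentage between 0 and 100 (which the division by 100 suggests); the four-row data frame is a hypothetical stand-in, not the actual College data.

```python
import pandas as pd

# Hypothetical stand-in for the College data; Top10perc is assumed
# to be a percentage on the 0-100 scale.
college = pd.DataFrame({"Top10perc": [23, 60, 95, 50]})

# Uncorrected line: bins on the 0-1 scale applied to 0-100 values,
# so every value falls outside the bins and becomes NaN.
uncorrected = pd.cut(college["Top10perc"], [0, 0.5, 1], labels=["No", "Yes"])

# Corrected line: rescale to proportions before binning.
college["Elite"] = pd.cut(college["Top10perc"] / 100, [0, 0.5, 1],
                          labels=["No", "Yes"])

print(uncorrected.isna().all())   # every entry is missing
print(college["Elite"].tolist())  # labels are assigned as intended
```

Note that pd.cut uses right-inclusive bins by default, so a value of exactly 50 is labeled 'No'.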

In the footnote on the bottom of page 76, the sentence "Details of how to compute the 95% confidence interval precisely in R will be provided later in this chapter" should mention Python instead of R. Thanks to Rush Kirubi.

On the bottom of page 81, the sentence “Any statistical software package can be used to compute these coefficient estimates, and later in this chapter we will show how this can be done in R.” should mention Python instead of R. Thanks to Jasmin Bogatinovski and Omar Mallick.

On pages 87, 236, 601, “Mallow’s Cp” should be written as “Mallows’ Cp”. Thanks to James MacKinnon.

On the top of page 94: The sentence “It is estimated that those in the South will have $18.69 less debt than those in the East, and that those in the West will have $12.50 less debt than those in the East” should instead say “It is estimated that those in the West will have $18.69 less debt than those in the East, and that those in the South will have $12.50 less debt than those in the East”. Thanks to Yongjun Zhu and Felipe Provezano Coutinho.

On page 131, exercise 11d: "Show algebraically, and confirm numerically in R" should read "Show algebraically, and confirm numerically in Python". Thanks to Julien Gomes.

On the bottom of page 184, the last sentence is missing two words. It should read: “In this case Purchase has only Yes and No values and the method returns how many values of each there are.” Thanks to Johannes Ruf.

On page 187, the printed text under “In[60]:” should not be in green. The authors.

On page 188, there is a series of typos, all due to an error in code block 61. In code block 61, the line
logit_labels = np.where(logit_pred[:,1] > 5, 'Yes', 'No')
should instead say
logit_labels = np.where(logit_pred[:,1] > 0.5, 'Yes', 'No')
With this typo corrected, a correction is also needed in code block 62: the first column of the contingency table should contain “931, 2” instead of “933, 0”.
Finally, in the text that follows, the sentence “If we use 0.5 as the predicted probability cut-off for the classifier, then we have a problem: none of the test observations are predicted to purchase insurance.” should be corrected as follows: “If we use 0.5 as the predicted probability cut-off for the classifier, then we have a problem: only two of the test observations are predicted to purchase insurance.”
Thanks to Lauren Chen.
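
To see why the cut-off of 5 produced no predicted purchases, recall that predicted probabilities never exceed 1. The sketch below uses a small hypothetical array in place of the lab's logit_pred, assuming its second column holds the predicted probability of 'Yes'.

```python
import numpy as np

# Hypothetical predicted probabilities; columns are assumed to be
# [P(No), P(Yes)], mirroring the shape of logit_pred in the lab.
logit_pred = np.array([[0.9, 0.1],
                       [0.4, 0.6],
                       [0.7, 0.3],
                       [0.2, 0.8]])

# Typo: a probability can never exceed 5, so every label is 'No'.
labels_typo = np.where(logit_pred[:, 1] > 5, 'Yes', 'No')

# Corrected: a 0.5 cut-off labels the second and fourth rows 'Yes'.
labels_fixed = np.where(logit_pred[:, 1] > 0.5, 'Yes', 'No')

print(labels_typo)
print(labels_fixed)
```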

On page 196, exercise 12d, the last two estimates should have the subscript “apple” instead of “orange”. Thanks to Sundong Kim.

On page 225, there’s an error in the code for performing the bootstrap. The line
store[i] = np.sum(rng.choice(100, replace=True) == 4) > 0
should be replaced with
store[i] = np.sum(rng.choice(100, size=100, replace=True) == 4) > 0
Thanks to Alistair Bertrand Sands Keiller.
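
The difference between the two lines is that, without a size argument, rng.choice() draws a single value rather than a bootstrap sample of 100 indices. A minimal sketch follows; the seed and the number of repetitions are arbitrary choices made here for illustration and are not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(10)  # arbitrary seed for this sketch

# Without size, a single integer is drawn, not a bootstrap sample.
single = rng.choice(100, replace=True)
print(np.ndim(single))  # 0 dimensions: a scalar

# With size=100, we get a full bootstrap sample of indices.
sample = rng.choice(100, size=100, replace=True)
print(sample.shape)

# Estimate the probability that index 4 appears in a bootstrap sample.
n_reps = 10000  # arbitrary number of repetitions
store = np.empty(n_reps)
for i in range(n_reps):
    store[i] = np.sum(rng.choice(100, size=100, replace=True) == 4) > 0

# Should be close to the theoretical value 1 - (1 - 1/100)**100.
print(np.mean(store), 1 - (1 - 1 / 100) ** 100)
```

With the corrected line, the estimate should land near 0.634; with the uncorrected line it would instead estimate the chance that a single draw equals 4, which is about 0.01.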

On page 227, Exercise 8(f): data.frame() should be replaced by pd.DataFrame(). Thanks to Adrian Hayler.

On page 231, Algorithm 6.1, Step 3: delete the extra word “using”. Thanks to Mario Pepe.

On page 355, the output of cell [6] should be 0.79 instead of 0.7275. Thanks to Karlo Delic.

On page 358, there is an error in the confusion table. Instead of [108 61, 10 21] it should say [94 32, 24 50]. Thanks to Lauren Chen.

On page 438, we define standard_lasso in cell [14] and never use it. We have changed the lab slightly, and now cells [15] and [16] are slightly modified. Pick up the modified lab from the GitHub site linked here. Thanks to Martin Storath.

On page 486, the x-axis of Figure 11.7 is missing a vertical line in the denominator (i.e. a single vertical line should be replaced with a double vertical line in the norm symbol).

On the bottom of page 511: “we can use (12.11) to see that the PVE defined in (12.10) equals . . . ” should be replaced with “we can use (12.11) to see that the PVE defined in (12.10), summed over the first $M$ principal components, equals . . .”. Thanks to Zhuyun Yin.

On page 561, the sentence “Typically, the R function that is used to compute a test statistic will make…” should mention Python, not R. Thanks to Yongjun Zhu.