One strategy for model selection is to choose, among all candidate models, the one that minimizes AIC. Various theoretical justifications exist for this strategy, but they do not correctly account for how AIC is used in practice. It has been demonstrated repeatedly, and long ago, in the literature that AIC does not correctly adjust for model selection, which leads to overfitting and overly optimistic model assessment. Nevertheless, AIC continues to be promoted in textbooks and in the literature as suitable for model selection. Data splitting, cross-validation, and other resampling-based procedures are better alternatives, but they come at the cost of higher computation and/or lower efficiency. I will show how AIC overfits and how this overfitting is alleviated by, e.g., cross-validation. I will also present new theoretical results that can be used to adjust AIC so that it correctly accounts for model selection, which likewise alleviates the overfitting.
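To make the setting concrete, here is a minimal simulation sketch (my own illustration, not code from the talk, and the Gaussian-AIC formula and nested-model design are assumptions of the sketch): nested linear models are fit to pure-noise data, one model is selected by minimizing AIC, and a 5-fold cross-validation estimate of prediction error is computed for comparison. Because the minimized AIC is, by construction, no larger than the AIC of the true (intercept-only) model, the selected value is a downward-biased assessment of the chosen model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.normal(size=(n, p))   # candidate predictors: pure noise
y = rng.normal(size=n)        # response independent of all predictors

def design(j):
    # intercept plus the first j (noise) predictors
    return np.column_stack([np.ones(n), X[:, :j]])

def rss(Xd, yd):
    beta, *_ = np.linalg.lstsq(Xd, yd, rcond=None)
    r = yd - Xd @ beta
    return r @ r

def aic(Xd, yd):
    # Gaussian linear-model AIC up to an additive constant:
    # n*log(RSS/n) + 2*(number of parameters, incl. error variance)
    k = Xd.shape[1]
    return len(yd) * np.log(rss(Xd, yd) / len(yd)) + 2 * (k + 1)

def cv_mse(j, folds=5):
    # 5-fold cross-validation estimate of prediction error for model j
    idx = np.arange(n)
    errs = []
    for f in range(folds):
        test = idx % folds == f
        Xd = design(j)
        beta, *_ = np.linalg.lstsq(Xd[~test], y[~test], rcond=None)
        errs.append(np.mean((y[test] - Xd[test] @ beta) ** 2))
    return np.mean(errs)

aics = [aic(design(j), y) for j in range(p + 1)]
cvs = [cv_mse(j) for j in range(p + 1)]
best_aic = int(np.argmin(aics))
best_cv = int(np.argmin(cvs))
print("AIC selects", best_aic, "noise predictors; CV selects", best_cv)
```

The number of noise predictors each criterion admits varies with the random seed; the point of the sketch is only that the minimized AIC cannot exceed the AIC of the true model, so selection by minimization is optimistic.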