When you want to read data from a text file, you first create a file object using Python's built-in open() function.
Here's the official Python documentation on reading and writing from files. But before reading that, let's dive into the bare minimum that I want you to know.

Let's just go straight to a code example. Pretend you have a file named example.txt in the current directory. If you don't, just create one, fill it with a few lines of text, and save it.
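If you'd rather create the file from Python than from a text editor, here is a minimal sketch. The four lines of text are placeholders I made up for illustration, and the one-line write is used purely as setup (note that it will overwrite any example.txt already in the current directory).

```python
from pathlib import Path

# Placeholder contents, invented for this sketch; use any lines you like.
sample_text = "hello world\nand now\nI say\ngoodbye\n"

# write_text() creates example.txt in the current directory, replacing any
# existing file of that name, and returns the number of characters written.
chars_written = Path("example.txt").write_text(sample_text)
print(chars_written)
```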
Here's a short snippet of Python code to open that file and print out its contents to screen. Note that this Python code has to be run in the same directory that the example.txt file exists in.
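A minimal sketch of such a snippet follows. The variable names myfile and txt are just illustrative, and the first lines recreate example.txt with made-up placeholder text so the sketch runs on its own; skip them if you already have the file.

```python
from pathlib import Path

# Setup only: (re)create the sample file with placeholder text.
Path("example.txt").write_text("hello world\nand now\nI say\ngoodbye\n")

myfile = open("example.txt")  # open the file; this gives back a file object
txt = myfile.read()           # read the entire contents into one string
print(txt)                    # show the contents on screen
myfile.close()                # close the file when we're done with it
```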
Did that seem too complicated? Here's a less verbose version:
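One way to trim that down, as a sketch: skip the intermediate variable and print the result of read() directly. As before, the setup lines and the placeholder text are assumptions for illustration.

```python
from pathlib import Path

# Setup only: (re)create the sample file with placeholder text.
Path("example.txt").write_text("hello world\nand now\nI say\ngoodbye\n")

myfile = open("example.txt")
print(myfile.read())  # read and print in a single step
myfile.close()
```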
Here's how to read that file, line-by-line, using a for-loop:
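As a sketch (again with placeholder file contents and illustrative variable names), looping over the file object hands you one line at a time. The line_count variable is only there to show how many lines the loop saw.

```python
from pathlib import Path

# Setup only: (re)create the sample file with placeholder text.
Path("example.txt").write_text("hello world\nand now\nI say\ngoodbye\n")

myfile = open("example.txt")
line_count = 0
for line in myfile:   # each pass through the loop gets the next line of text
    print(line)
    line_count += 1
myfile.close()

print("Lines read:", line_count)
```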
(Note: If you're getting a FileNotFoundError already, that's almost to be expected. Keep reading!)

Still seem too complicated? Well, there's no getting around the fact that at the programmatic layer, opening a file is distinct from reading its contents. Not only that, we also have to manually close the file.

Now let's take this step-by-step. To open a file, we simply use the open() function and pass in, as the first argument, the filename, e.g. myfile = open("example.txt").
That seems easy enough, so let's jump into some common errors.

How to mess up when opening a file

Here is likely the most common error you'll get when trying to open a file.
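To see the error without crashing your script, here's a small sketch that triggers it on purpose and catches it. The filename is hypothetical, picked only because it shouldn't exist on your machine; run the bare open() call outside the try block and you'd get the full traceback instead.

```python
# Hypothetical filename, chosen because it should not exist in this folder.
missing_name = "no_such_file_for_this_demo.txt"

try:
    myfile = open(missing_name)
except FileNotFoundError as err:
    # The exception carries the operating system's message and the filename.
    error_message = str(err)
    print("FileNotFoundError:", error_message)
```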
In fact, I've seen students waste dozens of hours trying to get past this error message, because they don't stop to read it. So, read it.

What does FileNotFoundError mean?

Try putting spaces where the capitalization occurs: File Not Found Error.
You'll get this error because you tried to open a file that simply doesn't exist. Sometimes, it's a simple typo: trying to open() a file named example.txt but accidentally misspelling it as, say, exmaple.txt.

But more often, it's because you know a file exists under a given filename, such as example.txt, but how does your Python code know where that file is? Is it the example.txt that exists in your Downloads folder? Or the one that might exist in your Documents folder? Or in one of the thousands of other folders on your computer system?

That's a pretty complicated question. But the first step in not wasting your time is that if you ever see this error, stop whatever else you are doing. Don't tweak your convoluted for-loop. Don't try to install a new Python library. Don't restart your computer, then re-run the script to see if the error magically fixes itself.

A FileNotFoundError occurs because either you don't know where a file actually is on your computer or, even if you do, you don't know how to tell your Python program where it is. Don't try to fix other parts of your code that aren't related to specifying filenames or paths.

How to fix a FileNotFoundError

Here's a surefire fix: make sure the file actually exists.

Let's start from scratch by making an error. In your system shell (i.e. Terminal), change to your Desktop folder (on macOS or Linux, typically cd ~/Desktop).
Now, run ipython.

And now that you're in the interactive Python interpreter, try to open() a filename that you know does not exist on your Desktop, and then enjoy the FileNotFoundError message.
Now manually create the file on your Desktop, using Sublime Text 3 or whatever you want. Add some text to it, then save it.

Look and see for yourself that this file actually exists in your Desktop folder.

OK, now switch back to your interactive Python shell (i.e. ipython), the one that you opened after changing into the Desktop folder. Re-run that open() command, the one that resulted in the FileNotFoundError.
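Here's a sketch of that re-run, plus a quick look at what comes back. The filename my_desktop_file.txt and its contents are hypothetical stand-ins for whatever you created; the write_text() line only simulates saving the file by hand.

```python
from pathlib import Path

# Hypothetical filename standing in for the file you just created by hand.
filename = "my_desktop_file.txt"
Path(filename).write_text("some text I typed in\n")  # simulates saving it

myfile = open(filename)   # no FileNotFoundError this time
print(type(myfile))       # a file object, not a string

mytxt = myfile.read()     # read() returns the contents as one string
print(type(mytxt))
print(len(mytxt))         # number of characters in the file
print(mytxt.upper())      # the same text in all-caps
myfile.close()
```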
Hopefully, you shouldn't get an error. But what is that object that your variable (call it myfile) points to? Use the type() function to figure it out: type(myfile) reports _io.TextIOWrapper.

And what is that? The details aren't important, other than to point out that myfile is most definitely not just a string literal, i.e. str.

Use the Tab autocomplete (i.e. type in myfile. and press Tab) to get a list of existing methods and attributes for the myfile object. Well, we can do a lot more with files than just read() from them. But let's focus on just reading for now.

Assuming the myfile variable points to some kind of file object, this is how you read from it: call its read() method and store the result in a variable, say mytxt = myfile.read().

What's in that mytxt variable? Again, use the type() function: type(mytxt) reports str. It's just a string. Which means of course that we can print it out with print(mytxt), count the number of characters with len(mytxt), or print it out in all-caps with print(mytxt.upper()).

And that's all there is to reading from a file that has been opened. Now onto the mistakes.

How to mess up when reading from a file

Here's a very, very common error: storing the filename in a variable, e.g. filename = "example.txt", and then calling filename.read(). The error output ends with this line:

AttributeError: 'str' object has no attribute 'read'

Take careful note that this is not a FileNotFoundError. It is an AttributeError, which admittedly is not very clear, but read the rest of the message: 'str' object has no attribute 'read'.

The error message gets to the point: the str object (i.e. a string literal, e.g. something like "example.txt") does not have a read attribute.

Revisiting the erroneous code: if filename points to "example.txt", then filename is simply a str object. In other words, a file name is not a file object. Here's a clearer example of erroneous code: "example.txt".read(). And to beat the point about the head: any other string, such as "hello world".read(), fails in exactly the same way.

Why is this such a common mistake? Because in 99% of our typical interactions with files, we see a filename on our Desktop graphical interface and we double-click that filename to open it. The graphical interface obfuscates the process, and for good reason. Who cares what's happening as long as my file opens when I double-click it!

Unfortunately, we have to care when trying to read a file programmatically. Opening a file is a discrete operation from reading it.
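To make the distinction concrete, here's a sketch that makes the mistake on purpose, catches the resulting AttributeError, and then does it the right way. The setup lines and the placeholder text in example.txt are assumptions for illustration.

```python
from pathlib import Path

# Setup only: (re)create the sample file with placeholder text.
Path("example.txt").write_text("hello world\nand now\nI say\ngoodbye\n")

filename = "example.txt"  # just a str: a name, not a file

try:
    filename.read()       # the mistake: strings have no read() method
except AttributeError as err:
    attribute_error = str(err)
    print("AttributeError:", attribute_error)

# The fix: open() turns the name into a file object, which does have read().
myfile = open(filename)
mytxt = myfile.read()
myfile.close()
print(mytxt)
```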
Again, here's the sequence spelled out in a slightly more verbose fashion: store the filename in a variable, pass that variable to open() to get a file object, call read() on the file object, and then close it.

The file object also has a close() method, which formally cleans up after the opened file and allows other programs to safely access it. Again, that's a low-level detail that you never think of in day-to-day computing. In fact, it's something you probably will forget in the programming context, as not closing the file won't automatically break anything (not until we start doing much more complicated types of file operations, at least…). Typically, as soon as a script finishes, any unclosed files will automatically be closed.

However, I like closing the file explicitly, not just to be on the safe side, but because it helps to reinforce the concept of that file object.

One of the advantages of getting down into the lower-level details of opening and reading from files is that we now have the ability to read files line-by-line, rather than as one giant chunk. Again, to read a file as one giant chunk of content, use the read() method, e.g. mytxt = myfile.read().

It doesn't seem like such a big deal now, but that's because example.txt probably contains just a few lines. But when we deal with files that are massive (like all 3.3 million records of everyone who has donated more than $200 to a single U.S. presidential campaign committee in 2012, or everyone who has ever visited the White House), opening and reading the file all at once is noticeably slower. And it may even crash your computer.

If you've wondered why spreadsheet software, such as Excel, has a limit on rows (roughly 1,000,000), it's because most users do want to operate on a data file all at once. However, many interesting data files are just too big for that. We'll run into those scenarios later in the quarter.

For now, here's what reading line-by-line typically looks like: a for-loop over the file object, which hands you one line of text per iteration.
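A minimal sketch, assuming the same placeholder example.txt as setup: rather than printing, this loop keeps a running count and remembers the longest line, so only one line needs to be in hand at a time. The names line_count and longest_line are illustrative.

```python
from pathlib import Path

# Setup only: (re)create the sample file with placeholder text.
Path("example.txt").write_text("hello world\nand now\nI say\ngoodbye\n")

myfile = open("example.txt")
line_count = 0
longest_line = ""
for line in myfile:                    # each iteration reads just one line
    line_count += 1
    if len(line) > len(longest_line):  # keep only the longest line seen so far
        longest_line = line
myfile.close()

print(line_count)
print(repr(longest_line))  # repr() makes the trailing newline character visible
```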
Because each line in a text file has a newline character (which is represented as \n but is typically "invisible"), invoking the print() function will create double-spaced output, because print() adds a newline to what it outputs (i.e. think back to your original "hello world" program).

To get rid of that effect, call the strip() method, which belongs to str objects and removes whitespace characters from the left and right side of a text string, e.g. print(line.strip()) inside the for-loop.

And of course, you can make things loud with the good ol' upper() method, e.g. print(line.strip().upper()).

That's it for now. We haven't covered how to write to a file (which is a far more dangerous operation); I'm saving that for a separate lesson. But it's enough to know that when dealing with files as a programmer, we have to be much more explicit and specific in the steps.