A text file containing a list of commands is called which of the following?

The types of files recognized by the system are either regular, directory, or special. However, the operating system uses many variations of these basic types.

The following basic types of files exist:

Item        Description
regular     Stores data (text, binary, and executable)
directory   Contains information used to access other files
special     Defines a FIFO (first-in, first-out) pipe file or a physical device

All file types recognized by the system fall into one of these categories.

Regular files

Regular files are the most common files and are used to contain data. Regular files are in the form of text files or binary files:

Text files

Text files are regular files that contain information stored in ASCII format text and are readable by the user. You can display and print these files. The lines of a text file must not contain NUL characters, and none can exceed {LINE_MAX} bytes in length, including the newline character.

The term text file does not prevent the inclusion of control or other nonprintable characters (other than NUL). Therefore, standard utilities that list text files as inputs or outputs are either able to process the special characters or they explicitly describe their limitations within their individual sections.
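The text-file rules above can be checked mechanically. Below is a minimal Python sketch, not a system utility. The function name is_text_data is hypothetical. It assumes a {LINE_MAX} of 2048 bytes, which is the smallest value POSIX allows; on many systems getconf LINE_MAX reports the real value. Following the comparison with binary files later in this section, it also requires every line to end with a newline.

```python
def is_text_data(data: bytes, line_max: int = 2048) -> bool:
    """Check data against the text-file rules described above (a sketch).

    line_max stands in for {LINE_MAX}; 2048 is an assumed value.
    """
    if b"\0" in data:                        # No NUL characters allowed.
        return False
    if data and not data.endswith(b"\n"):    # Every line ends with a newline.
        return False
    lines = data.split(b"\n")[:-1]           # Drop the empty piece after the last newline.
    # Each line, counting its newline, may be at most line_max bytes long.
    return all(len(line) + 1 <= line_max for line in lines)


print(is_text_data(b"Lorem ipsum\ndolor sit amet\n"))  # True
print(is_text_data(b"Lorem\0ipsum\n"))                 # False: contains a NUL
print(is_text_data(b"no final newline"))               # False
```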

Binary files

Binary files are regular files that contain information readable by the computer. Binary files might be executable files that instruct the system to accomplish a job. Commands and programs are stored in executable, binary files. Special compiling programs translate ASCII text into binary code.

Text and binary files differ only in that text files have lines of no more than {LINE_MAX} bytes (including the newline), with no NUL characters, each terminated by a newline character.

Directory files

Directory files contain information that the system needs to access all types of files, but directory files do not contain the actual file data. As a result, directories occupy less space than a regular file and give the file system structure flexibility and depth. Each directory entry represents either a file or a subdirectory. Each entry contains the name of the file and the file's index node reference number (i-node number). The i-node number points to the unique index node assigned to the file, and that i-node describes the location of the data associated with the file. Directories are created and controlled by a separate set of commands.
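You can list a directory's entries and their i-node numbers from Python with the standard os.scandir() function. The following is a minimal sketch that assumes a Unix-like system. The helper list_entries is hypothetical. Note that scandir() does not report the . and .. entries.

```python
import os
import pathlib
import tempfile


def list_entries(path):
    """Return sorted (name, i-node number, is_directory) tuples for path."""
    entries = []
    with os.scandir(path) as it:                 # Iterate over the directory's entries.
        for entry in it:
            entries.append((entry.name, entry.inode(), entry.is_dir()))
    return sorted(entries)


with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "notes.txt").touch()       # A regular file.
    os.mkdir(os.path.join(tmp, "subdir"))        # A subdirectory.
    for name, inode, is_dir in list_entries(tmp):
        print(name, inode, "directory" if is_dir else "file")
```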

Special files

Special files define devices for the system or are temporary files created by processes. The basic types of special files are FIFO (first-in, first-out), block, and character. FIFO files are also called pipes. Pipes are created by one process to temporarily allow communication with another process. These files cease to exist when the first process finishes. Block and character files define devices.
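Python's standard stat module can tell these file types apart. The following is a minimal sketch that assumes a Unix-like system: named pipes are created with os.mkfifo(), which is not available on Windows. The helper file_type is hypothetical.

```python
import os
import stat
import tempfile


def file_type(path):
    """Classify path using the mode bits reported by os.lstat()."""
    mode = os.lstat(path).st_mode            # lstat() does not follow symbolic links.
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "FIFO"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    return "other"                           # For example, a symbolic link or socket.


with tempfile.TemporaryDirectory() as tmp:
    regular = os.path.join(tmp, "data.txt")
    with open(regular, "w") as f:
        f.write("some data\n")
    print(file_type(regular))                # regular
    print(file_type(tmp))                    # directory
    if hasattr(os, "mkfifo"):                # Named pipes: Unix-like systems only.
        pipe = os.path.join(tmp, "mypipe")
        os.mkfifo(pipe)
        print(file_type(pipe))               # FIFO
    if os.path.exists("/dev/null"):
        print(file_type("/dev/null"))        # Usually a character device.
```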

Every file has a set of permissions (called access modes) that determine who can read, modify, or execute the file.
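On Unix-like systems, you can read access modes back from Python. This sketch sets a file's mode with os.chmod() and prints it in the familiar ls -l form using stat.filemode(). The helper permission_string is hypothetical.

```python
import os
import stat
import tempfile


def permission_string(path):
    """Return the ls -l style mode string for path, such as -rw-r--r--."""
    return stat.filemode(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "script.sh")
    with open(script, "w") as f:
        f.write("echo hello\n")
    os.chmod(script, 0o754)               # Owner: rwx, group: r-x, others: r--.
    print(permission_string(script))      # -rwxr-xr-- on Unix-like systems
```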

A section of a page that consists of a composition that forms an independent part of a document, page, or site.

An article is not a navigational landmark, but may be nested to form a discussion where assistive technologies could pay attention to article nesting to assist the user in following the discussion. An article could be a forum post, a magazine or newspaper article, a web log entry, a user-submitted comment, or any other independent item of content. It is independent in that its contents could stand alone, for example in syndication. However, the element is still associated with its ancestors; for instance, contact information that applies to a parent body element still covers the article as well. When nesting articles, the child articles represent content that is related to the content of the parent article. For instance, a web log entry on a site that accepts user-submitted comments could represent the comments as articles nested within the article for the web log entry. Author, heading, date, or other information associated with an article does not apply to nested articles.

When the user navigates to an element assigned the role of article, assistive technologies that typically intercept standard keyboard events SHOULD switch to document browsing mode, as opposed to passing keyboard events through to the web application. Assistive technologies MAY provide a feature allowing the user to navigate the hierarchy of any nested article elements.

When an article is in the context of a feed, the author MAY specify values for aria-posinset and aria-setsize.

Extracting text from a file is a common task in scripting and programming, and Python makes it easy. In this guide, we'll discuss some simple ways to extract text from a file using the Python 3 programming language.

  • Make sure you're using Python 3
  • Reading data from a text file
  • Using "with open"
  • Reading text files line-by-line
  • Storing text data in a variable
  • Searching text for a substring
  • Incorporating regular expressions
  • Putting it all together

Make sure you're using Python 3

In this guide, we'll be using Python version 3. Some older systems still come with Python 2.7 pre-installed. Python 2.7 is still found in legacy code, but it reached end of life in January 2020; Python 3 is the present and future of the Python language. Unless you have a specific reason to write or support Python 2, we recommend working in Python 3.

For Microsoft Windows, you can download Python 3 from the official Python website. When installing, make sure the "Install launcher for all users" and "Add Python to PATH" options are both checked.

On Linux, you can install Python 3 with your package manager. For instance, on Debian or Ubuntu, you can install it with the following command:

sudo apt-get update && sudo apt-get install python3

For macOS, you can download the Python 3 installer from python.org. If you use the Homebrew package manager, you can also install it by opening a terminal window (Applications → Utilities) and running this command:

brew install python3

Running Python

On Linux and macOS, the command to run the Python 3 interpreter is python3. On Windows, if you installed the launcher, the command is py. The commands on this page use python3; if you're on Windows, substitute py for python3 in all commands.

Running Python with no options starts the interactive interpreter. For more information about using the interpreter, see Python overview: using the Python interpreter. If you accidentally enter the interpreter, you can exit it using the command exit() or quit().

Running Python with a file name runs the Python program in that file. For instance:

python3 program.py

...runs the program contained in the file program.py.
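As a first program.py, the sketch below checks which Python version is running and exits with a message if it isn't Python 3. It's an illustration, not a required step.

```python
import platform
import sys

# sys.version_info compares like a tuple: (major, minor, micro, ...).
if sys.version_info < (3, 0):
    sys.exit("Please run this with Python 3, not " + platform.python_version())

print("Running Python " + platform.python_version())
```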

Okay, how can we use Python to extract text from a text file?

Reading data from a text file

First, let's read a text file. Let's say we're working with a file named lorem.txt, which contains lines from the Lorem Ipsum example text.

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc fringilla arcu congue metus aliquam mollis.
Mauris nec maximus purus. Maecenas sit amet pretium tellus.
Quisque at dignissim lacus.

Note

In all the examples that follow, we work with the four lines of text contained in this file. Copy and paste the Latin text above into a text file, and save it as lorem.txt, so you can run the example code using this file as input.

A Python program can read a text file using the built-in open() function. For example, the Python 3 program below opens lorem.txt for reading in text mode, reads the contents into a string variable named contents, closes the file, and prints the data.

myfile = open("lorem.txt", "rt") # open lorem.txt for reading text
contents = myfile.read()         # read the entire file to string
myfile.close()                   # close the file
print(contents)                  # print string contents

Here, myfile is the name we give to our file object.

The "rt" parameter in the open() function means "we're opening this file to read text data."

The hash mark ("#") means that everything on that line is a comment, and it's ignored by the Python interpreter.

If you save this program in a file called read.py, you can run it with the following command.

python3 read.py

The command above outputs the contents of lorem.txt:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc fringilla arcu congue metus aliquam mollis.
Mauris nec maximus purus. Maecenas sit amet pretium tellus.
Quisque at dignissim lacus.
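The "t" in "rt" matters. This quick sketch writes a small sample file to a temporary directory as a stand-in for lorem.txt. It then reads the file back in text mode ("rt", which is also what plain "r" means) and in binary mode ("rb"). Text mode returns a str, while binary mode returns bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lorem.txt")
    # newline="\n" writes "\n" as-is, so the bytes are the same on every platform.
    with open(path, "wt", newline="\n") as f:
        f.write("Quisque at dignissim lacus.\n")

    with open(path, "rt") as f:          # Text mode: read() returns a str.
        text = f.read()
    with open(path, "rb") as f:          # Binary mode: read() returns bytes.
        raw = f.read()

print(type(text).__name__, repr(text))   # str 'Quisque at dignissim lacus.\n'
print(type(raw).__name__, repr(raw))     # bytes b'Quisque at dignissim lacus.\n'
```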

Using "with open"

It's important to close your open files as soon as possible: open the file, perform your operation, and close it. Don't leave it open for extended periods of time.

When you're working with files, it's good practice to use the with open...as compound statement. It's the cleanest way to open a file, operate on it, and close the file, all in one easy-to-read block of code. The file is automatically closed when the code block completes.

Using with open...as, we can rewrite our program to look like this:

with open ('lorem.txt', 'rt') as myfile:  # Open lorem.txt for reading text
    contents = myfile.read()              # Read the entire file to a string
print(contents)                           # Print the string

Note

Indentation is important in Python. Python programs use white space at the beginning of a line to define scope, such as a block of code. We recommend you use four spaces per level of indentation, and that you use spaces rather than tabs. In the following examples, make sure your code is indented exactly as it's presented here.

Example

Save the program as read.py and execute it:

python3 read.py

Output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc fringilla arcu congue metus aliquam mollis.
Mauris nec maximus purus. Maecenas sit amet pretium tellus.
Quisque at dignissim lacus.
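To confirm that with closes the file for you, check the file object's closed attribute inside the block and after it. The file is also closed if an error occurs inside the block. This sketch first writes a one-line stand-in for lorem.txt to a temporary directory so that it runs anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lorem.txt")      # A one-line stand-in for lorem.txt.
    with open(path, "wt") as f:
        f.write("Quisque at dignissim lacus.\n")

    with open(path, "rt") as myfile:
        print(myfile.closed)                   # False: still inside the block.
        contents = myfile.read()
    print(myfile.closed)                       # True: closed when the block ended.
    print(contents, end="")
```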

Reading text files line-by-line

In the examples so far, we've been reading in the whole file at once. Reading a full file is no big deal with small files, but generally speaking, it's not a great idea. For one thing, if your file is bigger than the amount of available memory, you'll encounter an error.

In almost every case, it's a better idea to read a text file one line at a time.

In Python, the file object is an iterator. An iterator is an object that hands out its items one at a time. For instance, you can use a for loop to operate on a file object repeatedly, and each time the same operation is performed, you'll receive a different, or "next," result.

Example

For text files, the file object iterates one line of text at a time. It considers one line of text a "unit" of data, so we can use a for...in loop statement to iterate one line at a time:

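with open('lorem.txt', 'rt') as myfile:   # Open lorem.txt for reading text.
    for myline in myfile:                 # For each line, read it to a string,
        print(myline)                     # and print the string.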

Output:

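Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Nunc fringilla arcu congue metus aliquam mollis.

Mauris nec maximus purus. Maecenas sit amet pretium tellus.

Quisque at dignissim lacus.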

Notice that we're getting an extra line break ("newline") after every line. That's because two newlines are being printed. The first one is the newline at the end of every line of our text file. The second newline happens because, by default, print() adds a linebreak of its own at the end of whatever you've asked it to print.
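Behind the scenes, the for loop calls next() on the file object until it runs out of lines. The sketch below does the same thing by hand. It uses io.StringIO as an in-memory stand-in for lorem.txt, an assumption made so that it runs without the file. Calling repr() makes the trailing '\n' on each line visible.

```python
import io

# io.StringIO behaves like a text file opened for reading, but lives in memory.
myfile = io.StringIO("Lorem ipsum dolor sit amet.\nNunc fringilla arcu.\n")

print(iter(myfile) is myfile)     # True: a file object is its own iterator.
print(repr(next(myfile)))         # 'Lorem ipsum dolor sit amet.\n'
print(repr(next(myfile)))         # 'Nunc fringilla arcu.\n'
print(next(myfile, None))         # None: no lines left, so the default is returned.
```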

Let's store our lines of text in a variable — specifically, a list variable — so we can look at it more closely.

Storing text data in a variable

In Python, lists are similar to, but not the same as, an array in C or Java. A Python list contains indexed data, of varying lengths and types.

Example

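mylines = []                              # Declare an empty list named mylines.
with open('lorem.txt', 'rt') as myfile:   # Open lorem.txt for reading text data.
    for myline in myfile:                 # For each line, stored as myline,
        mylines.append(myline)            # add its contents to mylines.
print(mylines)                            # Print the list.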

The output of this program is a little different. Instead of printing the contents of the list, this program prints our list object, which looks like this:

Output:

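['Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n', 'Nunc fringilla arcu congue metus aliquam mollis.\n', 'Mauris nec maximus purus. Maecenas sit amet pretium tellus.\n', 'Quisque at dignissim lacus.\n']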

Here, we see the raw contents of the list. In its raw object form, a list is represented as a comma-delimited list. Here, each element is represented as a string, and each newline is represented as its escape character sequence, \n.

Much like a C or Java array, the list elements are accessed by specifying an index number after the variable name, in brackets. Index numbers start at zero; in other words, the nth element of a list has the numeric index n-1.

Note

If you're wondering why the index numbers start at zero instead of one, you're not alone. Computer scientists have debated the usefulness of zero-based numbering systems in the past. In 1982, Edsger Dijkstra gave his opinion on the subject, explaining why zero-based numbering is the best way to index data in computer science. You can read the memo yourself — he makes a compelling argument.

Example

We can print the first element of mylines by specifying index number 0, contained in brackets after the name of the list:

print(mylines[0])                         # Replaces print(mylines): print the first element.

Output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Example

Or the third line, by specifying index number 2:

print(mylines[2])                         # Replaces print(mylines): print the third element.

Output:

Mauris nec maximus purus. Maecenas sit amet pretium tellus.

But if we try to access an index for which there is no value, we get an error:

Example

print(mylines[4])                         # mylines has only four elements: indices 0 to 3.

Output (last line of the traceback):

IndexError: list index out of range

Example

A list object is iterable, so to print every element of the list, we can iterate over it with for...in:

for element in mylines:                   # Replaces print(mylines): for each element,
    print(element)                        # print it.

Output:

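Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Nunc fringilla arcu congue metus aliquam mollis.

Mauris nec maximus purus. Maecenas sit amet pretium tellus.

Quisque at dignissim lacus.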

But we're still getting extra newlines. Each line of our text file ends in a newline character ('\n'), which is being printed. Also, after printing each line, print() adds a newline of its own, unless you tell it to do otherwise.

We can change this default behavior by specifying an end parameter in our print() call:

print(element, end='')

By setting end to an empty string (two single quotes, with no space), we tell print() to print nothing at the end of a line, instead of a newline character.

Example

Our revised program looks like this:

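mylines = []                              # Declare an empty list named mylines.
with open('lorem.txt', 'rt') as myfile:   # Open lorem.txt for reading text data.
    for myline in myfile:                 # For each line, stored as myline,
        mylines.append(myline)            # add its contents to mylines.
for element in mylines:                   # For each element of mylines,
    print(element, end='')                # print it, without adding a newline.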

Output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nunc fringilla arcu congue metus aliquam mollis.
Mauris nec maximus purus. Maecenas sit amet pretium tellus.
Quisque at dignissim lacus.

The newlines you see here are actually in the file; they're a special character ('\n') at the end of each line. We want to get rid of these, so we don't have to worry about them while we process the file.

How to strip newlines

To remove the newlines completely, we can strip them. To strip a string is to remove one or more characters, usually whitespace, from either the beginning or end of the string.

Tip

This process is sometimes also called "trimming."

Python 3 string objects have a method called rstrip(), which strips characters from the right side of a string. The English language reads left-to-right, so stripping from the right side removes characters from the end.

If the variable is named mystring, we can strip its right side with mystring.rstrip(chars), where chars is a string of characters to strip. For example, "123abc".rstrip("bc") returns 123a.
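The following rstrip() calls show that chars is treated as a set of characters rather than as a suffix. They also show that calling rstrip() with no argument strips all trailing whitespace:

```python
# chars is a set of characters to remove from the right end, in any order.
print("123abc".rstrip("bc"))                                # 123a
print("123abc".rstrip("cb"))                                # 123a
print("mississippi".rstrip("ip"))                           # mississ
print(repr("Quisque at dignissim lacus.\n".rstrip("\n")))   # 'Quisque at dignissim lacus.'
print(repr("tabs and spaces \t \n".rstrip()))               # 'tabs and spaces'
```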

Tip

When you represent a string in your program with its literal contents, it's called a string literal. In Python (as in most programming languages), string literals are always quoted — enclosed on either side by single (') or double (") quotes. In Python, single and double quotes are equivalent; you can use one or the other, as long as they match on both ends of the string. It's traditional to represent a human-readable string (such as Hello) in double-quotes ("Hello"). If you're representing a single character (such as b), or a single special character such as the newline character (\n), it's traditional to use single quotes ('b', '\n'). For more information about how to use strings in Python, you can read the documentation of strings in Python.

The statement string.rstrip('\n') will strip a newline character from the right side of string. The following version of our program strips the newlines when each line is read from the text file:

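mylines = []                                 # Declare an empty list named mylines.
with open('lorem.txt', 'rt') as myfile:      # Open lorem.txt for reading text data.
    for myline in myfile:                    # For each line, stored as myline,
        mylines.append(myline.rstrip('\n'))  # strip the newline and add it to mylines.
for element in mylines:                      # For each element of mylines,
    print(element)                           # print it; print() adds one newline.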

The text is now stored in a list variable, so individual lines can be accessed by index number. Newlines were stripped, so we don't have to worry about them. We can always put them back later if we reconstruct the file and write it to disk.

Now, let's search the lines in the list for a specific substring.

Searching text for a substring

Let's say we want to locate every occurrence of a certain phrase, or even a single letter. For instance, maybe we need to know where every "e" is. We can accomplish this using the string's find() method.

The list stores each line of our text as a string object. All string objects have a method, find(), which locates the first occurrence of a substring in the string.

Let's use the find() method to search for the letter "e" in the first line of our text file, which is stored in the list mylines. The first element of mylines is a string object containing the first line of the text file. This string object has a find() method.

In the parentheses of find(), we specify parameters. The first and only required parameter is the string to search for, "e". The statement mylines[0].find("e") tells the interpreter to search forward, starting at the beginning of the string, one character at a time, until it finds the letter "e." When it finds one, it stops searching, and returns the index number where that "e" is located. If it reaches the end of the string, it returns -1 to indicate nothing was found.

Example

print(mylines[0].find("e"))                  # Find the first "e" in the first line.

Output:

3

The return value "3" tells us that the letter "e" is the fourth character, the "e" in "Lorem". (Remember, the index is zero-based: index 0 is the first character, 1 is the second, etc.)

The find() method takes two optional, additional parameters: a start index and a stop index, indicating where in the string the search should begin and end. The stop index itself is not searched. For instance, string.find("abc", 10, 20) searches for the substring "abc", but only from the 11th through the 20th character (indices 10 through 19). If stop is not specified, find() starts at index start and stops at the end of the string.

Example

For instance, the following statement searches for "e" in mylines[0], beginning at the fifth character (index 4).

print(mylines[0].find("e", 4))               # Search the first line, starting at index 4.

Output:

24

In other words, starting at the 5th character in mylines[0], the first "e" is located at index 24 (the "e" in "amet").

Example

To search the third line, mylines[2], starting at index 10 and stopping at index 30:

print(mylines[2].find("e", 10, 30))          # Search indices 10 through 29 of the third line.

Output:

28

(The first "e" in "Maecenas").

If find() doesn't locate the substring in the search range, it returns the number -1, indicating failure:

print(mylines[0].find("e", 25, 30))          # Search indices 25 through 29 of the first line.

Output:

-1

There were no "e" occurrences between indices 25 and 30.
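Because the stop index is excluded from the search, a match that sits exactly at the stop index isn't found. Likewise, a longer substring must fit entirely before the stop index. Here is a short check on the first line of lorem.txt, written as a string literal so that it runs on its own:

```python
line = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

print(line.find("e", 4))            # 24: the "e" in "amet"
print(line.find("e", 4, 24))        # -1: index 24 is the stop index, so it isn't searched
print(line.find("e", 4, 25))        # 24: now index 24 is inside the range
print(line.find("elit", 40, 54))    # -1: "elit" starts at 51 but doesn't end before index 54
print(line.find("elit", 40, 55))    # 51
```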

Finding all occurrences of a substring

But what if we want to locate every occurrence of a substring, not just the first one we encounter? We can iterate over the string, starting from the index of the previous match.

In this example, we'll use a while loop to repeatedly find the letter "e". When an occurrence is found, we call find() again, starting from a new location in the string: the location of the last occurrence, plus the length of the substring (so we move forward past the last one). When find() returns -1, or the start index exceeds the length of the string, we stop.

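mylines = []                                 # Declare an empty list named mylines.
with open('lorem.txt', 'rt') as myfile:      # Open lorem.txt for reading text data.
    for myline in myfile:                    # For each line, stored as myline,
        mylines.append(myline.rstrip('\n'))  # strip the newline and add it to mylines.
substr = "e"                                 # The substring to search for.
for line in mylines:                         # Search each line in turn.
    found = []                               # Indices of the matches in this line.
    index = 0                                # Where the next search starts.
    while index < len(line):                 # Stop once we pass the end of the line.
        index = line.find(substr, index)     # Find the next occurrence.
        if index == -1:                      # If nothing was found,
            break                            # exit the while loop.
        found.append(index)                  # Record the match,
        index += len(substr)                 # then move past it.
    print(found)                             # Print this line's match indices.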

Output:

[3, 24, 32, 35, 51]
[25, 28]
[8, 28, 30, 41, 46, 53]
[6]
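The same loop can be packaged as a reusable function. This is a sketch, and find_all is a hypothetical name, not a built-in string method. It reports non-overlapping occurrences, because each search resumes just past the previous match. It also rejects an empty substring, which would otherwise never advance.

```python
def find_all(text, substr):
    """Return the start index of every non-overlapping occurrence of substr in text."""
    if not substr:
        raise ValueError("substr must not be empty")   # find("") would loop forever.
    positions = []
    index = text.find(substr)
    while index != -1:
        positions.append(index)
        index = text.find(substr, index + len(substr))  # Resume just past this match.
    return positions


print(find_all("Quisque at dignissim lacus.", "e"))  # [6]
print(find_all("banana", "an"))                       # [1, 3]
print(find_all("aaaa", "aa"))                         # [0, 2] (non-overlapping)
```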

Incorporating regular expressions

For complex searches, use regular expressions.

The Python regular expressions module is called re. To use it in your program, import the module before you use it:

import re

The re module implements regular expressions by compiling a search pattern into a pattern object. Methods of this object can then be used to perform match operations.

For example, let's say you want to search for any word in your document which starts with the letter d and ends in the letter r. We can accomplish this using the regular expression "\bd\w*r\b". What does this mean?

Character sequence   Meaning
\b                   Word boundary. Matches the empty string, but only at the start or end of a word: between a word character and a non-word character, or between a word character and the start or end of the string. "Word characters" are the digits 0 through 9, the lowercase and uppercase letters, and the underscore ("_"). In Python 3 string patterns, letters from other alphabets also count.
d                    Lowercase letter d.
\w*                  \w represents any word character, and * is a quantifier meaning "zero or more of the previous character." So \w* matches zero or more word characters.
r                    Lowercase letter r.
\b                   Word boundary.

So this regular expression will match any string that can be described as "a word boundary, then a lowercase 'd', then zero or more word characters, then a lowercase 'r', then a word boundary." Strings described this way include the words destroyer, dour, and doctor, and the abbreviation dr.
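You can check that description against a few sample words with the pattern's findall() method, which returns every non-overlapping match in a string. The sample text is made up for illustration:

```python
import re

pattern = re.compile(r"\bd\w*r\b")
text = "destroyer dour doctor dr Dr. drove docked under"

# "Dr" starts with an uppercase D, "drove" and "docked" don't end in r,
# and the d in "under" is not at a word boundary.
print(pattern.findall(text))    # ['destroyer', 'dour', 'doctor', 'dr']
```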

To use this regular expression in Python search operations, we first compile it into a pattern object. For instance, the following Python statement creates a pattern object named pattern which we can use to perform searches using that regular expression.

pattern = re.compile(r"\bd\w*r\b")

Note

The letter r before our string in the statement above is important. It tells Python to interpret our string as a raw string, exactly as we've typed it, keeping every backslash. If we didn't prefix the string with an r, Python would interpret escape sequences such as \b itself: in an ordinary string, "\b" is the backspace character, not a word boundary. Whenever you need Python to interpret your strings literally, specify it as a raw string by prefixing it with r.
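The difference is easy to see by comparing lengths. In an ordinary string, \b is one character (backspace, \x08), while the raw string keeps both the backslash and the letter:

```python
plain = "\bd"    # Ordinary string: \b becomes one backspace character.
raw = r"\bd"     # Raw string: the backslash and the b are both kept.

print(len(plain), len(raw))    # 2 3
print(plain == "\x08d")        # True
print(raw == "\\bd")           # True
```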

Now we can use the pattern object's methods, such as search(), to search a string for the compiled regular expression. If search() finds a match, it returns a special result called a match object. Otherwise, it returns None, a built-in Python constant that counts as false in a boolean test such as an if statement.

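import re                                    # Import the regular expressions module.
mylines = []                                 # Declare an empty list named mylines.
with open('lorem.txt', 'rt') as myfile:      # Open lorem.txt for reading text data.
    for myline in myfile:                    # For each line, stored as myline,
        mylines.append(myline.rstrip('\n'))  # strip the newline and add it to mylines.
pattern = re.compile(r"\bd\w*r\b")           # Compile the regular expression.
print(pattern.search(mylines[0]))            # Search the first line; print the result.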

Output:

<re.Match object; span=(12, 17), match='dolor'>
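A match object records what matched and where. This sketch calls its group() and span() methods, and it relies on None counting as false when nothing matches. The sample strings are text taken from lorem.txt, written as literals:

```python
import re

pattern = re.compile(r"\bd\w*r\b")

match = pattern.search("Lorem ipsum dolor sit amet")
if match:                                   # A match object counts as true.
    print(match.group(), match.span())      # dolor (12, 17)

if not pattern.search("Quisque at dignissim lacus."):   # None counts as false.
    print("no match")
```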

To perform a case-insensitive search, you can specify the special constant re.IGNORECASE in the compile step:

pattern = re.compile(r"\bd\w*r\b", re.IGNORECASE)  # Compile again, ignoring case.
print(pattern.search(mylines[0]))                  # Search the first line again.

Output:

<re.Match object; span=(12, 17), match='dolor'>

Putting it all together

So now we know how to open a file, read the lines into a list, and locate a substring in any given list element. Let's use this knowledge to build some example programs.

The program below reads a log file line by line. If the line contains the word "error," it is added to a list called errors. If not, it is ignored. The lower() string method returns a lowercase copy of each line for comparison purposes, making the search case-insensitive without altering the original strings.

Note that the find() method is called directly on the result of the lower() method; this is called method chaining. Also, note that in the print() statement, we construct an output string by joining several strings with the + operator.

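errors = []                                   # List of lines that contain "error".
with open('logfile.txt', 'rt') as myfile:     # Open logfile.txt for reading text.
    for line in myfile:                       # For each line in the file,
        if line.lower().find("error") != -1:  # if "error" appears in any case,
            errors.append(line.rstrip('\n'))  # keep the line, minus its newline.
print("Found " + str(len(errors)) + " lines containing errors:")  # Build a string with +.
for err in errors:                            # Print each matching line.
    print(err)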

Extract all lines containing substring, using regex

The program below is similar to the above program, but it uses the re regular expressions module. The errors and line numbers are stored as tuples, e.g., (linenum, line). The tuple is created by the additional enclosing parentheses in the errors.append() statement. The elements of the tuple are referenced similar to a list, with a zero-based index in brackets. As constructed here, err[0] is a linenum and err[1] is the associated line containing an error.

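import re                                            # Import the regular expressions module.
errors = []                                          # List of (linenum, line) tuples.
linenum = 0                                          # Number of the current line.
pattern = re.compile("error", re.IGNORECASE)         # Match "error" in any case.
with open('logfile.txt', 'rt') as myfile:            # Open logfile.txt for reading text.
    for line in myfile:                              # For each line in the file,
        linenum += 1                                 # advance the line count.
        if pattern.search(line) is not None:         # If the line contains "error",
            errors.append((linenum, line.rstrip('\n')))  # store a (linenum, line) tuple.
for err in errors:                                   # For each stored tuple,
    print("Line " + str(err[0]) + ": " + err[1])     # err[0] is linenum, err[1] is the line.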
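The tuple indexing described above can also be written with tuple unpacking, which gives each element a name. This minimal sketch uses made-up (linenum, line) tuples in place of a real log file. The helper format_errors is hypothetical.

```python
def format_errors(errors):
    """Turn (linenum, line) tuples into printable report lines."""
    report = []
    for linenum, line in errors:       # Unpack each tuple into two names.
        report.append("Line " + str(linenum) + ": " + line)
    return report


# Hypothetical data, shaped like the (linenum, line) tuples described above.
errors = [(3, "ERROR: disk full"), (7, "Error: retrying connection")]
print(errors[0][0], errors[0][1])      # 3 ERROR: disk full
for entry in format_errors(errors):
    print(entry)
```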

Extract all lines containing a phone number

The program below prints any line of a text file, info.txt, which contains a US or international phone number. It accomplishes this with the regular expression "(\+\d{1,2})?[\s.-]?\d{3}[\s.-]?\d{4}". This regex matches the following phone number notations:

  • 123-456-7890
  • (123) 456-7890
  • 123 456 7890
  • 123.456.7890
  • +91 (123) 456-7890
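
import re                                                      # Import the regular expressions module.
pattern = re.compile(r"(\+\d{1,2})?[\s.-]?\d{3}[\s.-]?\d{4}")  # The phone number pattern above.
with open('info.txt', 'rt') as myfile:                         # Open info.txt for reading text.
    for line in myfile:                                        # For each line in the file,
        if pattern.search(line) is not None:                   # if it contains a phone number,
            print(line.rstrip('\n'))                           # print the line.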
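Because search() looks for a match anywhere in the line, this pattern doesn't have to match a whole phone number. This quick sketch runs it against the notations listed above, plus one line without a number, and prints the text that actually matched. For each of these notations, the match covers only the last seven digits and their separators. Any run of seven consecutive digits, such as 1234567, matches as well.

```python
import re

pattern = re.compile(r"(\+\d{1,2})?[\s.-]?\d{3}[\s.-]?\d{4}")
samples = [
    "123-456-7890",
    "(123) 456-7890",
    "123 456 7890",
    "123.456.7890",
    "+91 (123) 456-7890",
    "no phone number here",
]
for sample in samples:
    match = pattern.search(sample)
    # repr() shows exactly which characters matched, including a leading separator.
    print(sample, "->", repr(match.group()) if match else "no match")
```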

Search a dictionary for words

The program below searches the dictionary for any words that start with h and end in pe. For input, it uses a dictionary file included on many Unix systems, /usr/share/dict/words.
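Here is a minimal sketch of such a search. It assumes the word list has one word per line. The regular expression ^h\w*pe$ is our own choice, modeled on the \bd\w*r\b example above, and the helper name matching_words is hypothetical. If the file is missing, the program prints a message instead of failing.

```python
import re

# Assumed pattern: words that start with "h" and end with "pe", such as "hope".
pattern = re.compile(r"^h\w*pe$")


def matching_words(lines):
    """Return the words (one per line) that match pattern."""
    matches = []
    for line in lines:
        word = line.rstrip("\n")             # Remove the trailing newline.
        if pattern.search(word):             # None (no match) counts as false.
            matches.append(word)
    return matches


try:
    # errors="replace" tolerates word lists that aren't valid UTF-8.
    with open("/usr/share/dict/words", "rt", encoding="utf-8", errors="replace") as wordfile:
        for word in matching_words(wordfile):
            print(word)
except FileNotFoundError:
    print("No word list found at /usr/share/dict/words")
```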

Which of the following file contains a list of commands?

A batch file is a script file in DOS, OS/2 and Microsoft Windows. It consists of a series of commands to be executed by the command-line interpreter, stored in a plain text file.

Which of the following commands is used to create a text file in the terminal?

The cat command is a very popular and versatile command in the 'nix ecosystem. There are 4 common usages of the cat command. It can display a file, concatenate (combine) multiple files, echo text, and it can be used to create a new file.

Which command is used to see the contents of the file?

Commands for displaying file contents (pg, more, page, and cat commands)

The pg, more, and page commands allow you to view the contents of a file and control the speed at which your files are displayed. You can also use the cat command to display the contents of one or more files on your screen.

Which of the following commands is used to display the contents of a file called example.txt?

Use the cat command. For example, cat list.txt displays the contents of list.txt.