In Chapter 2, we looked in detail at methods and tools to access, set, and modify values in NumPy arrays. These included indexing (e.g., arr[2, 1]), slicing (e.g., arr[:, 1:5]), masking (e.g., arr[arr > 0]), fancy indexing (e.g., arr[0, [1, 5]]), and combinations thereof (e.g., arr[:, [1, 5]]). Here we'll look at similar means of accessing and modifying values in Pandas Series and DataFrame objects. If you have used the NumPy patterns, the corresponding patterns in Pandas will feel very familiar, though there are a few quirks to be aware of.
We'll start with the simple case of the one-dimensional Series object, and then move on to the more complicated two-dimensional DataFrame object.
Data Selection in Series¶
As we saw in the previous section, a Series object acts in many ways like a one-dimensional NumPy array, and in many ways like a standard Python dictionary. If we keep these two overlapping analogies in mind, it will help us to understand the patterns of data indexing and selection in these arrays.
Series as dictionary¶
Like a dictionary, the Series object provides a mapping from a collection of keys to a collection of values:
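As a minimal sketch of this mapping, a Series can be indexed by key just as a dictionary would be. The labels and values below are made up for illustration and the name data is an assumption:

```python
import pandas as pd

# Illustrative Series; labels and values are assumptions for this sketch
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Look up a value by its key (index label), as with a dict
value_b = data['b']
print(value_b)
```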
We can also use dictionary-like Python expressions and methods to examine the keys/indices and values:
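For instance, membership tests, keys(), and items() behave much as they would on a dictionary. This is a small sketch on an illustrative Series whose labels and values are assumptions:

```python
import pandas as pd

# Illustrative Series; labels and values are assumptions for this sketch
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# The `in` operator tests the keys (index labels), not the values
has_a = 'a' in data

# keys() returns the index; items() yields (label, value) pairs
keys = list(data.keys())
items = list(data.items())

print(has_a)
print(keys)
print(items)
```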
Series objects can even be modified with a dictionary-like syntax. Just as you can extend a dictionary by assigning to a new key, you can extend a Series by assigning to a new index value:
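A short sketch of this kind of extension, again on an illustrative Series (the labels, values, and the new key 'e' are assumptions):

```python
import pandas as pd

# Illustrative Series; labels and values are assumptions for this sketch
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Assigning to a label that does not exist yet appends a new entry
data['e'] = 1.25
print(data)
```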
This easy mutability of the objects is a convenient feature: under the hood, Pandas is making decisions about memory layout and data copying that might need to take place; the user generally does not need to worry about these issues.
Series as one-dimensional array¶
A Series builds on this dictionary-like interface and provides array-style item selection via the same basic mechanisms as NumPy arrays, namely slices, masking, and fancy indexing. Examples of these are as follows:
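The following sketch walks through each of these mechanisms on an illustrative Series (the labels and values are assumptions). One quirk worth noticing is that a slice by label includes its final element, while a slice by integer position excludes it:

```python
import pandas as pd

# Illustrative Series; labels and values are assumptions for this sketch
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Slicing by explicit label: the end label 'c' is included
by_label = data['a':'c']

# Slicing by implicit integer position: the end position 2 is excluded
by_position = data[0:2]

# Masking with a Boolean condition
masked = data[(data > 0.3) & (data < 0.8)]

# Fancy indexing with a list of labels
fancy = data[['a', 'd']]

print(by_label, by_position, masked, fancy, sep='\n\n')
```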
Any of these indexing conventions may also be used to set or modify values; this is done in the standard way that you might be accustomed to from working with NumPy:
423967 | 38332521 | 90.000000 |
170312 | 19552860 | 114.806121 |
149995 | 12882135 | 85.883763 |
141297 | 19651127 | 139.076746 |
695662 | 26448193 | 38.018740 |
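A table like the one above could be produced by a sketch along the following lines. The numbers are the ones shown in the table; the row labels, the column names, and the variable name data are assumptions, since only the values appear in this section:

```python
import pandas as pd

# Values come from the table above; row labels and column names are
# assumptions, because the table shows only the numbers
data = pd.DataFrame(
    {'area': [423967, 170312, 149995, 141297, 695662],
     'pop': [38332521, 19552860, 12882135, 19651127, 26448193]},
    index=['California', 'Florida', 'Illinois', 'New York', 'Texas'])
data['density'] = data['pop'] / data['area']

# Overwrite a single cell by integer position (row 0, column 2),
# in the same way one would assign into a NumPy array
data.iloc[0, 2] = 90
print(data)
```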
To build up your fluency in Pandas data manipulation, I suggest spending some time with a simple DataFrame and exploring the types of indexing, slicing, masking, and fancy indexing that are allowed by these various indexing approaches.
Additional indexing conventions¶
There are a couple of extra indexing conventions that might seem at odds with the preceding discussion, but nevertheless can be very useful in practice. First, while indexing refers to columns, slicing refers to rows:
170312 | 19552860 | 114.806121 |
149995 | 12882135 | 85.883763 |
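To make the contrast concrete, here is a sketch in which a single key selects a column while a label slice selects rows. The numbers match the rows shown above; the row labels, column names, and the name data are assumptions:

```python
import pandas as pd

# Values come from the tables in this section; labels are assumptions
data = pd.DataFrame(
    {'area': [423967, 170312, 149995, 141297, 695662],
     'pop': [38332521, 19552860, 12882135, 19651127, 26448193]},
    index=['California', 'Florida', 'Illinois', 'New York', 'Texas'])
data['density'] = data['pop'] / data['area']

# A single key refers to a column
column = data['area']

# A slice of labels refers to rows (both endpoints included)
rows = data['Florida':'Illinois']
print(rows)
```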
Such slices can also refer to rows by number rather than by index:
170312 | 19552860 | 114.806121 |
149995 | 12882135 | 85.883763 |
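The same two rows can be reached by integer position, as in this sketch (row labels, column names, and the name data are again assumptions; the numbers are those shown above):

```python
import pandas as pd

# Values come from the tables in this section; labels are assumptions
data = pd.DataFrame(
    {'area': [423967, 170312, 149995, 141297, 695662],
     'pop': [38332521, 19552860, 12882135, 19651127, 26448193]},
    index=['California', 'Florida', 'Illinois', 'New York', 'Texas'])
data['density'] = data['pop'] / data['area']

# An integer slice selects rows by position; the end position is excluded
rows = data[1:3]
print(rows)
```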
Similarly, direct masking operations are also interpreted row-wise rather than column-wise:
170312 | 19552860 | 114.806121 |
141297 | 19651127 | 139.076746 |
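A row-wise mask of this kind might look like the following sketch. The threshold of 100 is an assumption chosen so that the result matches the two rows shown above, and the row labels and column names are likewise assumed:

```python
import pandas as pd

# Values come from the tables in this section; labels are assumptions
data = pd.DataFrame(
    {'area': [423967, 170312, 149995, 141297, 695662],
     'pop': [38332521, 19552860, 12882135, 19651127, 26448193]},
    index=['California', 'Florida', 'Illinois', 'New York', 'Texas'])
data['density'] = data['pop'] / data['area']

# A Boolean Series used as a key keeps the rows where it is True
dense = data[data['density'] > 100]
print(dense)
```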
These two conventions are syntactically similar to those on a NumPy array, and while these may not precisely fit the mold of the Pandas conventions, they are nevertheless quite useful in practice.