NumPy — Python Library

Ramahanishagunda
6 min read · Jan 8, 2021

Introduction to NumPy

What is NumPy?

NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. It is the fundamental package for scientific computing in Python.

Why NumPy arrays?

  • NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
  • To use much of today’s scientific and mathematical Python-based software efficiently, knowing Python’s built-in sequence types is not enough; one also needs to know how to use NumPy arrays.

Limitations of NumPy

  • NumPy arrays have a fixed size at creation, unlike Python lists, which can grow dynamically. Changing the size of an ndarray creates a new array and deletes the original.
  • The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of objects (Python objects, including NumPy objects), which allows for arrays whose elements differ in size.
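Both limitations can be observed directly. The following is a minimal sketch; the variable names are illustrative and not part of the original text.

```python
import numpy as np

a = np.array([1, 2, 3])

# np.append does not grow `a` in place; it returns a new, larger array
b = np.append(a, 4)
print(a.shape, b.shape)  # (3,) (4,)

# Mixed numeric inputs are coerced to one common data type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# The exception: dtype=object stores references to arbitrary Python objects
objs = np.array([1, "two", 3.0], dtype=object)
print(objs.dtype)  # object
```

Object arrays give up most of the speed benefits of NumPy, so they are usually a last resort.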

NumPy ndarray Objects

Array indexing and slicing

Three types of indexing methods are available:

  1. field access:

Record arrays are structured arrays wrapped using a subclass of ndarray, numpy.recarray, which allows field access by attribute on the array object. Record arrays also use a special datatype, numpy.record, which allows field access by attribute on the individual elements of the array.

  2. basic slicing:

Basic slicing is an extension of Python’s basic concept of slicing to n dimensions. A Python slice object is constructed by giving start, stop, and step parameters to the built-in slice function, or equivalently by writing start:stop:step inside the square brackets. This slice object is passed to the array to extract part of the array.

  3. advanced indexing:

Advanced indexing is triggered when the selection object is a non-tuple sequence, an ndarray of integer or Boolean data type, or a tuple with at least one item being a sequence object. Advanced indexing always returns a copy of the data, whereas basic slicing returns a view. There are two types of advanced indexing: integer and Boolean.
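The three indexing methods can be compared side by side. This is a small sketch, not a complete tour; the field names `tag` and `qty` and all variable names are made up for illustration.

```python
import numpy as np

# 1. Field access: view a structured array as a record array
items = np.array([("a", 3), ("b", 5)], dtype=[("tag", "U5"), ("qty", "i4")])
rec = items.view(np.recarray)
print(rec.qty)     # [3 5]   attribute access on the array
print(rec[0].tag)  # a       attribute access on one element

# 2. Basic slicing: a slice object and the colon shorthand are equivalent
a = np.arange(10)
print(a[slice(2, 7, 2)])  # [2 4 6]
print(a[2:7:2])           # [2 4 6]

# In n dimensions, one slice per axis is separated by commas
grid = np.arange(12).reshape(3, 4)
print(grid[1:, :2])
# [[4 5]
#  [8 9]]

# 3. Advanced indexing: integer and Boolean selections
picked = a[[0, 3, 4]]   # integer indexing
evens = a[a % 2 == 0]   # Boolean indexing
print(picked)  # [0 3 4]
print(evens)   # [0 2 4 6 8]

# The result is a copy, so changing it leaves `a` untouched
picked[0] = -1
print(a[0])  # 0
```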

Memory layout of ndarray

An ndarray is a multidimensional container of items of the same type and size. The number of dimensions and items in an array is defined by its shape, which is a tuple of N non-negative integers that specify the sizes of each dimension. The type of items in the array is specified by a separate data-type object, one of which is associated with each ndarray.
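These properties are exposed as attributes on every array. The sketch below also prints `itemsize` and `strides`, which the text above does not mention: they give the size of one item in bytes and the number of bytes to step along each dimension.

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.int32)

print(a.ndim)      # 2        number of dimensions
print(a.shape)     # (3, 4)   size of each dimension
print(a.dtype)     # int32    data-type object shared by all items
print(a.itemsize)  # 4        bytes per item
print(a.strides)   # (16, 4)  bytes to the next row, bytes to the next column
```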

Views and copies

The main difference between a copy and a view of an array is that the copy is a new array with its own data, while the view is a new array object that refers to the same data as the original array.

The copy owns the data: any changes made to the copy will not affect the original array, and any changes made to the original array will not affect the copy.

The view does not own the data and any changes made to the view will affect the original array, and any changes made to the original array will affect the view.
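A short sketch of the difference, using the `copy()` and `view()` methods of ndarray. The `base` attribute is used here as one way to check ownership: it is None for an array that owns its data.

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])

c = original.copy()  # owns its own data
v = original.view()  # shares data with `original`

original[0] = 99

print(c[0])  # 1   the copy is unaffected
print(v[0])  # 99  the view sees the change

print(c.base is None)      # True
print(v.base is original)  # True
```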

Creating arrays

NumPy is used to work with arrays. The array object in NumPy is called ndarray. We can create a NumPy ndarray object by using the array() function.
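For instance, passing a list or a nested list to `np.array()` gives a one- or two-dimensional ndarray. The variable names below are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])               # from a list
b = np.array([[1, 2, 3], [4, 5, 6]])  # from a nested list

print(type(a))         # <class 'numpy.ndarray'>
print(a.ndim, b.ndim)  # 1 2
print(b.shape)         # (2, 3)
```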

Linear Algebra with NumPy

Linear algebra is one of the most important topics in the data science domain.

There are different types of objects (or structures) in linear algebra:

  • Scalar: Single number
  • Vector: Array of numbers
  • Matrix: 2-dimensional array of numbers
  • Tensor: N-dimensional array of numbers where N > 2
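In NumPy, each of these objects can be represented as an ndarray with a different number of dimensions. A minimal sketch, with a matrix-vector product via the `@` operator added as one example of a linear algebra operation:

```python
import numpy as np

scalar = np.array(5)                 # 0 dimensions
vector = np.array([1, 2, 3])         # 1 dimension
matrix = np.array([[1, 2], [3, 4]])  # 2 dimensions
tensor = np.zeros((2, 3, 4))         # 3 dimensions

print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)  # 0 1 2 3

# Matrix-vector product
product = matrix @ np.array([1, 1])
print(product)  # [3 7]
```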

Using NumPy Arrays

Vectorized operations

Vectorization describes the absence of any explicit looping, indexing, etc. in the code. These things still happen, but behind the scenes in optimized, pre-compiled C code. Vectorized code has several advantages:

  • Vectorized code is more concise and easier to read.
  • Fewer lines of code generally means fewer bugs.
  • The code more closely resembles standard mathematical notation (typically making it easier to code mathematical constructs correctly).
  • Without vectorization, code would be littered with inefficient and difficult-to-read for loops.
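To make the contrast concrete, here is the same computation written with an explicit loop and in vectorized form. This is only a sketch with made-up values; on arrays this small the speed difference is not noticeable.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Explicit loop: index every element by hand
loop_result = np.empty_like(x)
for i in range(len(x)):
    loop_result[i] = x[i] * y[i] + 1.0

# Vectorized: reads like the formula x * y + 1
vec_result = x * y + 1.0

print(np.array_equal(loop_result, vec_result))  # True
```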

Universal functions

A universal function (ufunc) is a function that operates on ndarrays in an element-by-element fashion, supporting array broadcasting, type casting, and several other standard features. That is, a ufunc is a vectorized wrapper for a function that takes a fixed number of specific inputs and produces a fixed number of specific outputs.
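`np.sqrt` and `np.add` are two of the built-in ufuncs. The sketch below shows element-wise application, type casting from integers to floats, and the fixed input and output counts exposed through the `nin` and `nout` attributes.

```python
import numpy as np

a = np.array([1, 4, 9])

print(type(np.sqrt))  # <class 'numpy.ufunc'>

roots = np.sqrt(a)    # integer input is cast to float
print(roots)          # [1. 2. 3.]

shifted = np.add(a, 10)  # the scalar 10 is broadcast to every element
print(shifted)           # [11 14 19]

print(np.add.nin, np.add.nout)  # 2 1
```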

Broadcasting and shape manipulation

Broadcasting is the term used to describe the implicit element-by-element behavior of operations; generally speaking, in NumPy all operations, not just arithmetic operations, but logical, bit-wise, functional, etc., behave in this implicit element-by-element fashion, i.e., they broadcast.

General Broadcasting Rules:

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when

  1. they are equal, or
  2. one of them is 1

If these conditions are not met, a ValueError is raised, indicating that the arrays have incompatible shapes.
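The rules can be checked on a few small shapes. This is a sketch with illustrative array names; the exact wording of the error message may vary between NumPy versions.

```python
import numpy as np

m = np.ones((3, 4))
row = np.arange(4)                # shape (4,)
col = np.arange(3).reshape(3, 1)  # shape (3, 1)

# Trailing dimensions match (4 and 4): row is applied to every row of m
print((m + row).shape)  # (3, 4)

# 4 vs 1 and 3 vs 3: col is stretched across the columns
print((m + col).shape)  # (3, 4)

# (4,) with (3, 1): each side is stretched where it has size 1
print((row + col).shape)  # (3, 4)

# Trailing dimensions 4 and 3 are neither equal nor 1
incompatible = False
try:
    m + np.arange(3)
except ValueError as exc:
    incompatible = True
    print("ValueError:", exc)
```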

Boolean mask

Masking in Python and data science means selecting or modifying the data in a collection based on some criterion. The criterion typically evaluates to true or false for each element, hence the Boolean part. Boolean masking is usually an efficient and concise way to select a sub-collection of a collection.
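A brief sketch of a Boolean mask used for selecting, counting, and modifying elements. The temperature values and the thresholds are made up for illustration.

```python
import numpy as np

temps = np.array([12, 25, 31, 18, 40])

# The comparison produces one True/False value per element
mask = temps > 20
print(mask)  # [False  True  True False  True]

# Select the elements where the mask is True
selected = temps[mask]
print(selected)  # [25 31 40]

# Count them: True counts as 1
print(mask.sum())  # 3

# Modify only the matching elements (on a copy, to keep `temps` intact)
capped = temps.copy()
capped[capped > 30] = 30
print(capped)  # [12 25 30 18 30]
```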

Dates and time in NumPy

The most basic way to create datetimes is from strings in ISO 8601 date or datetime format. The unit for internal storage is automatically selected from the form of the string, and can be either a date unit or a time unit. The date units are years (‘Y’), months (‘M’), weeks (‘W’), and days (‘D’), while the time units are hours (‘h’), minutes (‘m’), seconds (‘s’), milliseconds (‘ms’), and some additional SI-prefix seconds-based units. The datetime64 data type also accepts the string “NAT”, in any combination of lowercase/uppercase letters, for a “Not a Time” value.
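The automatic unit selection can be seen by inspecting the dtype of a few values. A minimal sketch; the dates are arbitrary, and the `np.timedelta64` arithmetic at the end goes slightly beyond what the text above covers.

```python
import numpy as np

day = np.datetime64("2021-01-08")
print(day.dtype)  # datetime64[D]

month = np.datetime64("2021-01")
print(month.dtype)  # datetime64[M]

minute = np.datetime64("2021-01-08T14:30")
print(minute.dtype)  # datetime64[m]

# "Not a Time", accepted in any letter case
nat = np.datetime64("nat")
print(np.isnat(nat))  # True

# Date arithmetic with a timedelta64
next_week = day + np.timedelta64(7, "D")
print(next_week)  # 2021-01-15
```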

We have covered some of the basic concepts in NumPy. There are, of course, more complex concepts and operations. However, it is always good practice to build knowledge step by step, from basic to advanced.

Thank you for reading.
