NumPy Array

Combining, Converting, Indexing, Slicing, Reshaping

  1. In machine learning, data is represented as arrays.
  2. In Python, such data is almost universally represented as NumPy arrays.

Combining Arrays

Vertical Stack

Given two or more existing arrays, you can stack them vertically using the vstack() function.
In [116]:
from numpy import array
from numpy import vstack

a1 = array([[1,2,3],
           [4,5,6],
           [7,8,9]])

print("numpy array a1\n", a1)

a2 = array([[10,11,12],
           [13,14,15],
           [16,17,18]])

print("numpy array a2\n", a2)

a3 = vstack((a1, a2))
print("Vertical Stack a1, a2\n", a3)
print("Vertical Stack a1, a2 Shape\n", a3.shape)
numpy array a1
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
numpy array a2
 [[10 11 12]
 [13 14 15]
 [16 17 18]]
Vertical Stack a1, a2
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]
 [16 17 18]]
Vertical Stack a1, a2 Shape
 (6, 3)
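vstack() is one of several ways to join arrays. The following sketch goes beyond the original notebook and uses arbitrary example arrays. It checks that, for two-dimensional inputs, vstack() gives the same result as concatenate() with axis=0. It also shows that one-dimensional inputs are treated as rows, and that arrays with different column counts raise a ValueError.

```python
from numpy import array, vstack, concatenate, array_equal

a1 = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a2 = array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

# For 2-D inputs, vstack() gives the same result as concatenate() along axis 0
same = array_equal(vstack((a1, a2)), concatenate((a1, a2), axis=0))
print(same)         # True

# 1-D inputs are treated as rows of shape (1, N) before stacking
rows = vstack((array([1, 2, 3]), array([4, 5, 6])))
print(rows.shape)   # (2, 3)

# Arrays whose column counts differ cannot be stacked vertically
try:
    vstack((a1, array([[1, 2]])))
except ValueError as err:
    print("ValueError:", err)
```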

Horizontal Stack

Given two or more existing arrays, you can stack them horizontally using the hstack() function.
In [117]:
from numpy import array
from numpy import hstack

a1 = array([[1,2,3,20],
           [4,5,6,30],
           [7,8,9,40]])

print("numpy array a1\n", a1)

a2 = array([[10,11,12],
           [13,14,15],
           [16,17,18]])

print("numpy array a2\n", a2)

a3 = hstack((a1, a2))
print("Horizontal Stack a1, a2\n", a3)
print("Horizontal Stack a1, a2 Shape\n", a3.shape)
numpy array a1
 [[ 1  2  3 20]
 [ 4  5  6 30]
 [ 7  8  9 40]]
numpy array a2
 [[10 11 12]
 [13 14 15]
 [16 17 18]]
Horizontal Stack a1, a2
 [[ 1  2  3 20 10 11 12]
 [ 4  5  6 30 13 14 15]
 [ 7  8  9 40 16 17 18]]
Horizontal Stack a1, a2 Shape
 (3, 7)
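The example above works because both arrays have 3 rows. The following sketch uses arbitrary example arrays and is not part of the original notebook. It compares hstack() with concatenate() along axis 1, shows that one-dimensional inputs are joined end to end, and shows that mismatched row counts raise a ValueError.

```python
from numpy import array, hstack, concatenate, array_equal

a1 = array([[1, 2, 3, 20], [4, 5, 6, 30], [7, 8, 9, 40]])
a2 = array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

# For 2-D inputs, hstack() gives the same result as concatenate() along axis 1
same = array_equal(hstack((a1, a2)), concatenate((a1, a2), axis=1))
print(same)          # True

# 1-D inputs are joined end to end into one longer 1-D array
joined = hstack((array([1, 2, 3]), array([4, 5])))
print(joined)        # [1 2 3 4 5]
print(joined.shape)  # (5,)

# Row counts must match: a (3, 4) array cannot be joined with a (2, 3) array
try:
    hstack((a1, a2[:2]))
except ValueError as err:
    print("ValueError:", err)
```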

Convert a one-dimensional list of data to a NumPy array

  1. Call the NumPy array() function on the list.
In [77]:
from numpy import array

# list of data
data = [11, 20, 30, 50]

# numpy array of data
data = array(data)

print(data)
print(type(data))
[11 20 30 50]
<class 'numpy.ndarray'>
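Unlike a Python list, a NumPy array stores every element with a single element type (dtype). The following sketch uses illustrative values. It shows that array() infers the dtype from the list, upcasts mixed integers and floats to float, and also accepts an explicit dtype. The exact integer width depends on the platform, so the code checks only the dtype kind.

```python
from numpy import array

data = [11, 20, 30, 50]

# NumPy infers one element type (dtype) for the whole array
ints = array(data)
print(ints.dtype.kind)   # i  (signed integer; the bit width is platform dependent)

# Mixing ints and floats upcasts every element to float
mixed = array([11, 20.5, 30, 50])
print(mixed.dtype)       # float64

# The dtype can also be requested explicitly
floats = array(data, dtype=float)
print(floats)            # [11. 20. 30. 50.]
```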

Convert a two-dimensional list of data to a NumPy array

  1. Machine learning data is often two-dimensional: each row represents an observation and each column represents a feature.
  2. In Python, two-dimensional data is typically stored as a "list of lists", where each inner list represents one observation.
  3. Convert a "list of lists" to a NumPy array by calling the array() function.
In [78]:
from numpy import array

# list of lists
data = [[11,12, 13], 
        [34, 35, 36],
        [55, 66, 77]]

# numpy array of data
data = array(data)

print(data)
print(type(data))
[[11 12 13]
 [34 35 36]
 [55 66 77]]
<class 'numpy.ndarray'>
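To confirm the row-per-observation layout described above, the following sketch inspects the converted array and then turns it back into a plain "list of lists" with tolist(). The data is the same as in the cell above.

```python
from numpy import array

rows = [[11, 12, 13],
        [34, 35, 36],
        [55, 66, 77]]
data = array(rows)

# Each inner list becomes one row (observation); each position one column (feature)
print(data.ndim)     # 2
print(data.shape)    # (3, 3)
print(data[1])       # [34 35 36]  (the second observation)

# tolist() converts the array back into a plain "list of lists"
back = data.tolist()
print(back == rows)  # True
```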

NumPy Array Indexing

In [79]:
from numpy import array

# define NumPy array; one dimension
data = array([12, 23, 56, 10, 5])

# index NumPy array; one dimension
print(data[0])
print(data[3])
12
10
In [80]:
from numpy import array

# define NumPy array; one dimension
data = array([12, 23, 56, 10, 5])

# negative index NumPy array; the index -1 refers to the last item in the array
print(data[-1])
print(data[-3])
5
56
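The positive and negative indexing shown above covers a fixed range of valid indices. The following sketch, which is not from the original post, shows the bounds of that range and the IndexError raised outside it.

```python
from numpy import array

data = array([12, 23, 56, 10, 5])

# Valid indices run from 0 to len(data) - 1, or from -len(data) to -1
print(data[len(data) - 1], data[-len(data)])  # 5 12

# An index outside that range raises IndexError
try:
    data[5]
except IndexError as err:
    print("IndexError:", err)
```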
In [81]:
from numpy import array

# define NumPy array; two dimensions
data = array([[11,12,45],
              [34, 10, 67],
              [87, 9, 20]])

# index NumPy array; two dimensions
print(data[0,0])
11
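The general form of two-dimensional indexing is data[row, column]. The following sketch uses the same array and picks a few other items. It compares the single-bracket form with chained indexing, data[row][column], and applies negative indices on both axes.

```python
from numpy import array

data = array([[11, 12, 45],
              [34, 10, 67],
              [87, 9, 20]])

# data[row, column]: second row, third column
print(data[1, 2])    # 67

# Chained indexing data[row][column] reaches the same item,
# but it first builds an intermediate row array
print(data[1][2])    # 67

# Negative indices work on each axis: last row, last column
print(data[-1, -1])  # 20
```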
In [82]:
from numpy import array

# define NumPy array; two dimensions
data = array([[11,12,45],
              [34, 10, 67],
              [87, 9, 20]])

# index all items in the first row
print("all items in the first row", data[0,:])

# index all items in the first column
print("all items in the first column", data[:,0])
all items in the first row [11 12 45]
all items in the first column [11 34 87]
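Slices such as data[0,:] and data[:,0] are views that share memory with the original array, not copies. This detail matters once you start modifying sliced data. The following sketch shows the effect and uses copy() to get an independent array.

```python
from numpy import array

data = array([[11, 12, 45],
              [34, 10, 67],
              [87, 9, 20]])

# data[0, :] is a view: writing to it also changes data
first_row = data[0, :]
first_row[0] = 99
print(data[0, 0])    # 99  (the original array changed too)

# copy() returns an independent array
first_col = data[:, 0].copy()
first_col[0] = -1
print(data[0, 0])    # 99  (unchanged by the write to the copy)
```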

NumPy Array Slicing

  1. In machine learning, slicing is most useful for separating input variables from output variables, or for splitting training rows from test rows.
  2. The syntax is data[from:to]: the slice starts at the "from" index and ends one item before the "to" index.

One-Dimensional Slicing

In [83]:
from numpy import array

# slice of a one-dimensional array

# define NumPy array; one-dimensional array
data = array([90, 45, 78, 43, 1, 10])

# a slice that starts at index 1 and ends one item before index 5
print(data[1:5])
[45 78 43  1]
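Python slices also accept an optional third value, a step, written data[from:to:step]. The post does not cover it. The following sketch shows a step of 2 and a negative step that reverses the array.

```python
from numpy import array

data = array([90, 45, 78, 43, 1, 10])

# data[from:to:step]; a step of 2 takes every second item
print(data[0:6:2])   # [90 78  1]

# Omitting from/to and using a step of -1 walks the array backwards
print(data[::-1])    # [10  1 43 78 45 90]
```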
In [84]:
from numpy import array

# negative slice of a one-dimensional array
# define NumPy array; one-dimensional array
data = array([90, 45, 78, 43, 1, 10])

# start the slice at -3 (the third-to-last item) and end one item before index -1
print(data[-3:-1])
[43  1]
In [85]:
from numpy import array

# negative slice of a one-dimensional array
# define NumPy array; one-dimensional array
data = array([90, 45, 78, 43, 1, 10])

# start the slice at -3 (the third-to-last item);
# omitting the "to" index extends the slice to the end of the dimension
print(data[-3:])
[43  1 10]
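Omitting the "from" index instead of the "to" index gives the complementary slice. The following sketch shows that data[:-3] and data[-3:] together cover the array exactly once. The train/test split later in the post relies on the same property.

```python
from numpy import array, concatenate, array_equal

data = array([90, 45, 78, 43, 1, 10])

# data[:-3] takes everything except the last three items
head = data[:-3]
tail = data[-3:]
print(head)   # [90 45 78]
print(tail)   # [43  1 10]

# The two slices split the array with no overlap and no gaps
print(array_equal(concatenate((head, tail)), data))  # True
```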

Two-Dimensional Slicing

  1. In machine learning, it is common to split your loaded data into input variables (X) and the output variable (y).
  2. It is also common to split a loaded dataset into separate training and test sets.
In [86]:
# split your loaded data into input variables (X) and the output variable (y).
from numpy import array

# define NumPy array; two dimensions
data = array([[11,12,45],
              [34, 10, 67],
              [87, 9, 20]]) 

# input variables (X): all rows, and all columns except the last one
X = data[:, :-1]

# output variable (y): all rows, and only the last column
y = data[:, -1]

print("Input variables (X)\n",  X)
print("Output variable (y)\n",  y)
Input variables (X)
 [[11 12]
 [34 10]
 [87  9]]
Output variable (y)
 [45 67 20]
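In the output above, y is one-dimensional because data[:, -1] uses a plain index on the column axis, and that axis is dropped. Some tools expect the target as a column instead. The following sketch compares the two shapes. Whether you need the 2-D form depends on the library you pass y to.

```python
from numpy import array

data = array([[11, 12, 45],
              [34, 10, 67],
              [87, 9, 20]])

# data[:, -1] drops the column axis and returns a 1-D array
y_1d = data[:, -1]
print(y_1d.shape)   # (3,)

# data[:, -1:] slices the column axis instead, so it is kept: shape (3, 1)
y_2d = data[:, -1:]
print(y_2d.shape)   # (3, 1)
```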
In [87]:
# split a loaded dataset into separate train and test sets.
from numpy import array

# define NumPy array; two dimensions
data = array([[11,12,45],
              [34, 10, 67],
              [87, 9, 20],
              [2, 31, 56],
              [1, 43, 5],
              [36, 21, 9],
              [20, 4, 1]]) 

# split point
split = 5

# training data:
# all rows from the beginning up to, but not including, row index 5 (the split point)
# all columns
train = data[:split, :]

# test data:
# all rows from the split point (row index 5) to the end
# all columns
test = data[split:, :]

print("training data\n", train)
print("test data\n", test)
training data
 [[11 12 45]
 [34 10 67]
 [87  9 20]
 [ 2 31 56]
 [ 1 43  5]]
test data
 [[36 21  9]
 [20  4  1]]
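The split above keeps rows in their original order. If the rows are sorted or grouped, a common practice is to shuffle them before splitting. The following sketch does that with NumPy's random Generator. The seed of 0 and the 70% training fraction are arbitrary choices for illustration, not values from the original post.

```python
from numpy import array
from numpy.random import default_rng

data = array([[11, 12, 45],
              [34, 10, 67],
              [87, 9, 20],
              [2, 31, 56],
              [1, 43, 5],
              [36, 21, 9],
              [20, 4, 1]])

# Shuffle the row order first (arbitrary seed, for reproducibility)
rng = default_rng(seed=0)
shuffled = data[rng.permutation(data.shape[0])]

# Split by a fraction of the rows instead of a hard-coded index
split = int(0.7 * shuffled.shape[0])   # 4 of the 7 rows go to training
train, test = shuffled[:split, :], shuffled[split:, :]
print(train.shape, test.shape)         # (4, 3) (3, 3)
```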

Array Reshaping

  1. After slicing your data, you may need to reshape it.
  2. Some Python libraries expect your data in a specific shape. For example, the Long Short-Term Memory (LSTM) recurrent neural network in Keras requires its input to be a three-dimensional array of samples, time steps, and features.

Data Shape

NumPy arrays have a "shape" attribute: a tuple containing the length of each dimension of the array.
In [88]:
from numpy import array

# define NumPy array; one-dimensional array
data = array([90, 45, 78, 43, 1, 10])

# accessing "shape" for a one-dimensional array.
print(data.shape)
(6,)
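Several other attributes are closely related to shape. The following sketch uses the same array to show ndim (the number of dimensions), size (the total number of elements), and len() (the length of the first dimension).

```python
from numpy import array

data = array([90, 45, 78, 43, 1, 10])

print(data.ndim)   # 1  (number of dimensions, equal to len(data.shape))
print(data.size)   # 6  (total number of elements)
print(len(data))   # 6  (length of the first dimension)
```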
In [89]:
from numpy import array

# define NumPy array; two dimensions
data = array([[11,12,45],
              [34, 10, 67],
              [87, 9, 20],
              [2, 31, 56],
              [1, 43, 5],
              [36, 21, 9],
              [20, 4, 1]]) 

# accessing "shape" for a two-dimensional array.
print(data.shape)
(7, 3)
In [90]:
from numpy import array

# define NumPy array; two dimensions
data = array([[11,12,45],
              [34, 10, 67],
              [87, 9, 20],
              [2, 31, 56],
              [1, 43, 5],
              [36, 21, 9],
              [20, 4, 1]]) 

# shape[0]: the number of rows
# shape[1]: the number of columns
print("The number of rows: %d" % data.shape[0])
print("The number of columns: %d" % data.shape[1])
The number of rows: 7
The number of columns: 3
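Because shape is an ordinary tuple, you can also unpack it into named variables in one step. The following sketch shows this unpacking, which is not used in the original cell. The variable names n_rows and n_cols are only illustrative.

```python
from numpy import array

data = array([[11, 12, 45],
              [34, 10, 67],
              [87, 9, 20],
              [2, 31, 56],
              [1, 43, 5],
              [36, 21, 9],
              [20, 4, 1]])

# Unpack the shape tuple directly into named variables
n_rows, n_cols = data.shape
print("rows: %d, columns: %d" % (n_rows, n_cols))   # rows: 7, columns: 3

# size is the product of the shape entries
print(data.size == n_rows * n_cols)                 # True
```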

Reshape 1D to 2D Array

  1. Reshape a one-dimensional array into a two-dimensional array with multiple rows and one column.
In [91]:
from numpy import array

# define NumPy array; one-dimensional array with 6 elements
data = array([90, 45, 78, 43, 1, 10])
print("Before reshape: one-dimensional array with 6 elements")
print(data)
print(data.shape)

# Reshape a one-dimensional array into a two-dimensional array with 6 rows and 1 column.
data = data.reshape((data.shape[0], 1))
print("\nAfter reshape: two-dimensional array with 6 rows and 1 column")
print(data)
print(data.shape)
Before reshape: one-dimensional array with 6 elements
[90 45 78 43  1 10]
(6,)

After reshape: two-dimensional array with 6 rows and 1 column
[[90]
 [45]
 [78]
 [43]
 [ 1]
 [10]]
(6, 1)
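The cell above passes data.shape[0] explicitly. NumPy can also infer one dimension if you pass -1, and indexing with newaxis adds a length-1 axis in the same way. The following sketch shows both, plus the (1, 6) row layout for comparison.

```python
from numpy import array, newaxis

data = array([90, 45, 78, 43, 1, 10])

# -1 asks NumPy to infer that dimension from the array's size
col = data.reshape((-1, 1))
print(col.shape)    # (6, 1)

# Indexing with newaxis inserts a length-1 axis at that position
col2 = data[:, newaxis]
print(col2.shape)   # (6, 1)

# A (1, 6) row vector is the other common 2-D layout
row = data.reshape((1, -1))
print(row.shape)    # (1, 6)
```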

Reshape 2D to 3D Array

  1. Algorithms such as LSTMs expect input made of multiple samples, each with one or more time steps and one or more features. If your two-dimensional data holds one sequence per row, you will often need to reshape it into a three-dimensional array of this form.
In [31]:
from numpy import array

# define NumPy array; two dimensions
data = array([[11,12,45],
              [34, 10, 67],
              [87, 9, 20],
              [2, 31, 56],
              [1, 43, 5],
              [36, 21, 9],
              [20, 4, 1]]) 

print("Before reshape:")
print("two-dimensional array with 7 rows and 3 columns")
print(data.shape)
print(data)

# Reshape the two-dimensional data into a three-dimensional array.
# For an LSTM recurrent neural network, the three dimensions are:
# shape[0]: the number of samples (7 rows -> 7 samples)
# shape[1]: the number of time steps (3 columns -> 3 time steps)
# n: the number of features per time step; fixed at n = 1
n = 1
data = data.reshape((data.shape[0], data.shape[1], n))

print("\nAfter reshape:")
print("three-dimensional array with 7 samples")
print("each sample has 3 time steps and 1 feature")
print(data.shape)
print(data)
Before reshape:
two-dimensional array with 7 rows and 3 columns
(7, 3)
[[11 12 45]
 [34 10 67]
 [87  9 20]
 [ 2 31 56]
 [ 1 43  5]
 [36 21  9]
 [20  4  1]]

After reshape:
three-dimensional array with 7 samples
each sample has 3 time steps and 1 feature
(7, 3, 1)
[[[11]
  [12]
  [45]]

 [[34]
  [10]
  [67]]

 [[87]
  [ 9]
  [20]]

 [[ 2]
  [31]
  [56]]

 [[ 1]
  [43]
  [ 5]]

 [[36]
  [21]
  [ 9]]

 [[20]
  [ 4]
  [ 1]]]
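A reshape like the one above only regroups the same values in row-major order. It never reorders them. The following sketch checks this on the same data. It lets NumPy infer the number of samples with -1, confirms that element [i, j, 0] of the 3-D array equals element [i, j] of the 2-D array, and reshapes back.

```python
from numpy import array, array_equal

data = array([[11, 12, 45],
              [34, 10, 67],
              [87, 9, 20],
              [2, 31, 56],
              [1, 43, 5],
              [36, 21, 9],
              [20, 4, 1]])

# -1 lets NumPy infer the number of samples (here 7)
samples = data.reshape((-1, data.shape[1], 1))
print(samples.shape)                                    # (7, 3, 1)

# Element [i, j, 0] of the 3-D array is element [i, j] of the 2-D array
print(samples[3, 1, 0] == data[3, 1])                   # True

# Reshaping back recovers the original 2-D array
print(array_equal(samples.reshape(data.shape), data))   # True
```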

Reshape 3D to 4D Array

In [32]:
from numpy import array

# define NumPy array; three dimensions (1 block of 7 rows and 3 columns)
data = array([[[11,12,45],
              [34, 10,67],
              [87, 9,20],
              [2, 31,56],
              [1, 43,5],
              [36, 21,9],
              [20, 4,1]]]) 

print("Before reshape: three-dimensional array")
print(data.shape)
print(data)

n = 1
data = data.reshape((data.shape[0], data.shape[1], data.shape[2], n))

print("\nAfter reshape: four-dimensional array")
print(data.shape)
print(data)
Before reshape: three-dimensional array
(1, 7, 3)
[[[11 12 45]
  [34 10 67]
  [87  9 20]
  [ 2 31 56]
  [ 1 43  5]
  [36 21  9]
  [20  4  1]]]

After reshape: four-dimensional array
(1, 7, 3, 1)
[[[[11]
   [12]
   [45]]

  [[34]
   [10]
   [67]]

  [[87]
   [ 9]
   [20]]

  [[ 2]
   [31]
   [56]]

  [[ 1]
   [43]
   [ 5]]

  [[36]
   [21]
   [ 9]]

  [[20]
   [ 4]
   [ 1]]]]
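Appending a trailing length-1 axis is common enough that NumPy has a dedicated function for it, expand_dims(). The following sketch shows it on the same (1, 7, 3) array alongside the shorter data[..., None] indexing form. All three give the same shape as the reshape() call above.

```python
from numpy import array, expand_dims

data = array([[[11, 12, 45],
               [34, 10, 67],
               [87, 9, 20],
               [2, 31, 56],
               [1, 43, 5],
               [36, 21, 9],
               [20, 4, 1]]])
print(data.shape)              # (1, 7, 3)

# expand_dims inserts a new length-1 axis at the given position;
# axis=-1 appends it, matching reshape((d0, d1, d2, 1))
four_d = expand_dims(data, axis=-1)
print(four_d.shape)            # (1, 7, 3, 1)

# data[..., None] is a shorter spelling of the same operation
print(data[..., None].shape)   # (1, 7, 3, 1)
```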
In [33]:
from numpy import array

# define NumPy array; three dimensions (2 blocks of 7 rows and 3 columns)
data = array([[[11,12,45],
              [34, 10,67],
              [87, 9,20],
              [2, 31,56],
              [1, 43,5],
              [36, 21,9],
              [20, 4,1]],
              [[1,2,3],
              [4,5,6],
              [7,8,9],
              [10,11,12],
              [13,14,15],
              [36, 21,9],
              [20, 4,1]]]) 

print("Before reshape: three-dimensional array")
print(data.shape)
print(data)

n = 1
data = data.reshape((data.shape[0], data.shape[1], data.shape[2], n))

print("\nAfter reshape: four-dimensional array")
print(data.shape)
print(data)
Before reshape: three-dimensional array
(2, 7, 3)
[[[11 12 45]
  [34 10 67]
  [87  9 20]
  [ 2 31 56]
  [ 1 43  5]
  [36 21  9]
  [20  4  1]]

 [[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]
  [10 11 12]
  [13 14 15]
  [36 21  9]
  [20  4  1]]]

After reshape: four-dimensional array
(2, 7, 3, 1)
[[[[11]
   [12]
   [45]]

  [[34]
   [10]
   [67]]

  [[87]
   [ 9]
   [20]]

  [[ 2]
   [31]
   [56]]

  [[ 1]
   [43]
   [ 5]]

  [[36]
   [21]
   [ 9]]

  [[20]
   [ 4]
   [ 1]]]


 [[[ 1]
   [ 2]
   [ 3]]

  [[ 4]
   [ 5]
   [ 6]]

  [[ 7]
   [ 8]
   [ 9]]

  [[10]
   [11]
   [12]]

  [[13]
   [14]
   [15]]

  [[36]
   [21]
   [ 9]]

  [[20]
   [ 4]
   [ 1]]]]
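Every reshape in this post keeps the total number of elements unchanged. This is a requirement, not a coincidence. The following sketch uses a hypothetical array of 42 consecutive integers, built with arange() to match the (2, 7, 3) example. It shows two valid target shapes and the ValueError raised when the element counts differ.

```python
from numpy import arange

data = arange(42).reshape((2, 7, 3))   # 42 elements in total

# The new shape must contain exactly the same number of elements
print(data.reshape((2, 21)).shape)     # (2, 21)
print(data.reshape((6, 7, 1)).shape)   # (6, 7, 1)

# A mismatched shape (2 * 7 * 4 = 56) raises ValueError
# instead of padding or truncating
try:
    data.reshape((2, 7, 4))
except ValueError as err:
    print("ValueError:", err)
```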

About Post Author

方俊贤; Ken Fang

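Patent No. 201910652769.4: a deep learning algorithm that predicts the impact of continuous release and continuous deployment of microservices on overall product quality. Patent granted by the National Intellectual Property Administration; complies with Article 44 of the Implementing Rules of the Patent Law.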