27 Python Data Science Libraries in Practice (with Code)

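This article is about 8,000 characters long, with a suggested reading time of 15 minutes. It gives a brief, broad overview of the artificial intelligence libraries in common use today.

The goal is to give readers a first look at the Python libraries commonly used in AI, so that they can choose the ones that fit their needs and study them further.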

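1. NumPy

NumPy (Numerical Python) is an extension library for Python that supports large multidimensional arrays and matrices and provides a large collection of mathematical functions that operate on them. Its core is written in C, and an array stores its values directly in a contiguous buffer rather than storing pointers to Python objects, so array operations are typically much faster than equivalent pure-Python code. The example below compares the time taken to compute the sine of every element of a sequence in pure Python and with NumPy: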

import numpy as np
import math
import time


# Pure Python: loop over the list and call math.sin on each element
start = time.time()
for i in range(10):
    list_1 = list(range(1,10000))
    for j in range(len(list_1)):
        list_1[j] = math.sin(list_1[j])
print("Pure Python took {}s".format(time.time()-start))


# NumPy: apply np.sin to the whole array at once
start = time.time()
for i in range(10):
    list_1 = np.arange(1,10000)
    list_1 = np.sin(list_1)
print("NumPy took {}s".format(time.time()-start))

The results below show that the NumPy version runs faster than the code written in pure Python:

Pure Python took 0.017444372177124023s
NumPy took 0.001619577407836914s
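
A single time.time() measurement like the one above can be noisy. The following is a minimal sketch of a more repeatable measurement that uses only the standard library's timeit module. The helper name best_time and the repeat counts are illustrative choices, not part of the original benchmark.

```python
import math
import timeit


def pure_python_sin(n=10000):
    """Compute sin for 1..n-1 with a plain Python loop, as in the benchmark above."""
    return [math.sin(v) for v in range(1, n)]


def best_time(func, repeat=5, number=10):
    """Return the smallest total time in seconds over `repeat` runs.

    Each run calls `func` `number` times; taking the minimum reduces the
    influence of background noise on the measurement.
    """
    return min(timeit.repeat(func, repeat=repeat, number=number))


if __name__ == "__main__":
    print("pure Python: {:.6f}s".format(best_time(pure_python_sin)))
```

The same helper could time the NumPy version by passing it a callable such as lambda: np.sin(np.arange(1, 10000)).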

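2. OpenCV

OpenCV is a cross-platform computer vision library that runs on Linux, Windows, and macOS. It is lightweight and efficient, consisting of a set of C functions and a small number of C++ classes, and it also provides a Python interface. It implements many general-purpose algorithms for image processing and computer vision. The code below applies a few simple filters to an image: an averaging (smoothing) filter, a Gaussian blur, and a bilateral filter: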

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('h89817032p0.png')
kernel = np.ones((5,5),np.float32)/25
dst = cv.filter2D(img,-1,kernel)
blur_1 = cv.GaussianBlur(img,(5,5),0)
blur_2 = cv.bilateralFilter(img,9,75,75)
plt.figure(figsize=(10,10))
plt.subplot(221),plt.imshow(img[:,:,::-1]),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(222),plt.imshow(dst[:,:,::-1]),plt.title('Averaging')
plt.xticks([]), plt.yticks([])
plt.subplot(223),plt.imshow(blur_1[:,:,::-1]),plt.title('Gaussian')
plt.xticks([]), plt.yticks([])
plt.subplot(224),plt.imshow(blur_2[:,:,::-1]),plt.title('Bilateral')
plt.xticks([]), plt.yticks([])
plt.show()

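
To make the "Averaging" filter concrete: a 5x5 kernel filled with 1/25 replaces each pixel with the mean of its 5x5 neighborhood. The following is a pure-Python sketch of that idea on a small grayscale grid. It is not OpenCV's implementation: the function name box_filter is hypothetical, and for simplicity it only computes pixels whose neighborhood lies fully inside the image, whereas cv.filter2D also fills in the border pixels by extrapolation.

```python
def box_filter(image, k=3):
    """Mean filter over k x k neighborhoods of a 2D list of numbers.

    Only positions whose full neighborhood fits inside the image are
    computed, so the result is (k - 1) rows and columns smaller.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            total = sum(image[r + dr][c + dc]
                        for dr in range(k) for dc in range(k))
            out_row.append(total / (k * k))
        out.append(out_row)
    return out


if __name__ == "__main__":
    img = [
        [0, 0, 0, 0],
        [0, 9, 9, 0],
        [0, 9, 9, 0],
        [0, 0, 0, 0],
    ]
    print(box_filter(img, k=3))
```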

3. Scikit-image

scikit-image is an image processing library built on SciPy that treats images as NumPy arrays. For example, it can change the scale of an image with functions such as rescale, resize, and downscale_local_mean.

from skimage import color, io
from skimage.transform import rescale, resize, downscale_local_mean
import matplotlib.pyplot as plt


image = color.rgb2gray(io.imread('h89817032p0.png'))


image_rescaled = rescale(image, 0.25, anti_aliasing=False)
image_resized = resize(image, (image.shape[0] // 4, image.shape[1] // 4),
                       anti_aliasing=True)
image_downscaled = downscale_local_mean(image, (4, 3))
plt.figure(figsize=(20,20))
plt.subplot(221),plt.imshow(image, cmap='gray'),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(222),plt.imshow(image_rescaled, cmap='gray'),plt.title('Rescaled')
plt.xticks([]), plt.yticks([])
plt.subplot(223),plt.imshow(image_resized, cmap='gray'),plt.title('Resized')
plt.xticks([]), plt.yticks([])
plt.subplot(224),plt.imshow(image_downscaled, cmap='gray'),plt.title('Downscaled')
plt.xticks([]), plt.yticks([])
plt.show()

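
downscale_local_mean shrinks an image by replacing each block of pixels with its mean; the factors (4, 3) above mean blocks of 4 rows by 3 columns. Below is a pure-Python sketch of the idea, assuming the image dimensions are exact multiples of the factors (scikit-image pads the image when they are not). The function name block_mean is hypothetical.

```python
def block_mean(image, factors):
    """Downscale a 2D list by averaging non-overlapping blocks.

    `factors` is (rows_per_block, cols_per_block). The image height and
    width are assumed to be exact multiples of the factors.
    """
    fr, fc = factors
    rows, cols = len(image), len(image[0])
    if rows % fr or cols % fc:
        raise ValueError("image size must be a multiple of the factors")
    out = []
    for r in range(0, rows, fr):
        out_row = []
        for c in range(0, cols, fc):
            block = [image[r + dr][c + dc]
                     for dr in range(fr) for dc in range(fc)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


if __name__ == "__main__":
    print(block_mean([[1, 2, 3, 4], [5, 6, 7, 8]], (2, 2)))
```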

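4. PIL

The Python Imaging Library (PIL) became the de facto standard image processing library for Python because it is powerful yet has a simple, easy-to-use API. However, PIL only supports Python up to 2.7 and has long gone unmaintained, so a group of volunteers created a compatible fork called Pillow, which supports current Python 3.x releases and adds many new features. We can therefore skip PIL and install Pillow directly.

5. Pillow

Using Pillow to generate a letter CAPTCHA image: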

from PIL import Image, ImageDraw, ImageFont, ImageFilter


import random


# Random uppercase letter:
def rndChar():
    return chr(random.randint(65, 90))


# Random color 1 (lighter, used for the background):
def rndColor():
    return (random.randint(64, 255), random.randint(64, 255), random.randint(64, 255))


# Random color 2 (darker, used for the text):
def rndColor2():
    return (random.randint(32, 127), random.randint(32, 127), random.randint(32, 127))


# 360 x 360:
width = 60 * 6
height = 60 * 6
image = Image.new('RGB', (width, height), (255, 255, 255))
# Create the Font object (the font path is system-specific):
font = ImageFont.truetype('/usr/share/fonts/wps-office/simhei.ttf', 60)
# Create the Draw object:
draw = ImageDraw.Draw(image)
# Fill every pixel:
for x in range(width):
    for y in range(height):
        draw.point((x, y), fill=rndColor())
# Draw the text:
for t in range(6):
    draw.text((60 * t + 10, 150), rndChar(), font=font, fill=rndColor2())
# Blur:
image = image.filter(ImageFilter.BLUR)
image.save('code.jpg', 'jpeg')


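6. SimpleCV

SimpleCV is an open-source framework for building computer vision applications. It gives access to high-performance computer vision libraries such as OpenCV without first having to learn about bit depths, file formats, color spaces, buffer management, eigenvalues, or matrices. Its Python 3 support is very poor, however. Running the following code under Python 3.7: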

from SimpleCV import Image, Color, Display
# load an image from imgur
img = Image('http://i.imgur.com/lfAeZ4n.png')
# use a keypoint detector to find areas of interest
feats = img.findKeypoints()
# draw the list of keypoints
feats.draw(color=Color.RED)
# show the  resulting image. 
img.show()
# apply the stuff we found to the image.
output = img.applyLayers()
# save the results.
output.save('juniperfeats.png')

raises the following error, so using SimpleCV with Python 3 is not recommended:

SyntaxError: Missing parentheses in call to 'print'. Did you mean print('unit test')?

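7. Mahotas

Mahotas is a fast computer vision algorithm library built on top of NumPy. It currently offers more than 100 image processing and computer vision functions, and the number keeps growing. The example below loads an image with Mahotas and operates on its pixels: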

import numpy as np
import mahotas
import mahotas.demos


from mahotas.thresholding import soft_threshold
from matplotlib import pyplot as plt
from os import path
f = mahotas.demos.load('lena', as_grey=True)
f = f[128:,128:]
plt.gray()
# Show the data:
print("Fraction of zeros in original image: {0}".format(np.mean(f==0)))
plt.imshow(f)
plt.show()


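8. Ilastik

Ilastik provides machine-learning-based analysis of bioimages. Using machine learning algorithms, it makes it easy to segment, classify, track, and count cells or other experimental data. Most operations are interactive and do not require machine learning expertise.

9. Scikit-Learn

Scikit-learn is a free software machine learning library for the Python programming language. It features a variety of classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN. The example below runs the KMeans algorithm with scikit-learn: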

import time


import numpy as np
import matplotlib.pyplot as plt


from sklearn.cluster import MiniBatchKMeans, KMeans
from sklearn.metrics.pairwise import pairwise_distances_argmin
from sklearn.datasets import make_blobs


# Generate sample data
np.random.seed(0)


batch_size = 45
centers = [[1, 1], [-1, -1], [1, -1]]
n_clusters = len(centers)
X, labels_true = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7)


# Compute clustering with KMeans


k_means = KMeans(init='k-means++', n_clusters=3, n_init=10)
t0 = time.time()
k_means.fit(X)
t_batch = time.time() - t0


# Compute clustering with MiniBatchKMeans


mbk = MiniBatchKMeans(init='k-means++', n_clusters=3, batch_size=batch_size,
                      n_init=10, max_no_improvement=10, verbose=0)
t0 = time.time()
mbk.fit(X)
t_mini_batch = time.time() - t0


# Plot result
fig = plt.figure(figsize=(8, 3))
fig.subplots_adjust(left=0.02, right=0.98, bottom=0.05, top=0.9)
colors = ['#4EACC5', '#FF9C34', '#4E9A06']


# We want to have the same colors for the same cluster from the
# MiniBatchKMeans and the KMeans algorithm. Let's pair the cluster centers per
# closest one.
k_means_cluster_centers = k_means.cluster_centers_
order = pairwise_distances_argmin(k_means.cluster_centers_,
                                  mbk.cluster_centers_)
mbk_means_cluster_centers = mbk.cluster_centers_[order]


k_means_labels = pairwise_distances_argmin(X, k_means_cluster_centers)
mbk_means_labels = pairwise_distances_argmin(X, mbk_means_cluster_centers)


# KMeans
for k, col in zip(range(n_clusters), colors):
    my_members = k_means_labels == k
    cluster_center = k_means_cluster_centers[k]
    plt.plot(X[my_members, 0], X[my_members, 1], 'w',
            markerfacecolor=col, marker='.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
            markeredgecolor='k', markersize=6)
plt.title('KMeans')
plt.xticks(())
plt.yticks(())


plt.show()

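
For readers curious about what KMeans does internally, here is a pure-Python sketch of the basic iteration (Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points. It is not scikit-learn's implementation, which adds k-means++ initialization, multiple restarts (n_init), and optimized numerics. The function name and the fixed starting centers are illustrative.

```python
def kmeans(points, centers, n_iter=10):
    """Run basic Lloyd iterations on 2D points.

    `points` and `centers` are lists of (x, y) tuples. Returns the final
    centers and the cluster index assigned to each point.
    """
    centers = list(centers)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        for i, (px, py) in enumerate(points):
            dists = [(px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers]
            labels[i] = dists.index(min(dists))
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    final_centers, final_labels = kmeans(pts, centers=[(0.0, 0.0), (10.0, 10.0)])
    print(final_centers, final_labels)
```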

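10. SciPy

The SciPy library provides many user-friendly and efficient numerical routines for tasks such as numerical integration, interpolation, optimization, and linear algebra. It also defines many special functions of mathematical physics, including elliptic functions, Bessel functions, gamma functions, beta functions, hypergeometric functions, and parabolic cylinder functions. The example below uses Bessel functions to plot the shape of a vibrating drumhead: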

from scipy import special
import matplotlib.pyplot as plt
import numpy as np


def drumhead_height(n, k, distance, angle, t):
    kth_zero = special.jn_zeros(n, k)[-1]
    return np.cos(t) * np.cos(n*angle) * special.jn(n, distance*kth_zero)


theta = np.r_[0:2*np.pi:50j]
radius = np.r_[0:1:50j]
x = np.array([r * np.cos(theta) for r in radius])
y = np.array([r * np.sin(theta) for r in radius])
z = np.array([drumhead_height(1, 1, r, theta, 0.5) for r in radius])




fig = plt.figure()
ax = fig.add_axes(rect=(0, 0.05, 0.95, 0.95), projection='3d')
ax.plot_surface(x, y, z, rstride=1, cstride=1, cmap='RdBu_r', vmin=-0.5, vmax=0.5)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_xticks(np.arange(-1, 1.1, 0.5))
ax.set_yticks(np.arange(-1, 1.1, 0.5))
ax.set_zlabel('Z')
plt.show()


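11. NLTK

NLTK is a library for building Python programs that work with natural language. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength natural language processing (NLP) libraries. NLTK has been called "a wonderful tool for teaching, and working in, computational linguistics using Python".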

import nltk
from nltk.corpus import treebank


# These resources need to be downloaded on first use
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('treebank')


sentence = """At eight o'clock on Thursday morning Arthur didn't feel very good."""
# Tokenize
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)


# Identify named entities
entities = nltk.chunk.ne_chunk(tagged)


# Display a parse tree
t = treebank.parsed_sents('wsj_0001.mrg')[0]
t.draw()


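12. spaCy

spaCy is a free, open-source library for advanced NLP in Python. It can be used to build applications that process large volumes of text, to build information extraction or natural language understanding systems, or to preprocess text for deep learning.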

import spacy


texts = [
    "Net income was $9.4 million compared to the prior year of $2.7 million.",
    "Revenue exceeded twelve billion dollars, with a loss of $1b.",
]


nlp = spacy.load("en_core_web_sm")
for doc in nlp.pipe(texts, disable=["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"]):
    # Do something with the doc here
    print([(ent.text, ent.label_) for ent in doc.ents])

nlp.pipe yields Doc objects, so we can iterate over them and access the named entity predictions:

[('$9.4 million', 'MONEY'), ('the prior year', 'DATE'), ('$2.7 million', 'MONEY')]
[('twelve billion dollars', 'MONEY'), ('1b', 'MONEY')]

13. LibROSA

librosa is a Python library for music and audio analysis. It provides the building blocks needed to create music information retrieval systems.

# Beat tracking example
import librosa


# 1. Get the file path to an included audio example
filename = librosa.example('nutcracker')


# 2. Load the audio as a waveform `y`
#    Store the sampling rate as `sr`
y, sr = librosa.load(filename)


# 3. Run the default beat tracker
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
print('Estimated tempo: {:.2f} beats per minute'.format(tempo))


# 4. Convert the frame indices of beat events into timestamps
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

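14. Pandas

Pandas is a fast, powerful, flexible, and easy-to-use open-source tool for data analysis and manipulation. It can import data from a variety of sources such as CSV, JSON, SQL, and Microsoft Excel, and supports operations such as merging, reshaping, and selection, as well as data cleaning and data wrangling. Pandas is widely used in academia, finance, statistics, and other data analysis fields.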

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np


ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()


df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list("ABCD"))
df = df.cumsum()
df.plot()
plt.show()


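15. Matplotlib

Matplotlib is a plotting library for Python. It provides a set of commands similar to MATLAB's and can produce publication-quality figures. Matplotlib makes plotting simple and strikes a good balance between ease of use and performance. The example below draws several curves on one plot: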

# plot_multi_curve.py
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0.1, 2 * np.pi, 100)
y_1 = x
y_2 = np.square(x)
y_3 = np.log(x)
y_4 = np.sin(x)
plt.plot(x,y_1)
plt.plot(x,y_2)
plt.plot(x,y_3)
plt.plot(x,y_4)
plt.show()


16. Seaborn

Seaborn is a Python data visualization library that wraps Matplotlib in a higher-level API, which makes plotting easier. Seaborn should be seen as a complement to Matplotlib, not a replacement.

import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="ticks")


df = sns.load_dataset("penguins")
sns.pairplot(df, hue="species")
plt.show()


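17. Orange

Orange is open-source data mining and machine learning software that provides a range of components for data exploration, visualization, preprocessing, and modeling. It has an attractive, intuitive interactive user interface that suits beginners doing exploratory data analysis and visualization, while advanced users can also use it as a Python module for data manipulation and widget development. Orange can be installed with pip:

$ pip install orange3

Once installation is complete, run the orange-canvas command to start the Orange graphical interface:

$ orange-canvas

After it starts, the Orange graphical interface appears and is ready to use.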


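18. PyBrain

PyBrain is a modular machine learning library for Python. Its goal is to offer flexible, easy-to-use, yet powerful algorithms for machine learning tasks, along with a variety of predefined environments for testing and comparing those algorithms. PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. We will use a simple example to show how PyBrain is used by building a multilayer perceptron (MLP). First, we create a new feed-forward network object: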

from pybrain.structure import FeedForwardNetwork
n = FeedForwardNetwork()

Next, we construct the input, hidden, and output layers:

from pybrain.structure import LinearLayer, SigmoidLayer


inLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outLayer = LinearLayer(1)

To use these layers, we must add them to the network:

n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)

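Multiple input and output modules can be added. To propagate inputs forward and errors backward, the network must know which modules are inputs and which are outputs, which is why they are added with addInputModule and addOutputModule. How the layers connect must still be specified explicitly. For this we use the most common connection type, a full connection between layers, implemented by the FullConnection class: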

from pybrain.structure import FullConnection
in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)

As with the layers, we must explicitly add the connections to the network:

n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)

All the elements are now in place. Finally, we call the .sortModules() method to make the MLP usable:

n.sortModules()

This call performs some internal initialization that is required before the network can be used.
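
Conceptually, activating this 2-3-1 network means passing the input through the linear input layer unchanged, applying a weighted sum followed by a sigmoid in the hidden layer, and taking a weighted sum in the linear output layer. The sketch below shows that computation in plain Python. It does not use PyBrain, it omits bias terms (the network above adds no bias unit), and the weight values are made up for illustration, whereas PyBrain initializes its connection weights randomly.

```python
import math


def sigmoid(v):
    """Logistic function used by the hidden layer."""
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, w_in_hidden, w_hidden_out):
    """Forward pass of a 2-3-1 network without biases.

    w_in_hidden: one row of 2 weights per hidden unit (3 rows).
    w_hidden_out: one row of 3 weights per output unit (1 row).
    """
    # The linear input layer passes x through unchanged; each hidden unit
    # applies a sigmoid to its weighted sum of the inputs.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
              for row in w_in_hidden]
    # Linear output layer: weighted sum of the hidden activations.
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_hidden_out]


if __name__ == "__main__":
    w1 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]  # illustrative weights
    w2 = [[1.0, -1.0, 0.5]]                      # illustrative weights
    print(forward([1.0, 2.0], w1, w2))
```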

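19. Milk

MILK (Machine Learning Toolkit) is a machine learning toolkit for Python. It focuses on supervised classification, with several classifiers available: SVMs, k-NN, random forests, and decision trees. It also performs feature selection, and these classifiers can be combined in many ways to form different classification systems. For unsupervised learning, MILK supports k-means clustering and affinity propagation. Training a classifier with MILK: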

import numpy as np
import milk
features = np.random.rand(100,10)
labels = np.zeros(100)
features[50:] += .5
labels[50:] = 1
learner = milk.defaultclassifier()
model = learner.train(features, labels)


# Now you can use the model on new examples:
example = np.random.rand(10)
print(model.apply(example))
example2 = np.random.rand(10)
example2 += .5
print(model.apply(example2))

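20. TensorFlow

TensorFlow is an end-to-end open-source machine learning platform with a comprehensive, flexible ecosystem. It is generally divided into TensorFlow 1.x and TensorFlow 2.x; the main difference is that TF 1.x uses static graphs, whereas TF 2.x uses eager execution (dynamic graphs) by default. The example below uses TensorFlow 2.x to build a convolutional neural network (CNN):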

import tensorflow as tf


from tensorflow.keras import datasets, layers, models


# Load the data
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()


# Preprocess the data: scale pixel values to [0, 1]
train_images, test_images = train_images / 255.0, test_images / 255.0


# Build the model
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))


# Compile and train the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10, 
                    validation_data=(test_images, test_labels))
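
The final Dense(10) layer above has no activation, so it outputs raw scores (logits); from_logits=True tells the loss to apply the softmax itself. Below is a pure-Python sketch of that computation for a single example, written in the numerically stable log-sum-exp form. It illustrates the formula only and is not TensorFlow's implementation; the function name is hypothetical.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and an integer class label.

    Equals -log(softmax(logits)[label]), computed as
    logsumexp(logits) - logits[label], with the maximum subtracted
    before exponentiating to avoid overflow.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    print(sparse_softmax_cross_entropy([2.0, 1.0, 0.1], 0))
```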

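21. PyTorch

PyTorch is the successor to Torch. It shares Torch's underlying libraries but rewrites much of the framework for Python, so it is more flexible, supports dynamic computation graphs, and provides a Python interface.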

# Import libraries
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt


# Select the device and build the model
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))


# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            # No activation after the last layer: CrossEntropyLoss expects raw logits
            nn.Linear(512, 10)
        )


    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits


model = NeuralNetwork().to(device)


# Loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)


# Train the model
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)


        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)


        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
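
The three calls in the training loop (zero_grad, backward, step) amount to: clear old gradients, compute the gradient of the loss with respect to each parameter, and move each parameter a small step against its gradient. Below is a pure-Python sketch of the same loop for a one-parameter model y = w * x with a squared-error loss, where the gradient is written out by hand instead of being computed by autograd. The function name and data are illustrative.

```python
def sgd_fit(xs, ys, w=0.0, lr=0.1, epochs=50):
    """Fit y = w * x by stochastic gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w * x                  # forward pass
            grad = 2.0 * (pred - y) * x   # d/dw of (pred - y) ** 2
            w -= lr * grad                # plain SGD update
    return w


if __name__ == "__main__":
    # The data follow y = 2 * x, so w should approach 2.
    print(sgd_fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```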

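22. Theano

Theano is a Python library, built on top of NumPy, that lets you define, optimize, and efficiently evaluate mathematical expressions involving multidimensional arrays. Computing a Jacobian matrix in Theano: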

import theano
import theano.tensor as T
x = T.dvector('x')
y = x ** 2
J, updates = theano.scan(lambda i, y,x : T.grad(y[i], x), sequences=T.arange(y.shape[0]), non_sequences=[y,x])
f = theano.function([x], J, updates=updates)
f([4, 4])
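
For y = x ** 2 applied elementwise, the Jacobian is the diagonal matrix with entries 2 * x_i, so for the input [4, 4] the expected result is [[8, 0], [0, 8]]. Theano derives this symbolically. As an independent check, the sketch below approximates the same Jacobian with central finite differences in plain Python. The helper name numerical_jacobian is hypothetical.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian J[i][j] = d f_i / d x_j by central differences."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def square(x):
    """Elementwise square, the same function as y = x ** 2 above."""
    return [v ** 2 for v in x]


if __name__ == "__main__":
    print(numerical_jacobian(square, [4.0, 4.0]))
```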

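23. Keras

Keras is a high-level neural network API written in Python that can run on top of TensorFlow, CNTK, or Theano. Keras was developed with a focus on enabling fast experimentation, so that ideas can be turned into results with as little delay as possible.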

from keras.models import Sequential
from keras.layers import Dense


# Build the model
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))


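# Compile and train the model
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
# x_train and y_train are assumed to be NumPy arrays prepared beforehand:
# x_train with shape (n_samples, 100), y_train one-hot with shape (n_samples, 10)
model.fit(x_train, y_train, epochs=5, batch_size=32)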

24. Caffe

The official Caffe2 website states that Caffe2 is now a part of PyTorch. While these APIs will continue to work, users are encouraged to use the PyTorch APIs.

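25. MXNet

MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. Building a handwritten digit recognition model with MXNet: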

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
from mxnet import autograd as ag
import mxnet.ndarray as F


# Load the data
mnist = mx.test_utils.get_mnist()
batch_size = 100
train_data = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_data = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)


# CNN model
class Net(gluon.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(20, kernel_size=(5,5))
        self.pool1 = nn.MaxPool2D(pool_size=(2,2), strides = (2,2))
        self.conv2 = nn.Conv2D(50, kernel_size=(5,5))
        self.pool2 = nn.MaxPool2D(pool_size=(2,2), strides = (2,2))
        self.fc1 = nn.Dense(500)
        self.fc2 = nn.Dense(10)


    def forward(self, x):
        x = self.pool1(F.tanh(self.conv1(x)))
        x = self.pool2(F.tanh(self.conv2(x)))
        # 0 means copy over size from corresponding dimension.
        # -1 means infer size from the rest of dimensions.
        x = x.reshape((0, -1))
        x = F.tanh(self.fc1(x))
        x = F.tanh(self.fc2(x))
        return x
net = Net()
# Initialize the parameters and define the optimizer
# Use a GPU if one is available, otherwise fall back to the CPU
ctx = [mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()]
net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.03})


# Train the model
# Use Accuracy as the evaluation metric.
metric = mx.metric.Accuracy()
softmax_cross_entropy_loss = gluon.loss.SoftmaxCrossEntropyLoss()
epoch = 10  # number of training epochs (not set in the original snippet; assumed value)


for i in range(epoch):
    # Reset the train data iterator.
    train_data.reset()
    for batch in train_data:
        data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
        label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
        outputs = []
        # Inside training scope
        with ag.record():
            for x, y in zip(data, label):
                z = net(x)
                # Computes softmax cross entropy loss.
                loss = softmax_cross_entropy_loss(z, y)
                # Backpropagate the error for one iteration.
                loss.backward()
                outputs.append(z)
        metric.update(label, outputs)
        trainer.step(batch.data[0].shape[0])
    # Gets the evaluation result.
    name, acc = metric.get()
    # Reset evaluation result to initial state.
    metric.reset()
    print('training acc at epoch %d: %s=%f'%(i, name, acc))

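26. PaddlePaddle

PaddlePaddle builds on Baidu's years of deep learning research and business applications. It brings together a core deep learning framework for training and inference, a library of basic models, end-to-end development kits, and a rich set of tool components. It is described as China's first independently developed, fully featured, open-source industrial-grade deep learning platform. Implementing LeNet-5 with PaddlePaddle: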

# Import the required packages
import paddle
import numpy as np
from paddle.nn import Conv2D, MaxPool2D, Linear


## Network definition
import paddle.nn.functional as F


# Define the LeNet network structure
class LeNet(paddle.nn.Layer):
    def __init__(self, num_classes=1):
        super(LeNet, self).__init__()
        # Create the convolution and pooling layers
        # First convolution layer
        self.conv1 = Conv2D(in_channels=1, out_channels=6, kernel_size=5)
        self.max_pool1 = MaxPool2D(kernel_size=2, stride=2)
        # Shape logic: pooling does not change the channel count, which is now 6
        # Second convolution layer
        self.conv2 = Conv2D(in_channels=6, out_channels=16, kernel_size=5)
        self.max_pool2 = MaxPool2D(kernel_size=2, stride=2)
        # Third convolution layer
        self.conv3 = Conv2D(in_channels=16, out_channels=120, kernel_size=4)
        # Shape logic: the fully connected layers take flattened input, [B,C,H,W] -> [B,C*H*W]
        # With a [28,28] input, after three convolutions and two poolings C*H*W equals 120
        self.fc1 = Linear(in_features=120, out_features=64)
        # Fully connected layers: the first has 64 output neurons, the second
        # has as many output neurons as there are class labels
        self.fc2 = Linear(in_features=64, out_features=num_classes)
    # Forward pass of the network
    def forward(self, x):
        x = self.conv1(x)
        # Each convolution is followed by a Sigmoid activation and then a 2x2 pooling
        x = F.sigmoid(x)
        x = self.max_pool1(x)
        x = self.conv2(x)
        x = F.sigmoid(x)
        x = self.max_pool2(x)
        x = self.conv3(x)
        # Shape logic: flatten [B,C,H,W] -> [B,C*H*W]
        x = paddle.reshape(x, [x.shape[0], -1])
        x = self.fc1(x)
        x = F.sigmoid(x)
        x = self.fc2(x)
        return x
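
The comments in the network definition state that a 28x28 input yields C*H*W = 120 after three convolutions and two poolings. This can be checked with the usual output-size formula, floor((size + 2 * padding - kernel) / stride) + 1, assuming no padding and stride 1 for the convolutions, which matches the layer definitions above (Conv2D is called without padding or stride arguments). A small sketch; the helper names are hypothetical.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_flat_features(size=28):
    """Trace the feature-map size through the LeNet layers defined above."""
    size = out_size(size, kernel=5)            # conv1: 28 -> 24
    size = out_size(size, kernel=2, stride=2)  # max_pool1: 24 -> 12
    size = out_size(size, kernel=5)            # conv2: 12 -> 8
    size = out_size(size, kernel=2, stride=2)  # max_pool2: 8 -> 4
    size = out_size(size, kernel=4)            # conv3: 4 -> 1
    channels = 120                             # out_channels of conv3
    return channels * size * size


if __name__ == "__main__":
    print(lenet_flat_features())
```

With a 28x28 input the result is 120, which matches in_features=120 of the first fully connected layer.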

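27. CNTK

CNTK (Cognitive Toolkit) is a deep learning toolkit that describes a neural network as a series of computational steps via a directed graph. In this graph, leaf nodes represent input values or network parameters, while the other nodes represent matrix operations on their inputs. CNTK makes it easy to implement and combine popular model types such as CNNs. CNTK describes a neural network with the Network Description Language (NDL). In short, you describe the input features, the input labels, some parameters, the computations that relate the parameters to the inputs, and which nodes are the targets.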

NDLNetworkBuilder=[
    
    run=ndlLR
    
    ndlLR=[
      # sample and label dimensions
      SDim=$dimension$
      LDim=1
    
      features=Input(SDim, 1)
      labels=Input(LDim, 1)
    
      # parameters to learn
      B0 = Parameter(4) 
      W0 = Parameter(4, SDim)
      
      
      B = Parameter(LDim)
      W = Parameter(LDim, 4)
    
      # operations
      t0 = Times(W0, features)
      z0 = Plus(t0, B0)
      s0 = Sigmoid(z0)   
      
      t = Times(W, s0)
      z = Plus(t, B)
      s = Sigmoid(z)    
    
      LR = Logistic(labels, s)
      EP = SquareError(labels, s)
    
      # root nodes
      FeatureNodes=(features)
      LabelNodes=(labels)
      CriteriaNodes=(LR)
      EvalNodes=(EP)
      OutputNodes=(s,t,z,s0,W0)
    ]