使用TPU在GCP上进行Keras / Tensorflow培训

最后发布: 2019-11-15 14:37:59


问题

我正在尝试使用keras和tensorflow 1.15在GCP上训练模型。 从现在开始,我的代码类似于我可以在colab上执行的代码:

# TPUs
import tensorflow as tf
print(tf.__version__)
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver("tpu-name")
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(cluster_resolver)
print("Number of accelerators: ", tpu_strategy.num_replicas_in_sync)


import numpy as np


np.random.seed(123)  # for reproducibility
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Input
from tensorflow.keras import utils
from tensorflow.keras.datasets import mnist, cifar10
from tensorflow.keras.models import Model

# 4. Load data into train and test sets
(X_train, y_train) = load_data(sets="gs://BUCKETS/dogscats/train/",target_size=img_size)
(X_test, y_test) =  load_data(sets="gs://BUCKETS/dogscats/valid/",target_size=img_size)
print(X_train.shape, X_test.shape)

# 5. Preprocess input data
#X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
#X_test = X_test.reshape(X_test.shape[0], 28, 28,1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255.0
X_test /= 255.0

print(y_train.shape, y_test.shape)
# 6. Preprocess class labels One hot encoding
Y_train = utils.to_categorical(y_train, 2)
Y_test = utils.to_categorical(y_test, 2)
print(Y_train.shape, Y_test.shape)

with tpu_strategy.scope():
  model = make_model((img_size, img_size, 3))
  # 8. Compile model
  model.compile(loss='categorical_crossentropy',
                optimizer="sgd",
                metrics=['accuracy'])

model.summary()

batch_size = 1250 * tpu_strategy.num_replicas_in_sync
# 9. Fit model on training data
model.fit(X_train, Y_train, steps_per_epoch=len(X_train)//batch_size,  
            epochs=5, verbose=1)

但是我的数据在存储桶中,我的代码在虚拟机上。 那我该怎么办? 我尝试使用“ gs:// BUCKETS”加载数据,但是它不起作用。 我该怎么办 ? 编辑:我添加我的代码以加载数据,我对不起。

def load_data(sets="dogcats/train/", k = 5000, target_size=250):
  # define location of dataset
  folder = sets
  photos, labels = list(), list()
  # determine class
  output = 0.0
  for i, dog in enumerate(listdir(folder + "dogs/")):
    if i >= k:
      break
    # load image
    photo = load_img(folder + "dogs/" +dog, target_size=(target_size, target_size))
    # convert to numpy array
    photo = img_to_array(photo)
    # store
    photos.append(photo)
    labels.append(output)

  output = 1.0

  for i, cat in enumerate(listdir(folder + "cats/") ):
    if i >= k:
      break
    # load image
    photo = load_img(folder + "cats/"+cat, target_size=(target_size, target_size))
    # convert to numpy array
    photo = img_to_array(photo)
    # store
    photos.append(photo)
    labels.append(output)

  # convert to a numpy arrays
  photos = asarray(photos)
  labels = asarray(labels)
  print(photos.shape, labels.shape)
  photos, labels = shuffle(photos, labels, random_state=0)
  return photos, labels
tensorflow keras google-cloud-platform bucket tpu
回答

(X_train, y_train) = load_data(sets="gs://BUCKETS/dogscats/train/",target_size=img_size)
(X_test, y_test) =  load_data(sets="gs://BUCKETS/dogscats/valid/",target_size=img_size)

这显然是行不通的,因为从本质上讲,您所做的所有操作都设置了字符串。 您需要做的是将此数据下载为字符串,然后使用它。

首先安装软件包pip install google-cloud-storagepip3 install google-cloud-storage

点-> Python

pip3-> Python3

看看这个 ,你需要一个服务帐户,从你的代码GCP互动。 用于身份验证

当您将服务帐户作为json获取时,您需要执行以下两项操作之一:

将其设置为环境变量: export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"

或我更喜欢的解决方法

gcloud auth activate-service-account \
  <repalce-with-email-from-json-file> \
          --key-file=<path/to/your/json/file> --project=<name-of-your-gcp-project>

现在让我们看看如何使用google-cloud-storage库以字符串形式下载文件:

from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket("BUCKETS")
blob = bucket.get_blob('/dogscats/train/<you-will-need-to-point-to-a-file-and-not-a-directory>')
data = blob.download_as_string()

现在您已经将数据作为字符串,您可以像这样简单地将data传递到加载数据中(X_train, y_train) = load_data(sets=data,target_size=img_size)

听起来很复杂,但是这里是一个快速的伪布局:

  1. 安装google-cloud-storage
  2. 转到Google Cloud Platform控制台-> IAM和管理->服务帐户
  3. 创建具有相对权限的服务帐户(google-cloud-storage)
  4. 下载(JSON)文件,并记住位置。
  5. 激活服务帐号
  6. 将文件下载为字符串并将该字符串传递给您的load_data(data)

希望有帮助!