Deep learning Udacity course: Problem 2, Assignment 1 (notMNIST)
After watching this and taking the courses, I am struggling to solve the second problem in Assignment 1 (notMNIST):
Let's verify that the data still looks good. Displaying a sample of the labels and images from the ndarray. Hint: you can use matplotlib.pyplot.
This is what I tried:
import random
rand_smpl = [ train_datasets[i] for i in sorted(random.sample(xrange(len(train_datasets)), 1)) ]
print(rand_smpl)
filename = rand_smpl[0]
import pickle
loaded_pickle = pickle.load( open( filename, "r" ) )
image_size = 28 # Pixel width and height.
import numpy as np
dataset = np.ndarray(shape=(len(loaded_pickle), image_size, image_size),
                     dtype=np.float32)
import matplotlib.pyplot as plt
plt.plot(dataset[2])
plt.ylabel('some numbers')
plt.show()
But this is what I got:
This doesn't make much sense. Honestly, the same may be true of my code, since I am not really sure how to approach this problem!
The pickles were created like this:
image_size = 28  # Pixel width and height.
pixel_depth = 255.0  # Number of levels per pixel.

def load_letter(folder, min_num_images):
  """Load the data for a single letter label."""
  image_files = os.listdir(folder)
  dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                       dtype=np.float32)
  print(folder)
  num_images = 0
  for image in image_files:
    image_file = os.path.join(folder, image)
    try:
      image_data = (ndimage.imread(image_file).astype(float) -
                    pixel_depth / 2) / pixel_depth
      if image_data.shape != (image_size, image_size):
        raise Exception('Unexpected image shape: %s' % str(image_data.shape))
      dataset[num_images, :, :] = image_data
      num_images = num_images + 1
    except IOError as e:
      print('Could not read:', image_file, ':', e, '- it\'s ok, skipping.')

  dataset = dataset[0:num_images, :, :]
  if num_images < min_num_images:
    raise Exception('Many fewer images than expected: %d < %d' %
                    (num_images, min_num_images))

  print('Full dataset tensor:', dataset.shape)
  print('Mean:', np.mean(dataset))
  print('Standard deviation:', np.std(dataset))
  return dataset
The function is called like this:
dataset = load_letter(folder, min_num_images_per_class)
try:
  with open(set_filename, 'wb') as f:
    pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
except Exception as e:
  print('Unable to save data to', set_filename, ':', e)
The idea here is:
Now let's load the data in a more manageable format. Since, depending on your computer setup you might not be able to fit it all in memory, we'll load each class into a separate dataset, store them on disk and curate them independently. Later we'll merge them into a single dataset of manageable size.
We'll convert the entire dataset into a 3D array (image index, x, y) of floating point values, normalized to have approximately zero mean and standard deviation ~0.5 to make training easier down the road.
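As a reference for what that normalization does numerically, here is a minimal sketch on a synthetic image (the random array is only a stand-in for a real 28x28 grayscale file); the transform simply shifts the raw 0-255 pixel values to be centred on zero and scales them into roughly [-0.5, 0.5]:

import numpy as np

pixel_depth = 255.0
# hypothetical raw 28x28 grayscale image with integer values in 0..255
raw = np.random.randint(0, 256, size=(28, 28)).astype(float)

# same transform as in load_letter(): centre on zero, scale to roughly [-0.5, 0.5]
normalized = (raw - pixel_depth / 2) / pixel_depth
print(normalized.min(), normalized.max())  # about -0.5 and 0.5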
Proceed as follows:
# define a function to convert a label to a letter
def letter(i):
    return 'abcdefghij'[i]

# you need matplotlib inline to be able to show images in a python notebook
%matplotlib inline

# a random index in the range 0 .. len(train_dataset)
sample_idx = np.random.randint(0, len(train_dataset))

# now we show it
plt.imshow(train_dataset[sample_idx])
plt.title("Char " + letter(train_labels[sample_idx]))
Your code actually changes the type of dataset: it is no longer an ndarray of size (220000, 28, 28).
In general, the pickle is a file that contains some objects, not the array itself. You should use the object from the pickle directly to get your train dataset (using the notation from your code snippet):
#will give you train_dataset and labels
train_dataset = loaded_pickle['train_dataset']
train_labels = loaded_pickle['train_labels']
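For completeness, a minimal sketch of how loaded_pickle would be obtained here, assuming it is the merged dictionary the notebook writes out later (the filename notMNIST.pickle is the notebook's default and may differ in your setup):

import pickle

# note 'rb' (binary mode), not 'r'
with open('notMNIST.pickle', 'rb') as f:
    loaded_pickle = pickle.load(f)

train_dataset = loaded_pickle['train_dataset']
train_labels = loaded_pickle['train_labels']
print(train_dataset.shape, train_labels.shape)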
Update:
Per @gsarmas's request, the link with my solution for the whole Assignment 1 lies here.
The code is commented and mostly self-explanatory, but in case of any questions feel free to contact me on github in whichever way you prefer.
Please check this code:
pickle_file = train_datasets[0]
with open(pickle_file, 'rb') as f:
    # unpickle
    letter_set = pickle.load(f)
    # pick a random image index
    sample_idx = np.random.randint(len(letter_set))
    # extract a 2D slice
    sample_image = letter_set[sample_idx, :, :]
    plt.figure()
    # display it
    plt.imshow(sample_image)
Use this code:
# randomly select a letter
i = np.random.randint(len(train_datasets))
plt.title("abcdefghij"[i])

# read the file of the selected letter
f = open(train_datasets[i], "rb")
f = pickle.load(f)

# randomly select an image in the file
j = np.random.randint(len(f))

# show the image
plt.axis('off')
img = plt.imshow(f[j, :, :])
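If you run this in a plain script rather than a notebook, you may prefer a variant that closes the file and calls plt.show() explicitly; a minimal sketch (letter_set is just a local name introduced here):

import pickle
import numpy as np
import matplotlib.pyplot as plt

# randomly pick one of the per-letter pickles and one image inside it
i = np.random.randint(len(train_datasets))
with open(train_datasets[i], 'rb') as fh:
    letter_set = pickle.load(fh)  # the file is closed automatically afterwards

j = np.random.randint(len(letter_set))
plt.title("abcdefghij"[i])
plt.axis('off')
plt.imshow(letter_set[j, :, :])
plt.show()  # needed outside %matplotlib inline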