Getting Started with Google Colab Notebooks

Uploading files, reading images, and more

Google Colab is a cloud-based service that allows the execution of Python code and includes the ability to use and install new libraries. Put simply, you can leverage Google Colab to create a Jupyter Notebook and execute Python code using either a CPU or a free GPU.

This tutorial helps beginners get started with Google Colab. The sections covered in the tutorial are:

  • Working with Google Colab
  • Uploading Files to Google Colab
  • Reading an Image
  • Get More RAM
  • For More Information

Let’s get started.

Working with Google Colab

Google Colaboratory (Colab for short) is a cloud computing platform for running Jupyter notebooks that come with recent ML libraries preinstalled, including TensorFlow, Keras, NumPy, and scikit-learn. If a library isn’t available out of the box, it can be installed easily.
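Before installing anything, it can help to check from a code cell whether a library is already present. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and the module names are just examples taken from the list above.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found in this runtime."""
    return importlib.util.find_spec(module_name) is not None


# Check a few of the libraries mentioned above (results depend on the runtime).
for name in ["numpy", "tensorflow", "sklearn"]:
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

A package reported as missing can usually be installed from a code cell with a command of the form `!pip install <package>`.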

In such notebooks, you can write interactive code, both in Python and other languages—Jupyter supports several different language kernels. In addition to writing code, you can also write text, add images, or include links. So it’s not only about code writing, but also about building interactive documents.

To get started, head over to the Colab home page, where you’ll see an interface with a variety of Notebooks. You can select one of the already-existing examples built by the Google Colab team. Or, if you already have a Jupyter Notebook, you can upload it using the Upload option. Recently-used notebooks are listed in the Recent tab.

Google Colab supports executing both Python 2 and Python 3 notebooks. At the bottom left corner of the above screen, you’ll see that Python 3 is selected by default. Clicking on the down-pointing arrow opens a list in which you can select either Python 2 or Python 3, as shown below.

Once you’ve decided which Python version to use and are ready to create a new Notebook, click the “NEW PYTHON x NOTEBOOK” button, where x is the version you selected. Congratulations! You’ve created a Notebook in Google Colab that should look like this:

Just like in a Google Doc, you can easily change the name of the notebook by simply clicking the text box and editing. The extension of the notebook is ipynb, which stands for Interactive Python NoteBook.

Note that Jupyter Notebooks are an extension of the IPython (Interactive Python) project, and therefore their names are combined in the file extension.

A Jupyter Notebook consists of cells. Within each cell, you can either write code or text. In the top left menu, you can find 2 buttons labeled Code and Text that you can use to switch back and forth.
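Under the hood, an .ipynb file is a JSON document, and the two cell types described above appear as entries in its `cells` list. The sketch below builds a minimal notebook by hand to show that layout, assuming the nbformat 4 structure; the cell contents are illustrative only.

```python
import json

# A minimal notebook in the nbformat 4 layout: one text (Markdown) cell
# followed by one code cell. The cell contents are illustrative only.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My heading\n", "A single line of body text."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('Hello, Colab')"],
        },
    ],
}

# Serialize to the kind of JSON text an .ipynb file holds.
ipynb_text = json.dumps(notebook, indent=2)

# List the cell types after a round trip through JSON.
cell_types = [cell["cell_type"] for cell in json.loads(ipynb_text)["cells"]]
print(cell_types)
```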

By default, a new Notebook is created with a single code cell, as shown in the previous figure. If you want to create a new code cell, then click on the Code button. For text cells, you can click on the Text button. You can also insert such cells using the Insert menu.

The text cells allow writing text in Markdown (MD), which is a simple markup language for styling text. The next figure shows a text cell in which there are a heading and a single line in the cell body.

The cell is split into 2 columns. The left column is for writing the text formatted with MD, and the right column is for visualizing the result. There’s a menu above the text cell for adding formatting to the text, adding images, links, lists, and more.

Let’s go back to the code cells. In the figure above, the code cell generated when the Notebook was first created sits above the text cell. The 2 square brackets [] indicate that it’s a code cell. To edit the cell, just click on it. For example, the next figure adds a print statement to it.

Note that when you’re outside the text cell, only the rendered Markdown is shown. If you want to edit the text cell, double-click on it.

When you’re inside the code cell, the previous square brackets are replaced by a button with an arrow pointing to the right. Clicking this button runs the code in this cell, as shown below. It might take a few seconds to run a cell for the first time, as it’s connecting to the Python backend.

After connecting to the backend, the amount of RAM and disk space available for the Notebook is indicated via the bars in the top right of the screen, as shown below. Seeing the green checkmark indicates that the Python backend is connected successfully.

To see the amount of remaining RAM and disk space, hover over the above box. Make sure not to load too much data into RAM, which could exhaust the available memory. When the memory is about to run out, the RAM bar turns red. To avoid filling the disk, keep an eye on the amount of downloaded data so that it does not exceed the allocated disk space.
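The same RAM figures can also be read from a code cell. The sketch below assumes a Linux runtime that exposes `/proc/meminfo` and uses only the standard library; the helper names `read_meminfo` and `kib_to_gb` are hypothetical.

```python
import os


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Parse a Linux meminfo file into {field: value}; empty dict if absent."""
    info = {}
    if not os.path.exists(path):
        return info
    with open(path) as handle:
        for line in handle:
            key, _, rest = line.partition(":")
            parts = rest.split()
            if parts and parts[0].isdigit():
                info[key.strip()] = int(parts[0])  # memory fields are in kB
    return info


def kib_to_gb(kib: int) -> float:
    """Convert kibibytes to gigabytes (10**9 bytes)."""
    return kib * 1024 / 1e9


mem = read_meminfo()
if mem:
    print(f"Total RAM:     {kib_to_gb(mem.get('MemTotal', 0)):.2f} GB")
    print(f"Available RAM: {kib_to_gb(mem.get('MemAvailable', 0)):.2f} GB")
else:
    print("No /proc/meminfo on this system.")
```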

If you want to reorder the cells, just click on the cell you need to move up or down. You will see 2 arrows in the top right corner that you can use to reorder the cell. You can also delete a cell by clicking on the trash icon.

In the previous figure, you can also find a button on the left sidebar that allows you to open the left panel of the Notebook, which is expanded below. Within this panel, you can see the table of contents, add code snippets, and view the contents of the disk associated with the Notebook. The table of contents captures the headings added within the text cells.

The Files tab of the left panel is shown below. You have the option to upload a new file to the Notebook, view all files and folders, open a file to view its content, and connect the Notebook to Google Drive using the “MOUNT DRIVE” button.
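Drive can also be connected from a code cell rather than through the button. The following is a minimal sketch using Colab's `google.colab.drive` module; the wrapper function `mount_drive` is hypothetical and exists only so the snippet degrades gracefully outside Colab.

```python
def mount_drive(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        print("Not running in Colab; skipping Drive mount.")
        return False
    drive.mount(mount_point)  # asks for authorization in the browser
    return True


mounted = mount_drive()
```

Files stored in Drive persist across runtime resets, unlike files placed directly on the runtime disk.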

The REFRESH button helps to force load the recent changes to the disk. Inside the Files tab, the amount of disk space available is represented as a bar at the end of the panel. It’s also displayed as text. The figure below indicates that the available space is 23.56 GB.
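The free disk space shown in the panel can be cross-checked from a code cell. This is a small sketch using the standard library's `shutil.disk_usage`; the helper name `disk_report` and the choice of the root path are assumptions for illustration.

```python
import shutil


def disk_report(path: str = "/") -> dict:
    """Return total, used, and free space for the disk holding `path`, in GB."""
    usage = shutil.disk_usage(path)
    return {
        "total_gb": usage.total / 1e9,
        "used_gb": usage.used / 1e9,
        "free_gb": usage.free / 1e9,
    }


report = disk_report("/")
print(f"Free: {report['free_gb']:.2f} GB of {report['total_gb']:.2f} GB")
```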

Right-clicking a file opens a menu (as shown below) from which you can download or delete the file. You can also copy the file’s path, which is helpful if you’re looking to read the file in the Notebook. The path of the sample_data folder is /content/sample_data. If you want to read the mnist_test.csv file, then use this path: /content/sample_data/mnist_test.csv.
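As an illustration of using a copied path, the sketch below reads the first rows of the CSV file with the standard library's `csv` module. The helper name `read_first_rows` is hypothetical, and the path is only expected to exist inside a Colab runtime, so the snippet checks for it first.

```python
import csv
import os


def read_first_rows(path: str, n: int = 3) -> list:
    """Return the first `n` rows of a CSV file as lists of strings."""
    rows = []
    with open(path, newline="") as handle:
        for index, row in enumerate(csv.reader(handle)):
            if index >= n:
                break
            rows.append(row)
    return rows


csv_path = "/content/sample_data/mnist_test.csv"  # path copied from the Files tab
if os.path.exists(csv_path):
    for row in read_first_rows(csv_path, n=2):
        print(len(row), "columns; first values:", row[:5])
else:
    print("File not found; this path only exists inside a Colab runtime.")
```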

Uploading Files to Google Colab

There might be instances where you want to process local files in a Google Colab Notebook. The UPLOAD option lets you upload a file to the Notebook and use it. Clicking that button opens a file chooser for selecting the file, as shown below.

After you click the Open button, the file is transferred to Google Colab. Expect a warning message telling you that uploaded files are temporary and will be deleted when the runtime is recycled.

This is troublesome for those with a slow internet connection who need to use a large file. If the file is deleted, they then must take time uploading it again. We’ll examine alternatives later.

The file is uploaded to the directory as shown in the next image. Double-click to open it.
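Uploads can also be triggered from a code cell. The sketch below uses Colab's `google.colab.files.upload()`, which returns a dictionary mapping file names to their contents as bytes; the helper `summarize_uploads` is hypothetical, and the fallback branch exists only so the snippet runs outside Colab.

```python
def summarize_uploads(uploaded: dict) -> dict:
    """Map each uploaded file name to its size in bytes."""
    return {name: len(data) for name, data in uploaded.items()}


try:
    from google.colab import files  # only importable inside Colab
    uploaded = files.upload()       # opens a file chooser below the cell
except ImportError:
    uploaded = {}                   # outside Colab there is nothing to upload

for name, size in summarize_uploads(uploaded).items():
    print(f"{name}: {size} bytes")
```

Files uploaded this way live on the same runtime disk, so the same caveat about temporary storage applies.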

At this point, we should have a baseline of knowledge about Jupyter Notebooks. The next section uploads and reads an image available in the local storage.

Reading an Image

This project targets classifying images. This section works through a basic step: reading and processing an image. Because there is no image available within the Notebook, we’ll need to upload one. Use the UPLOAD button to upload an image. Right-click on it and choose the “Copy path” option. This path will be used to read the image.

The code cell below reads the image using Keras and displays it using Matplotlib. The image is read using the Keras load_img() function, which returns the image as a PIL object. To convert it into a NumPy array, the img_to_array() function is used.

After being converted into a NumPy array, the image shape and data type are printed. Note that the data type of the returned image is float32, despite having pixel values ranging from 0.0 to 255.0. To force the range to be from 0.0 to 1.0, the image is divided by 255.0. The pixel at location (50, 50) is printed before and after the division.

Finally, the image is shown using the imshow() function in Matplotlib.

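import numpy
import keras
import matplotlib.pyplot

# Read the uploaded image as a PIL object (use the path you copied).
img = keras.preprocessing.image.load_img("/content/0_100.jpg")
# Convert the PIL object into a float32 NumPy array.
img_array = keras.preprocessing.image.img_to_array(img)

# Print the image shape and data type.
print(img_array.shape, img_array.dtype)

# Pixel at location (50, 50) before rescaling (values from 0.0 to 255.0).
print(img_array[50, 50])

# Rescale the pixel values to the range 0.0 to 1.0.
img_array = img_array/255.0

# The same pixel after rescaling.
print(img_array[50, 50])

# Display the image.
matplotlib.pyplot.imshow(img_array)
matplotlib.pyplot.show()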

To run the cell, just click on the arrow button on the top left side of the cell. The output of the print statements is given below. The image shape is 100x100x3 and its dtype is float32. The image is a sample from the strawberry class of the Fruits360 dataset, as shown below:
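To see the effect of the division by 255.0 without needing the image file, the sketch below uses a randomly generated array as a stand-in for the output of img_to_array(). The stand-in data and variable names are assumptions for illustration, not part of the original example.

```python
import numpy as np

# Stand-in for img_to_array() output: 100x100x3, float32, values 0.0 to 255.0.
rng = np.random.default_rng(seed=0)
img_array = rng.integers(0, 256, size=(100, 100, 3)).astype(np.float32)

# Same rescaling as in the image-reading code.
scaled = img_array / 255.0

# Shape and dtype are unchanged; only the value range differs.
print(scaled.shape, scaled.dtype)
print(float(scaled.min()) >= 0.0, float(scaled.max()) <= 1.0)

# Multiplying back and rounding recovers the original 8-bit pixel values.
restored = np.rint(scaled * 255.0).astype(np.uint8)
print(np.array_equal(restored, img_array.astype(np.uint8)))
```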

Get More RAM

Sometimes, the amount of available RAM could be exhausted. In these cases, Colab displays a message asking to limit the memory usage by terminating some sessions, as shown below.

Unfortunately, I had only one session running, so there was nothing to terminate. A few seconds after the warning appeared, Colab displayed the message “Your session crashed. Automatically restarting”, and the session crashed, as shown in the next figure.

Google Colab offers a way to get more RAM. When a session crashes after using all available RAM, Colab offers to double the RAM from around 12.5 GB to more than 25 GB.
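A rough estimate of how much memory a dataset needs can show in advance whether it fits within these limits. The sketch below is simple arithmetic for float32 images of the size used earlier (100x100x3); the helper name `estimate_bytes` and the collection size of 50,000 images are hypothetical.

```python
def estimate_bytes(num_images: int, shape: tuple, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold `num_images` arrays of `shape` in memory."""
    values_per_image = 1
    for dim in shape:
        values_per_image *= dim
    return num_images * values_per_image * bytes_per_value


# One 100x100x3 float32 image (4 bytes per value).
per_image = estimate_bytes(1, (100, 100, 3))
print(per_image)  # 120000

# A hypothetical collection of 50,000 such images.
total = estimate_bytes(50_000, (100, 100, 3))
print(f"{total / 1e9:.1f} GB")  # 6.0 GB
```

The estimate covers only the raw array data; intermediate copies made during processing add to the total.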

After the session crashes, Colab displays this message “Your session crashed after using all available RAM. Get more RAM” as shown in the next figure. Click on the “Get more RAM” message.

Immediately after clicking on the “Get more RAM” message, a new window appears asking to switch to the high-RAM runtime. Click on YES to increase the amount of RAM after the session restarts. The high-RAM runtime provides roughly twice as much memory as the previous one.

After the session restarts, you can find that the RAM is doubled to be more than 25 GB, as shown in the next figure.

For More Information

To get more information about Google Colab, check this blog post: https://neptune.ai/blog/google-colab-dealing-with-files. It covers using Colab with Google products like Google Sheets and Google Cloud Storage, as well as integrating Colab with AWS S3 and MySQL.

Conclusion

This tutorial started at a very basic level to introduce Google Colab for beginners, as a robust cloud-based tool to execute Python code inside a Jupyter Notebook.

You should now be able to create a new Notebook, upload data from your local PC, download data into the Colab disk, and get more RAM when a session crashes after using all of the available RAM.

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.

