Introduction to Kaggle Kernels



Kaggle is a platform for doing and sharing data science. You may have heard of some of its competitions, which often have cash prizes. It is also a great place to practice data science and learn from the community.



What are Kaggle kernels?


Kaggle kernels are essentially Jupyter notebooks in the browser that can be run right before your eyes, all for free. Let me say that again in case you missed it, because it's so amazing:


Kaggle Kernels is a free platform for running Jupyter notebooks in the browser!


This means you can skip the hassle of setting up a local environment and have a Jupyter notebook environment inside your browser, anywhere in the world that you have an internet connection.


Not only that, but the processing power for the notebook comes from servers in the cloud, not your local machine, so you can do a lot of data science and machine learning without burning through your laptop's battery!





Kaggle recently upgraded all of its kernels to have more compute power and memory, and extended the length of time you can run a process to up to 60 minutes!

Okay, enough of me gushing about Kaggle kernels. Let's see what it really looks like.

Kernels in action


Once we have created an account at kaggle.com, we can choose a dataset to play with and spin up a new kernel, or notebook, with just a few clicks.

The dataset we started from comes preloaded in that kernel's environment, so there is no need to push the dataset onto the machine and wait for a large dataset to copy over the network.

Of course, you can still upload additional files (up to 1 GB) into the kernel if you want to.

In our case, we will continue to play with the fashion-mnist dataset. This is a dataset that includes 10 categories of clothing and accessories, such as pants, bags, heels, and shirts. There are 60k training samples and 10k evaluation samples. Let's explore the dataset in our Kaggle kernel.

Looking at the dataset, it is provided on Kaggle as CSV files. The original data was 28x28 pixel grayscale images, and they have been flattened into 784 distinct columns in the CSV file. The file also contains a column representing the index, 0 through 9, of the fashion item.
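To make the flattened layout concrete, here is a minimal sketch of how one 28x28 image could map to a single CSV row and back. The helper names, the row-major pixel order, and the position of the label as the first value are assumptions for illustration; check the actual file header in your kernel before relying on them.

```python
import numpy as np

IMAGE_SIDE = 28                      # each image is 28x28 pixels
N_PIXELS = IMAGE_SIDE * IMAGE_SIDE   # 28 * 28 = 784 pixel columns per row


def image_to_csv_row(label, image):
    """Return [label, p1, ..., p784] for one image.

    Assumes row-major pixel order and the label as the first value.
    """
    image = np.asarray(image)
    if image.shape != (IMAGE_SIDE, IMAGE_SIDE):
        raise ValueError(f"expected a {IMAGE_SIDE}x{IMAGE_SIDE} image, got {image.shape}")
    return np.concatenate(([label], image.reshape(-1)))


def csv_row_to_image(row):
    """Split a flattened row back into (label, 28x28 image)."""
    row = np.asarray(row)
    return int(row[0]), row[1:].reshape(IMAGE_SIDE, IMAGE_SIDE)


# Synthetic stand-in for a real grayscale image (values 0..255).
demo_image = np.arange(N_PIXELS).reshape(IMAGE_SIDE, IMAGE_SIDE) % 256
demo_row = image_to_csv_row(9, demo_image)
```

The round trip through `csv_row_to_image` recovers the original 28x28 array, which is the property we rely on later when displaying images.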


Loading data


Since the dataset is already in the environment and pandas is already installed, we can use it to read these .csv files into pandas DataFrames, one for training and one for testing.


Note that the data is stored in the 'input' directory, one level above.
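A loading step along these lines might look like the sketch below. The file names `fashion-mnist_train.csv` and `fashion-mnist_test.csv` and the helper `load_fashion_mnist` are assumptions for illustration; list the contents of the input directory in your own kernel to confirm what is actually there.

```python
import os

import pandas as pd

INPUT_DIR = "../input"                  # the 'input' directory, one level up
TRAIN_FILE = "fashion-mnist_train.csv"  # assumed file name
TEST_FILE = "fashion-mnist_test.csv"    # assumed file name


def load_fashion_mnist(input_dir=INPUT_DIR, train_file=TRAIN_FILE, test_file=TEST_FILE):
    """Read the training and test CSV files into two DataFrames."""
    train_df = pd.read_csv(os.path.join(input_dir, train_file))
    test_df = pd.read_csv(os.path.join(input_dir, test_file))
    return train_df, test_df
```

Inside the kernel, `train_df, test_df = load_fashion_mnist()` would then give one DataFrame per file, assuming the names above match.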


If you want to follow along, my Kaggle kernel is here:

 https://www.kaggle.com/yufengg/fashion-mnist/


Data exploration


Now that we have the data loaded into a DataFrame, we can take advantage of all the features it brings, which we covered in the previous episode. We will first display the first few rows with head(), and then run describe() to learn more about the structure of the dataset.


Looking at the first rows, the dataset appears to be shuffled already.
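As a rough illustration of this exploration step, the sketch below runs `head()` and `describe()` on a small synthetic DataFrame. The random data and the column names `label` and `pixel1` through `pixel784` are assumptions standing in for the real training file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the training DataFrame: 5 rows, 1 label + 784 pixels.
rng = np.random.default_rng(0)
n_rows, n_pixels = 5, 784
labels = rng.integers(0, 10, size=n_rows)             # class index 0..9
pixels = rng.integers(0, 256, size=(n_rows, n_pixels))  # grayscale 0..255

columns = ["label"] + [f"pixel{i}" for i in range(1, n_pixels + 1)]  # assumed names
train_df = pd.DataFrame(np.column_stack([labels, pixels]), columns=columns)

first_rows = train_df.head()    # first five rows, all columns
summary = train_df.describe()   # count, mean, std, min, quartiles, max per column

print(first_rows)
print(summary)
```

On the real data, scanning the `label` column of `head()` is a quick way to see whether rows are grouped by class or mixed.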





Data visualization


Additionally, it would be nice to visualize some of these images, so that they mean more to us than just rows upon rows of numbers. Let's use matplotlib to see what these images look like.

Here we use the matplotlib.pyplot library, imported as plt, to display an array of pixel values as an image.
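The display step can be sketched as follows. The helper `show_item` and the random pixel row are hypothetical stand-ins; in the kernel you would pass the 784 pixel values from one row of the training DataFrame instead.

```python
import matplotlib.pyplot as plt
import numpy as np


def row_to_image(pixel_row, side=28):
    """Reshape 784 flattened pixel values back into a side x side array."""
    return np.asarray(pixel_row, dtype=float).reshape(side, side)


def show_item(pixel_row, label=None, ax=None):
    """Draw one flattened image in grayscale and return the axes."""
    if ax is None:
        _, ax = plt.subplots()
    ax.imshow(row_to_image(pixel_row), cmap="gray")
    if label is not None:
        ax.set_title(f"label {label}")
    ax.axis("off")
    return ax


# Hypothetical stand-in for one row of pixel values (0..255).
demo_row = np.random.default_rng(0).integers(0, 256, size=784)
ax = show_item(demo_row, label=3)
```

In a notebook the figure renders inline once the cell finishes; outside a notebook, a call to `plt.show()` would be needed.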


We can see that these images, although fuzzy, are indeed still recognizable as the clothing and accessory items they claim to be.




Kaggle kernels allow us to work in a fully interactive notebook environment in the browser without any setup, and I really want to emphasize that we didn't have to install a Python environment or any libraries, which is really cool!


You can view the full kernel here:

 https://www.kaggle.com/yufengg/fashion-mnist/


