Machine Learning in Astronomy

Is astronomy data science?

Machine learning in astronomy: sure, it sounds like an oxymoron, but is that really the case? Machine learning is one of the newest 'sciences', while astronomy is one of the oldest. In fact, astronomy developed naturally because people realized that studying the stars was not only fascinating but also helped them in their daily lives. For example, studying the cycles of the stars helped create calendars (such as the Maya and the Proto-Bulgarian calendars). It also played an important role in navigation and orientation.

Of particular importance was the early development of observational analysis using mathematical, geometric, and other scientific methods. It originated with the Babylonians, who laid the foundations for an astronomical tradition that continued in many other civilizations. Since then, data analysis has played a central role in astronomy.

So, after millennia of refining sophisticated techniques for data analysis, you would think no dataset could pose a problem to astronomers, right?

Well... that's not entirely true. The main problem astronomers face now is, strange as it may seem, advances in technology.

Wait, what?! How can good technology be a problem? It most certainly can. By good technology I mean the large field of view (FOV) of telescopes and the high resolution of detectors. Combined, these mean that today's telescopes collect far more data than previous-generation instruments. And that means astronomers must deal with amounts of data they have never seen before.

How was the Galaxy Zoo project born?

In 2007, Kevin Schawinski found himself in exactly this situation.

As an astronomer at Oxford University, one of his tasks was to classify images of 900,000 galaxies collected by the Sloan Digital Sky Survey over 7 years. He had to look at every single image to see whether the galaxy was elliptical or spiral and whether it was rotating. The task seems pretty trivial. However, the sheer amount of data made it almost impossible. Why? Because it was estimated that one person would have to work 24/7 for 3-5 years to complete it! Talk about a heavy workload! So, after working at it for a week, Schawinski and his colleague Chris Lintott decided that there had to be a better way.

Galaxy Zoo, a citizen science project, was born. If this is the first time you've heard the term, citizen science means that members of the public participate in professional scientific research. In short, Schawinski and Lintott's idea was to put the images online and recruit volunteers to help label the galaxies. This was possible because identifying a galaxy as elliptical or spiral is quite straightforward.

Initially, they hoped for 20,000 to 30,000 people to contribute.

To their surprise, however, more than 150,000 people volunteered for the project, and the images were classified in about 2 years. Galaxy Zoo was a success and was followed by more projects, such as Galaxy Zoo Supernovae and Galaxy Zoo Hubble. In fact, many projects are still active to this day.

Using thousands of volunteers to analyze data may seem like a success story, but it also shows how much trouble we are in right now. It took more than 100,000 people about 2 years just to classify (never mind perform complex analysis on) the data collected from a single telescope! And now we're building telescopes that are a hundred, even a thousand times more powerful. In a few years, volunteers will not be enough to analyze the huge amounts of data we receive.

To put this in perspective, a rule of thumb in astronomy is that the amount of information we collect doubles every year. For example, the Hubble Space Telescope has been operating since 1990 and collects about 20 GB of data every week. And the Large Synoptic Survey Telescope (LSST), scheduled for early 2020, is expected to collect tens of terabytes of data each night.
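To get a feel for what yearly doubling implies, here is a minimal Python sketch. It takes the rule of thumb and the Hubble figure at face value; the function name `growth_factor`, the 52-week year, and the decimal unit conversion (1 TB = 1000 GB) are assumptions made for illustration only.

```python
# Rule of thumb from the text: the data we collect doubles every year.
def growth_factor(years: int) -> int:
    """How many times larger the data volume is after `years` doublings."""
    return 2 ** years


# Hubble figure quoted in the text: about 20 GB per week.
HUBBLE_GB_PER_WEEK = 20
WEEKS_PER_YEAR = 52  # approximation, assumed for illustration

# Decimal units assumed: 1 TB = 1000 GB.
hubble_tb_per_year = HUBBLE_GB_PER_WEEK * WEEKS_PER_YEAR / 1000

for years in (1, 5, 10, 20):
    print(f"after {years:2d} years: {growth_factor(years):,}x the starting volume")

print(f"Hubble, at 20 GB per week: about {hubble_tb_per_year:.2f} TB per year")
```

Ten doublings already give a factor of about a thousand, which is why a workflow that copes today can be hopeless a decade later.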

But that is nothing compared to the most ambitious project in astronomy: the Square Kilometre Array (SKA). SKA is a radio telescope that is expected to be completed in Australia and South Africa by 2024. With its 2,000 radio dishes and 2 million low-frequency antennas, it is expected to produce more than 1 exabyte of data per day. That is more than the entire internet produces in a year, generated in just one day!
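A rough back-of-the-envelope comparison shows the scale of the jump. This sketch reads the SKA figure as 1 exabyte per day and uses the Hubble rate of 20 GB per week; the variable names, the 52-week year, and the decimal units (1 EB = 10^9 GB) are assumptions for illustration.

```python
# Decimal units assumed: 1 exabyte = 1e9 gigabytes.
GB_PER_EXABYTE = 1_000_000_000

SKA_GB_PER_DAY = 1 * GB_PER_EXABYTE  # "more than 1 exabyte per day" (expected figure)
HUBBLE_GB_PER_WEEK = 20              # Hubble rate quoted in the text
WEEKS_PER_YEAR = 52                  # approximation, assumed for illustration

# How long Hubble would need to collect what SKA is expected to produce in one day.
hubble_weeks_needed = SKA_GB_PER_DAY / HUBBLE_GB_PER_WEEK
hubble_years_needed = hubble_weeks_needed / WEEKS_PER_YEAR

# Ratio of the two daily data rates.
rate_ratio = SKA_GB_PER_DAY / (HUBBLE_GB_PER_WEEK / 7)

print(f"Hubble would need about {hubble_years_needed:,.0f} years to match one SKA day")
print(f"SKA's daily rate is roughly {rate_ratio:.1e} times Hubble's")
```

The exact numbers matter less than the order of magnitude: under these assumptions, one day of SKA output corresponds to hundreds of thousands of years of Hubble observations.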

Wow, can you imagine?!

With that in mind, it is clear that this monstrous amount of data cannot be analyzed by online volunteers alone. So researchers are now recruiting a different kind of ally: machines.

Why is everyone talking about machine learning?

Big data, machines, new knowledge... you know where we're going, don't you?