When your field of study is the known universe and everything in it, you generate a lot of data.
This has never been truer for astronomers than it is today: the current generation of telescopes observes not just a single object or tiny patch of sky, but the entire sky for an entire night, night after night after night.
In the past 10 years, projects like the Canadian Hydrogen Intensity Mapping Experiment (CHIME) in British Columbia — in which the University of Toronto is a collaborator — and space telescope missions like the Milky Way galaxy-mapping Gaia and the exoplanet-hunting Kepler have all been part of a “data-driven” revolution in astronomy.
When it begins operation in the 2020s, the Square Kilometre Array (SKA) will be the largest radio telescope ever built. As it scans the sky, it will generate 600 petabytes of data a year. If you had to store that much data on typical laptop computers with 500 gigabytes of storage each, you’d need more than a million laptops.
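The laptop comparison is easy to check. The short sketch below assumes decimal units (1 petabyte = 10^15 bytes, 1 gigabyte = 10^9 bytes); the variable names are illustrative only.

```python
# Rough check of the laptop comparison, assuming decimal (SI) units.
PETABYTE = 10**15  # bytes
GIGABYTE = 10**9   # bytes

annual_data_bytes = 600 * PETABYTE     # SKA output per year, as quoted in the article
laptop_storage_bytes = 500 * GIGABYTE  # storage of one "typical" laptop

# Integer division is exact here because the numbers divide evenly.
laptops_needed = annual_data_bytes // laptop_storage_bytes

print(f"{laptops_needed:,} laptops per year of data")
```

Under these assumptions the figure comes to 1.2 million laptops for each year of observations.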
Big data isn’t better data
The age of really big data in astronomy places even greater importance on the tools used to analyze and make sense of this scientific trove. Techniques based on machine learning can handle the classification aspect of the task — sifting through the data to identify asteroids, variable stars, quasars, etc. — but investigating these objects and discovering their true nature still relies on a well-equipped statistical toolbox.
“The LSST will collect terabytes of data,” says Gwen Eadie. “But big data is not necessarily going to answer all your questions. Having big data is great but in order to understand its properties, you need rigorous statistical practices.”
Eadie is an astrostatistician — a rare breed of scientist with one foot firmly in astronomy and the other firmly in statistics.
“The data science revolution has had a deep and rapid impact on academia and industry,” says Radu Craiu, chair of the statistics department. “We have taken a follow-the-data approach and initiated a sustained campaign of joint hires with relevant departments like astronomy and astrophysics who have rich, data-driven research programs.”
According to Ray Carlberg, astronomy and astrophysics chair, “Astronomers are realizing that the best way to handle many aspects of these enormous datasets is to apply more statistical methodology to the problems and invent new methods.
“Moreover, providing high-quality statistical training to our students broadens their career opportunities. Creating this position was a strategic choice that combines research and education.”
Eadie is currently developing an astrostatistics course that will be available to students in both the statistics and astronomy programs.
“When someone once asked me to describe my dream faculty job,” she says, “I told them it would be a joint appointment between statistics and astronomy. So when this job became available, it was the ideal fit for me. I’m super excited to be here.”
“Weighing” the Milky Way galaxy
Even though Eadie took mostly science classes in grade 12 and was good at math, it would have been difficult to forecast a STEM (science, technology, engineering and mathematics) career for her at the time. She had been an ardent figure skater since an early age and after high school joined a Disney on Ice touring company for three years, performing throughout North and South America.
A career in science still wasn’t on the horizon after she left the ice show circuit and started university as an English major. But an introductory astronomy lecture sparked an interest in the field and an English composition she wrote about the Hubble Space Telescope fanned the flames. When a mentor expressed surprise she wasn’t considering a career in science, it was another sign to change programs and take the first step toward becoming an astrostatistician.
That step was followed by a second. “In graduate school, I still hadn’t received any formal training in statistics because it’s typically not in the astronomy or physics curriculum,” says Eadie. “So I attended an astrostatistics summer school where actual statisticians taught statistics for astronomers.
“It helped me realize how important this interdisciplinary approach is.”
It is this approach that Eadie has taken in one of her long-term projects: determining the mass of the Milky Way galaxy, an investigation with ramifications for measuring the amount of dark matter in the galaxy and studying the evolution of galaxies.
She and her collaborators applied a statistical tool called Bayesian analysis to data for globular clusters, spherical swarms of tens to hundreds of thousands of stars within our home galaxy. This approach sidestepped the fact that data about the clusters’ motions was incomplete and put the galaxy’s mass at the equivalent of 1,000 billion suns.
In a more recent paper, the researchers applied this method using data from the Gaia space telescope and refined the estimate to 700 billion suns — a figure largely in agreement with other methods of “weighing” the galaxy.
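To give a flavour of how a Bayesian analysis can work with incomplete motion data, here is a toy sketch. It is not the researchers' model and does not estimate a mass. It infers a single made-up quantity, the velocity spread of a set of simulated clusters, and it assumes zero-mean Gaussian velocities, a flat prior and synthetic data. Every name and number in it is hypothetical.

```python
import math
import random

# Toy illustration only: synthetic data, not real globular cluster measurements.
random.seed(0)
TRUE_SIGMA = 120.0  # km/s, made-up "true" velocity spread
N_CLUSTERS = 150

# Each simulated cluster has three velocity components; for roughly half of
# them, two components are treated as unmeasured (None).
clusters = []
for _ in range(N_CLUSTERS):
    v = [random.gauss(0.0, TRUE_SIGMA) for _ in range(3)]
    if random.random() < 0.5:
        v[1] = v[2] = None
    clusters.append(v)

# Under the independent-Gaussian assumption, averaging over a missing
# component leaves the likelihood unchanged, so only measured components
# enter. Incomplete clusters still contribute what they have.
observed = [c for v in clusters for c in v if c is not None]
n_obs = len(observed)
sum_sq = sum(c * c for c in observed)

# Posterior on a grid of candidate spreads, with a flat prior over the grid.
sigma_grid = [50.0 + 0.1 * i for i in range(2001)]  # 50 to 250 km/s
log_post = [-n_obs * math.log(s) - sum_sq / (2 * s * s) for s in sigma_grid]
peak = max(log_post)
weights = [math.exp(lp - peak) for lp in log_post]  # subtract peak for stability
total = sum(weights)
posterior = [w / total for w in weights]

posterior_mean = sum(s * p for s, p in zip(sigma_grid, posterior))


def credible_bound(q):
    """Smallest grid value whose cumulative posterior probability reaches q."""
    running = 0.0
    for s, p in zip(sigma_grid, posterior):
        running += p
        if running >= q:
            return s
    return sigma_grid[-1]


lower, upper = credible_bound(0.025), credible_bound(0.975)
print(f"posterior mean: {posterior_mean:.1f} km/s")
print(f"95% credible interval: {lower:.1f} to {upper:.1f} km/s")
```

The output is a range of plausible values with probabilities attached rather than a single number, and the clusters with missing components are kept rather than thrown away.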
Eadie hopes this approach will prove to be the definitive method and is equally optimistic and confident about the overall promise of astrostatistics.
“When it comes to data analysis in astronomy in the era of big data,” says Eadie, “we will advance knowledge more effectively when we step out of our disciplinary silos and work with people in areas such as statistics, applied mathematics and computer science.
“Not only will this help our science, it will also prevent us from re-inventing the wheel. Interdisciplinary collaboration can even lead to new ways of approaching data analysis, which in turn can lead to exciting discoveries in many other disciplines.”