Student Interviews: Ian Myungsu Choi – Center for Knowledge-Powered Interdisciplinary Data Science (CKIDS)

Ian Myungsu Choi majored in chemistry as an undergraduate and is currently pursuing a master’s degree in Applied Data Science at USC. His experiences at work sparked his curiosity about optimizing and improving product development processes. He is fascinated by the remarkable power and speed of computers, and he looks forward to contributing to the data science community and witnessing the blooming era of data science.

What are your undergraduate and graduate majors?

I am in the Applied Data Science program at USC. I majored in chemistry and minored in business administration as an undergraduate.

What was your turning point (event, person, or work) that motivated you to study data science?

I would like to talk about one of the great experiences that I had in my previous job.

Developing a new product involves numerous processes, from planning through final shipping, and each of them must be optimized using large amounts of data. The problem is that optimizing one process can negatively affect others, which is called a trade-off. Because there are so many processes, tracking all of these trade-offs effectively is very difficult.

Another problem was that interpreting the data to get the right feedback for optimization still involved uncertainty, possible misconceptions, and human error across many different engineers, partly because of latent factors that we were unable to quantify. This eventually delayed the optimization process. In addition, given the amount of data we had, the information we extracted from it was very limited.

These problems led me to wonder whether we could automatically build a trade-off dependency graph covering all the processes, whether there was a systematic way to avoid those misconceptions, and whether we could extract more information from the data.

This curiosity is the main reason I decided to study data science.

Have you worked in the field of data science (either a work or a research experience)?

As part of the activities of GRIDS, a data science student association, I recently worked with another student on a data science project about behavioral context recognition. Simply put, behavioral context is the background information surrounding people’s activities, such as sound, location, and movement.

Our motivation was that if we could infer behavioral context from a smartphone’s sensor data, we might be able to track how the value of a product varies with context, since that value can differ from one context to another. The dataset consists of several categories of smartphone sensor data, such as accelerometer, gyroscope, magnetometer, audio, and GPS readings. It originally comes from the Department of Electrical and Computer Engineering at the University of California, San Diego. During the project, we were able to predict people’s activities from the sensor data and found some interesting patterns. There is a lot of potential for further work, such as building a Bayesian network or recommender systems based on context recognition. I would like to continue working on these topics in the future.

It was a great experience, and we learned many things through this data science project in GRIDS.

Looking back to the beginning of your journey, do you have any advice for students or beginners who want to learn more about data science?

Although I don’t think I am in a position to give advice to others yet, if I had to, I would say the following.

If you have enough time, get a good understanding of:

Linear algebra
Calculus
Statistics
Data structures
Algorithms
Programming languages

If not, then:

First, learn the little things you don’t know yet, such as the basics of data structures, programming, and statistics.

Second, find small, interesting data science problems around you. They do not need to be important problems; if a problem arouses your curiosity and interest, that is enough.

Third, carry out a project to solve the problem. From the start, you may have to learn the new algorithms and models needed to solve it. During the project, you get hands-on experience with the pros and cons of each algorithm you apply, and the project gives you an opportunity to think more deeply about those algorithms. You will also face many unexpected challenges to solve and learn from, which can lead to another project and to further learning.

Most importantly, finish the project and get tangible results. Do not leave the project unfinished; if it is hard to finish, narrow its scope.

Repeat this loop, and you will extend your data science knowledge and skills.

How will you apply your skills to solve real-world problems? Why do you care about solving this problem?

Problem 1.

These days, apart from the problems I mentioned above, I am curious whether patterns of variation [e.g., accumulation, release, etc.] in emotions toward the characters and events in a story [e.g., a novel, scenario, or movie script] could be good features for predicting whether the story will get a good rating from readers. Since these emotions, as well as sentiment, can be quantified to some extent with natural language processing, and since rating data is available for many stories, this experiment is feasible. It might also reveal interesting similarities among the emotional trajectories of popular stories through clustering. In addition, if we could successfully quantify the factors that draw readers’ attention to a story, predictions might improve further, since capturing and keeping readers’ attention is essential.

I think this could help authors evaluate their stories during and after writing. It could also help people who are interested in creating further content based on a story’s evaluation, since the story is the fundamental structure for new content.

Problem 2.

Recently, a growing problem in Korea has been noise that travels between units in residential buildings. News articles about crimes resulting from noise disputes are easy to find. Just as young children may hurt their friends if they have not learned that others feel pain from their actions, we may keep making loud noises without concern for others if we cannot perceive how much inconvenience those noises cause, because every unit is separated by walls. This can lead to growing tension among neighbors. Since almost every room already has a smoke detector, and a decibel sensor is small enough to fit inside one, adding decibel sensors to smoke detectors would let us gather data about the noise in each room. Using this data, one could build a system that gives each tenant feedback about their noise through their smartphone before it becomes a problem. The accumulated data could also be useful information for people looking for a place to move into.

I think this could be an effective way to address the problem and could be very helpful for both tenants and building managers.

Why do more people need to study data science?

We don’t have wings like birds, but we can fly faster than birds using an airplane.

We don’t have legs like a cheetah, but we can move faster than a cheetah using a car.

Humans can do more things because we know how to use tools and knowledge.

Now we have the most powerful tool of all: the computer.

A computer does not get tired, and it does not make careless mistakes.

Computers work at an unbelievable speed.

This remarkable tool can learn from data.

Once a model has learned from data, it can be copied instantly and without limit.

And this model can be applied very widely.

We are lucky because we are living at the very beginning of these surprising events.

Data science plays a key role in these events.

So, is there any reason for us not to learn data science?

Let’s start learning data science.
