USC Viterbi - School of Engineering - Department of Computer Science CSCI 586 Database Systems Interoperability

Project Specifications (link)
Samples: presentation, report


Monday Section


Group 1

Topic: We will work with multiple datasets, such as the IMDB top 5000 movies from Kaggle, and mine plot keywords and other attributes so that we can answer questions such as:

  • Famous people associated with a movie, e.g. movies on Abraham Lincoln (famous people dataset from DBpedia).
  • Cities/countries in the movie e.g. movies about California.
  • Sports associated with the movie e.g. basketball movies.

Members
  • Neha Sachdev
  • Vasini Chandrasekaran
  • Karthik Anantha Prakash
  • Sparshith Nairbalige Rai

Group 2

Topic: Understand and assess the factors that impacted the “Trump effect”. Probable questions we are trying to answer are:

  • Which religious/ethnic group supports Trump the most?
  • City-wise percentage of homeless people who voted for Trump (visualization).
  • What percentage of users were Facebook users but not WhatsApp users and still supported Trump?
  • What percentage of people voted for Trump's 'Lower taxes' propaganda and were in the 24-36 age group?
  • How many people from the LGBTQ community in the 2400-3200 salary range supported him, and what were their main voting issues?

Members
  • Anand, Deepika
  • Kochhar, Sakshi
  • Ramaswamy, Srividya
  • Thuravi Prakash, Roopa

Group 3

Topic: Social media can be used as a tool to retrieve emerging situational awareness during disasters. Understanding the impact of a disaster supports disaster response and decision making, and may be helpful for further disaster prediction and studies. In this work, we propose to use information shared on Twitter (tweets), integrated with spatial data (location) and disaster information crawled from multiple sources (FEMA, USGS, Data.gov, etc.), to analyze and understand the impact of specific disasters. The study will include analysis of, and answers to, the questions below:

  • Retrieving trending information about the disasters over time (temporal aspect)
  • The effect of the disasters (spatial aspect)
  • Summary of tweets relating to the disasters by displaying keywords
  • Sentiment analysis relating to the disasters (positive / negative feeling)

Members

  • Lu, Guannan
  • Nguyen, Minh, Ngoc Binh
  • Zhang, Zixuan
  • Chen, Dingyuan

Group 4

Topic: Anime Business Analysis Tool

Goal: We will scrape multiple anime websites (https://myanimelist.net/, http://www.anime-planet.com/) to build a website that performs business analysis. Anyone starting a new anime needs full information on who the target audience is, whom to pair with, who the competitors are, and so on. There is a lot of data across the two sites, but it is not structured to answer these business questions. Probable questions we are trying to answer are:
  • Top 10 regions that watch anime the most.
  • What is the most liked anime for a given location?
  • Most adored tagged character for a given age rating.
  • Highest-scored anime for a particular broadcast time.
  • Top 5 deviations in user reviews and scores for an anime across different sites.
  • Finding the top staff (producer, director, studio) based on score, so that you know whom to pair with.
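
The score-deviation question above can be illustrated with a minimal sketch. It assumes the scores from both sites have already been scraped and normalized to the same 0-10 scale; the function name, the titles, and the numbers are hypothetical placeholders, not data from either site.

```python
from typing import Dict, List, Tuple


def top_score_deviations(
    site_a: Dict[str, float],
    site_b: Dict[str, float],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k titles whose scores differ most between two sites."""
    # Only titles present on both sites can be compared.
    shared = site_a.keys() & site_b.keys()
    gaps = [
        (title, round(abs(site_a[title] - site_b[title]), 2))
        for title in shared
    ]
    # Largest gap first; ties are broken alphabetically for stable output.
    gaps.sort(key=lambda pair: (-pair[1], pair[0]))
    return gaps[:k]


# Hypothetical scores, both assumed to be on a 0-10 scale.
scores_site_a = {"Anime A": 8.9, "Anime B": 7.2, "Anime C": 6.5}
scores_site_b = {"Anime A": 8.4, "Anime B": 9.0, "Anime D": 7.0}

print(top_score_deviations(scores_site_a, scores_site_b))
```

Titles that appear on only one site are skipped here; in practice, matching titles across sites would likely need its own cleaning step.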

Members
  • Kaur, Gurleen
  • Lohith, Sachin, Keshavaiah
  • Parekh, Disha Yogesh
  • Junaid, Mohammed

Group 5

Topic: Perform analysis on a collection of datasets and create an RDF ontology for countries around the world
We will be collecting datasets about countries around the world and creating an RDF ontology to show insightful information from the data. We aim to answer questions such as:

  • What effect does GDP have on a country's education?
  • What effect does population have on a country’s GDP?
  • Where is terrorism most dominant?
  • Which country has the lowest per capita income?
  • Which countries have low/high levels of education?
  • Which country has the lowest per capita income, and what is its overall range of income?

Tools and datasets:
  • JSoup for scraping data from various websites.
  • Apache Jena to write the data as RDF graphs using Java.
  • SPARQL to query the data so that we can answer some or all of the questions above.
  • Apache Jena Fuseki as the SPARQL server.
  • Python Flask (a web development framework) for the final presentation, to ease the display of the relevant inter-linkages between our cross-domain semantic concepts.

  • Our tentative list of topics for the data repository draws primarily on these heterogeneous areas:
    World Population, Education, Income, Budget, Literacy, Crime, Drugs/Health, Trading, Military, Natural Resources, Infrastructure, Currency, Wildlife, Weather
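
As a rough illustration of the idea behind the pipeline above, the sketch below stores facts as subject-predicate-object triples and answers the "lowest per capita income" question with a simple pattern match. It is a plain-Python stand-in, not the Apache Jena or SPARQL API; the predicate names, country identifiers, and figures are hypothetical placeholders.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, float]

# Hypothetical triples; identifiers and values are placeholders, not real data.
triples: List[Triple] = [
    ("country:A", "hasPerCapitaIncome", 1200.0),
    ("country:A", "hasLiteracyRate", 61.0),
    ("country:B", "hasPerCapitaIncome", 45000.0),
    ("country:B", "hasLiteracyRate", 99.0),
    ("country:C", "hasPerCapitaIncome", 800.0),
]


def match(
    store: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching the subject and/or predicate (None is a wildcard)."""
    return [
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
    ]


def lowest(store: List[Triple], predicate: str) -> Optional[str]:
    """Subject with the smallest value for a predicate, or None if there is none."""
    rows = match(store, predicate=predicate)
    if not rows:
        return None
    return min(rows, key=lambda t: t[2])[0]


print(lowest(triples, "hasPerCapitaIncome"))
```

In the actual project the same question would presumably be expressed as a SPARQL query against the Fuseki server, with ordering and a limit doing the work of `lowest`.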

    Members
    • Jason Christopher Tan
    • Georgios Lydakis
    • Chetan Yadav
    • Manas Mahanta

    Group 6

    Topic: Sentiment Analysis using Travel Ontology and Social Media Data
    Goal: To design and create a travel ontology and perform sentiment analysis using datasets from Twitter, Instagram, Expedia, hotelbookings.com, and Yelp. This will help evaluate popular reviews for airlines, hotels, restaurants, and travel destinations/landmarks.
    Results: Sample results could look like below:

    1. Query on best landmarks leveraging the sentiments (Twitter feeds)
    2. Restaurants to visit at a location, based on the sentiment of the users (travelers)
    3. Best/Cheap Airlines associated with the locations
    4. Best time to visit a destination based on people's feedback
    Members
    • Atharva kale
    • Ganmani sekar
    • Aman Mathur
    • Puneet Koul

    Group 7

    Topic: Build a comprehensive source of information about video games that puts together in one place all the relevant pieces of data a gamer needs, drawn from the most popular sources about games (such as GameSpot, IGN, and Metacritic). It will also include related material such as books and movies (from sources such as IMDB and Goodreads) that may extend the gamer's experience beyond gameplay. Examples of relevant data about a game:
    Game Title, Game Description, Game Publisher, Release Date, Platforms Available, GameSpot Review Score, IGN Review Score, GameSpot Review Detail, IGN Review Detail, Users Score in GameSpot, Users Score in IGN, Game Wikis/Guides GameSpot, Game Wikis/Guides IGN, etc. Possible sample questions that can be answered:

    • Other Games that might be Related to a specific Game.
    • More Games by a certain Publisher.
    • All Games that have related Movies (Or vice versa)
    • All Games that have related Books (Or vice versa)
    • Books that might be Related to a certain Game.

    Members

    • Andrade, Juan, Francisco
    • Muvva, Upendra Sandeep
    • Naik, Chinmay
    • Yesmin Kumar, Mehul

    Group 8

    Topic: Recommend YouTube videos based on personality types. We will be using various video preference datasets and a personality dataset to answer the following questions:

    Use Case 1: Given a user, how likely are they to like a particular video?

    • Identify the personality traits of new users via the Twitter and Facebook APIs
    • Use the MBTI dataset and our own classifications from step 1
    • Suggest videos to the user according to personality type (at this stage all videos are already mapped to a personality type)

    Use Case 2: Given a video, who is its target audience?

    • Pull new videos from YouTube
    • Classify their personality type
    • Recommend those videos to people with the outputted personality type

    Datasets:
    • https://www.kaggle.com/datasnaek/youtube
    • https://www.kaggle.com/datasnaek/mbti-type

    APIs:
    • YouTube Data API
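
The two use cases above can be sketched once videos and users each carry a personality label. The sketch below assumes that classification has already happened; the video ids, user names, and function names are hypothetical, and the matching rule (exact type equality) is a simplifying assumption rather than the group's stated method.

```python
from collections import defaultdict
from typing import Dict, List


def index_by_type(video_types: Dict[str, str]) -> Dict[str, List[str]]:
    """Group video ids by their assigned personality type."""
    index: Dict[str, List[str]] = defaultdict(list)
    for video_id, mbti in video_types.items():
        index[mbti].append(video_id)
    return index


def recommend(user_type: str, index: Dict[str, List[str]], limit: int = 3) -> List[str]:
    """Use Case 1: videos whose type matches the user's type."""
    return sorted(index.get(user_type, []))[:limit]


def target_audience(
    video_id: str,
    video_types: Dict[str, str],
    user_types: Dict[str, str],
) -> List[str]:
    """Use Case 2: users whose type matches the video's type."""
    mbti = video_types.get(video_id)
    return sorted(user for user, t in user_types.items() if t == mbti)


# Hypothetical labels; real ones would come from the classifiers described above.
video_types = {"v1": "INTJ", "v2": "ENFP", "v3": "INTJ"}
user_types = {"alice": "INTJ", "bob": "ENFP"}

index = index_by_type(video_types)
print(recommend("INTJ", index))
print(target_audience("v2", video_types, user_types))
```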

    Members

    • Bino Joseph
    • Nikita Gupta
    • Malvika Nagpal

    Group 9

    Topic: The key idea of our project is to integrate music data (releases, ratings, artists, events, albums, etc.) from several heterogeneous sources such as Spotify, MusicBrainz, SoundCloud, and iTunes. This will help provide a unified view of the music industry. The resulting ontology can then be used to answer interesting questions like:

    • Which songs have good ratings across different web sites?
    • Which are the most popular songs/Albums for a particular artist?
    • Which songs/albums have won Grammy awards?
    • Are there any concerts/ events in the vicinity for a specific artist?
    • Which Genre has the highest number of subscribers/ listeners?
    Members
    • Borikar, Shubham, Sudhir
    • Jain, Eshaan
    • Kheberi, Sarabjit Singh, Harwansh Singh
    • Aggarwal, Madhur



    Wednesday Section


    Group 1

    Topic: Construct a system that provides the complete specifications of a cellphone and answers queries about it. The system will be able to answer queries about:

    • Cell phone manufacturer
    • Technical specifications
    • User reviews and rating
    • Details about production plants
    • (May implement) data analysis on specific phone brands
    Members
    • Adarsh Rajanikanth
    • Malatesha Somasundar Anantha
    • Neelima Vangipuram
    • Sahil Wadhwa

    Group 2

    Topic: The idea is to create a federated, ontology-based tool covering films adapted from novels, which can serve as a knowledge base for users. Various features will be analyzed, and an existing ontology for each feature will be used to build the framework.

    • The movie dataset will be extracted from DBpedia, and the book dataset will be extracted from the Book-Crossing dataset from IIF.
    • The model will return the books related to a movie given in the query, and vice versa. Example: given the movie name "Notebook", it will return the book named "Notebook".
    • Books searched by author name, with results sorted by rating for the given author.
    • Most popular books read by a given age group.
    Members
    • Nisha Kapoor
    • Yash Rajkumar Kedia
    • Shruti Priya
    • Nitisha Pandey

    Group 3

    Topic: Creating a correlation ontology.

    Datasets:

    • Football Players (NFL player data from statcrunch.com)
    • States (US state stats from City-Data.com)
    • Colleges (Universities) (Colleges in the US from data.gov)

    Questions such as:

    • What percentage of players from an NFL team have been drafted from universities?
    • What is the college recruitment rate in the NFL?
    • Which state has the highest-rated universities? And so on.

    Members
    • Rahul Chandhoke
    • Viswambharan Kasturi Rangarajan
    • Shruti Anand Kulkarni
    • Riya Bharat Punjab

    Group 4

    Topic: The main idea of our project is to combine three food-related datasets from Kaggle: Recipes, Food Images, and Nutrition Value of Ingredients.

    The project captures vital information about the selected recipes, such as:

    • Images of ingredients mentioned in the recipe
    • Nutrient value of the dish

    Members
    • Aditya Parameshwara
    • Charita Venugopal Etta
    • Ganesh Madhav Raghupathy
    • Xiaoyang Zhang

    Group 5

    Topic: We plan to work with food data using the following datasets:

    • Restaurants on TripAdvisor
    • Open Food Facts
    • What's Cooking

    All three datasets are on Kaggle. Using them, we are going to answer queries such as:

    • Find the healthiest restaurant
    • Find the healthiest restaurant in a particular cuisine
    • Find the restaurant with a high hotness (spicy) level

    Members
    • Ahuja, Swapnil
    • Chandurkar, Rushi, Nitin
    • Mittal, Saksham
    • Porwal, Raj

    Group 6

    Topic: We want to create an ontology system that integrates data obtained from databases (e.g. museum websites) or from crawling websites (e.g. Wikipedia pages on paintings) that contain information about art, artists, and museums. With this system, we can create an engine that lets users query by terms such as artist, time period, or country. The engine could return museums that currently show the artist's paintings, or return related paintings from a time period with detailed descriptions. Our purpose in creating this ontology is to give people who are interested in art a medium in which they can look up information about a certain artwork, where it is displayed, and other relevant details.

    Members
    • YuLong Pei
    • Jingjing Wang
    • Majid Ghasemi Gol
    • Dan Ma

    Group 7

    Topic: Work with the movie datasets listed below to visualize and answer queries like:

    • Which movie has the best rating overall, and which in each category?
    • Which movies have won the most awards?
    • Which genre has the highest rating?
    • Which genre has won the most awards?

    Datasets used:

  • UCI Movie dataset
  • Movie Faceted Search Dataset
  • IMDB dataset from Kaggle
    Members
    • Abhilash Natraj
    • Rajni Kumari
    • Shruthi Kalkunte Narayanaswamy

    Group 8

    Topic: The basic idea of the project is to collect data from well-known language package management systems (npm for Node.js, NuGet for .NET, gem for Ruby, etc.) and then build an ontology-based knowledge base on each package's metadata, including the package name, package usage (the problems it solves), package dependencies, and author(s), to discover relationships between packages, even across languages. With information from online source code hosting services like GitHub, we can track the life cycles of these packages to improve our knowledge base.
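
One relationship such a knowledge base could expose is the dependency graph. The sketch below, under the assumption that dependencies have already been collected into a mapping keyed by "ecosystem:name", walks the graph to find everything a package depends on, directly or indirectly, and lists the packages that depend on a given one. All package names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical metadata keyed by "ecosystem:name"; not real packages.
packages: Dict[str, List[str]] = {
    "npm:web-app": ["npm:http-lib", "npm:json-lib"],
    "npm:http-lib": ["npm:json-lib"],
    "npm:json-lib": [],
    "gem:api-client": ["gem:json-tool"],
    "gem:json-tool": [],
}


def transitive_dependencies(name: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Collect direct and indirect dependencies with an iterative graph walk."""
    seen: Set[str] = set()
    stack = list(graph.get(name, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; also guards against dependency cycles
        seen.add(dep)
        stack.extend(graph.get(dep, []))
    return seen


def dependents(name: str, graph: Dict[str, List[str]]) -> List[str]:
    """Packages that list the given package as a direct dependency."""
    return sorted(pkg for pkg, deps in graph.items() if name in deps)


print(sorted(transitive_dependencies("npm:web-app", packages)))
print(dependents("npm:json-lib", packages))
```

Linking packages across languages (for example, two JSON libraries in different ecosystems) would need extra metadata such as the usage description, which this sketch does not model.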

    Members
    • Dizheng Wang
    • Huiqing Dai
    • Sijie Chen
    • Tianlei Xu

    Group 9

    Topic: Understanding the impact of global population, GDP, health, freedom, and pollution on world migration:

    • Impact on immigration of each of the following, independently or together: GDP, per capita income, government trust, population, and happiness index.
    • Given the population, what is the trust indicator that the people of a country have in their government?
    • Impact of population on the pollution of countries

    Datasets used:
    • World 2015 migration data (United Nations dataset), US visa data, and world population:
      http://www.un.org/en/development/desa/population/migration/data/empirical2/migrationflows.shtml#
      https://www.kaggle.com/jboysen/us-perm-visas/data
      https://www.kaggle.com/theworldbank/global-population-estimates
    • World happiness index, health, GDP, trust, freedom (Kaggle): https://www.kaggle.com/unsdsn/world-happiness/data
    • World pollution data (World Bank): https://data.worldbank.org/topic/environment

    Members
    • Rasvitha Kandur
    • Manikanta Kotthapalli
    • Karthik Chindalur Sridhara
    • Karthik Ravindra Rao