TDSM 4.5
part d: ranking universities
We can rank universities in different departments based on the activity of their people on Quora[1].
The steps for crawling and analyzing the data are as follows:
1. Find topics in different areas of interest.
2. For each topic, crawl all the questions in that topic.
3. For each question, crawl all the answers to that question.
4. For each answer, record information about the responder: the responder's affiliation, the number of upvotes the answer received, and the number of views the answer received.
5. Clean the data. People may write their university name in different ways or use abbreviations. These variants should all be merged and normalized to avoid duplicates.
6. By aggregating the records by university and topic, we can collect a rich data set containing more than 2500 universities, with scores for each topic and each university. For each (university, field) pair we have the total number of views and upvotes. We can define a scoring function based on the number of views, the number of upvotes, and the number of people active on Quora for each university, and then rank the universities in each field.
7. The ranking can be compared with other rankings, such as the QS ranking[2] for each field or the [Google Scholar][3] ranking.
8. We crawled the information from Google Scholar and ranked the universities based on the citations they have in each field.
9. These rankings can be compared. We found that in some fields, such as Electrical Engineering and Computer Science, the Google Scholar and Quora rankings are correlated, but in other fields, such as biology and physics, they are not.
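
Steps 5, 6, and 9 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the code behind the results above. The alias table, the record keys (`affiliation`, `field`, `responder`, `upvotes`, `views`), the log-damped scoring function, and the choice of Spearman's rank correlation are all hypothetical. The steps above do not fix the scoring function or the correlation measure.

```python
import math
from collections import defaultdict

# Hypothetical alias table; a real one would be much larger and hand-curated.
ALIASES = {
    "mit": "Massachusetts Institute of Technology",
    "massachusetts institute of technology": "Massachusetts Institute of Technology",
    "stanford": "Stanford University",
    "stanford university": "Stanford University",
}


def normalize_affiliation(raw):
    """Step 5: map spelling variants and abbreviations to one canonical name."""
    key = " ".join(raw.lower().replace(".", "").split())
    return ALIASES.get(key, raw.strip())


def aggregate(records):
    """Step 6: total upvotes, views, and distinct responders per (university, field)."""
    totals = defaultdict(lambda: {"upvotes": 0, "views": 0, "people": set()})
    for r in records:
        key = (normalize_affiliation(r["affiliation"]), r["field"])
        totals[key]["upvotes"] += r["upvotes"]
        totals[key]["views"] += r["views"]
        totals[key]["people"].add(r["responder"])
    return totals


def score(stats, w_up=1.0, w_view=1.0, w_people=1.0):
    """Assumed scoring function: weighted sum of log-damped counts."""
    return (
        w_up * math.log1p(stats["upvotes"])
        + w_view * math.log1p(stats["views"])
        + w_people * math.log1p(len(stats["people"]))
    )


def rank_field(totals, field):
    """Return the universities in one field, best score first."""
    scored = [(uni, score(s)) for (uni, f), s in totals.items() if f == field]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [uni for uni, _ in scored]


def spearman(rank_a, rank_b):
    """Step 9: Spearman's rho over the universities present in both rankings.

    Assumes each ranking is an ordered list without duplicates or ties.
    """
    shared = set(rank_a) & set(rank_b)
    a = [u for u in rank_a if u in shared]
    b = [u for u in rank_b if u in shared]
    n = len(a)
    if n < 2:
        raise ValueError("need at least two shared universities")
    pos_b = {u: i for i, u in enumerate(b)}
    d2 = sum((i - pos_b[u]) ** 2 for i, u in enumerate(a))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A value of `spearman(quora_ranking, scholar_ranking)` near 1 would indicate that the two rankings agree for that field, and a value near 0 would indicate no monotonic relationship. The log damping in `score` is one possible way to keep a single viral answer from dominating a university's total.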