Yizhou Yan
Phone: +(86) 138 4080 9132 | Email: yizhouyan9132@outlook.com

Personal Info
Name: Yizhou Yan
Address: Room 514, Dormitory Building 6, School of Software, Dalian University of Technology (DUT), Road No.8, Development Zone, Dalian 116621, China
Date of Birth: March 2, 1991
Mobile: +(86) 138 4080 9132
Homepage: yizhouyan.net
Email: yizhouyan9132@outlook.com / yizhouyan9132@gmail.com

Education
2013.09-present  Master's degree in Software Engineering, School of Software, Dalian University of Technology, Dalian, China.
  - Rank: 2/51
  - GPA: 3.75/4.00, 87.46/100
  - Supervisor: Dr. Yu LIU (Professor/Assistant Dean)
2009.09-2013.06  Bachelor's degree in Software Engineering, School of Software, Dalian University of Technology, Dalian, China.
  - Rank: 15/289
  - GPA: 3.82/4.00, 89.30/100; Major Course GPA: 3.95/4.00, 92/100
  - Thesis: Gene Enrichment Analysis Based on Non-negative Matrix Factorization (supervised by Dr. Yu LIU)

Research Interests
My research focuses mainly on data analysis and mining. I have been working on the management and processing of biomolecular data collected on a genome-wide scale (computational biology and bioinformatics). I am familiar with various NMF (Nonnegative Matrix Factorization) algorithms and have applied them successfully in many experiments. I have also collected large sets of scholarly/scientific data, including datasets for calculating the h-sequence. I am currently working on community detection in large-scale data, with applications to microarray data and scholarly data.

Publications
1. Zhewen SHI, Yu LIU, Yizhou YAN, Xiaowei ZHAO. A Hierarchical Community Detection Method in Complex Networks. Journal of Computational Information Systems, vol. 9, no. 24, pp. 9715-9724, 2013.
2. Yu LIU, Zhen HUANG, Jing FANG, Yizhou YAN. An Article Level Metric in the Context of Research Community. WWW’14 Companion, Seoul, Korea, April 7-11, 2014.
3. Yu LIU, Yizhou YAN, Zhewen SHI, Aedin C Culhane. GeneSigCatcher: Automated retrieval of most relevant PubMed Central articles for GeneSigDB. In preparation for submission.
4. Yu LIU, Yizhou YAN. ScholarSeq: A Benchmark dataset for calculating a sequence of impact measures at individual level. In preparation for submission.

Awards and Certifications
2014.10          Excellent Postgraduate Award (Top 5%)
2013.09          First Class Scholarship for Postgraduates (Top 15%)
2013.09          Third Prize in NPMCM (National Postgraduate MCM)
2009.09-2012.06  Learning Merit Scholarship (Top 15%, twice; Top 5%, once); Individual Scholarship (Top 10%, once)
2012.02          Honorable Mention in ICM, USA
2011.06          Third Prize in CUMCM (China Undergraduate MCM)

Technical Skills
Programming languages: C, C++, C#, Java, Matlab, R
Databases: MySQL, SQL Server, Oracle
Tools: LaTeX, Microsoft Office, EndNote

Research Experiences
2014.07-present  Community detection in large networks.
  - Description: This project aims to devise community detection algorithms for large networks. We plan to propose a new method applicable to large-scale networks derived from genome data and from scholarly data.
  - My responsibilities: Learning various community detection algorithms; designing the method; applying the method to genome data.
  - Skills acquired: Methods of community detection
2014.03-present  A benchmark dataset for calculating a sequence of academic impact measures at the individual level.
  - Description: We have constructed a benchmark dataset that can be used for various dynamic academic impact assessments based on time sequences (e.g., the h-sequence). A corresponding system, which will manage sequence data for scholars in Computer Science, is under development and will soon be publicly accessible as a website. A paper reporting this work is in preparation for submission.
  - My responsibilities: Crawling data from several websites to build a dataset for calculating time-based impact measures such as the h-sequence; implementing the major h-sequence methods on the dataset; providing background data for the online system.
  - Skills acquired: Web data crawling; data processing; h-index/h-sequence calculation
2013.09-2014.03  Name disambiguation.
  - Description: We proposed a multi-level clustering algorithm that combines the coauthor network with latent relations among venues to solve the name disambiguation problem.
  - My responsibilities: Using NMF to identify the relationships between venues.
  - Skills acquired: NMF on large-scale datasets; Hadoop; name disambiguation
2013.03-2013.09  Gene Set Enrichment Analysis.
  - Description: In collaboration with Dr. Aedin C Culhane at Harvard School of Public Health, we incorporated the notion of degree of membership from fuzzy mathematics into the traditional NMF-based bi-clustering method and proposed a novel process that classifies genes and phenotypes while simultaneously finding associations between them. Sparseness is also calculated to reduce noise. A paper on this work is in preparation for submission.
  - My responsibilities: Carrying out the whole project under the supervision of Dr. Aedin C Culhane and Prof. Yu Liu.
  - Skills acquired: Details of GSEA; NMF algorithms
2012.08-2013.03  Automated retrieval of relevant articles for GeneSigDB.
  - Description: In cooperation with Dr. Aedin C Culhane at Harvard School of Public Health, we used data mining methods to develop a new strategy for identifying the subset of publications most relevant to GeneSigDB. This approach is expected to improve the efficiency of the manual biocuration pipeline for GeneSigDB. The pipeline optimizes PMC search keywords using Latent Semantic Analysis and the Vector Space Model, extracts tables from PDF files, and classifies the results. Biocurators found the pipeline useful and manually confirmed that 90% of the predicted gene-list-positive articles contained gene signatures. A paper reporting this work is in preparation for submission. The results are accessible at http://www.linkscholar.net/genelistfinder/.
  - My responsibilities: Carrying out the whole project under the supervision of Dr. Aedin C Culhane and Prof. Yu Liu.
  - Skills acquired: Bioinformatics; data mining methods; Matlab and R; paper reading and writing
2010.09-2011.08  Design of EDUGUI for embedded systems.
  - Description: EDUGUI is a lightweight GUI framework that not only provides a complete desktop environment for users but also offers a rich set of convenient APIs for developers. It runs quickly and resource-efficiently on many platforms, including x86 Linux, x86_64 Linux and ARM Linux.
  - My responsibilities: Fixing bugs in the system; improving the user interfaces; implementing several example applications.
  - Skills acquired: Handling large projects; Linux; SVN

Language
English:
  - TOEFL (2014/08): 107 (Listening: 29; Writing: 29; Reading: 26; Speaking: 23)
  - GRE (2013/09): 314 + 3.5 (Verbal: 150; Quantitative: 164; Writing: 3.5) (will update on 11/16/2014)
Japanese: JLPT N1 passed (the highest level of the Japanese-Language Proficiency Test)
Chinese: Native

Activities
2014.03-2014.06  Teaching Assistant for Computer Networking (instructor: Dr. Feng XIA)
2013.09-2014.01  Teaching Assistant for Introduction to Algorithms (instructor: Dr. Lei WANG)
2013.09-2014.01  Teaching Assistant for Database System (instructor: Dr. Yu LIU)
2011.09-2012.06  Leader of the Embedded Group, Center of Innovation and Practice
2010.09-2011.06  Vice Leader of the Embedded Group, Center of Innovation and Practice