Open Source Software Data Analytics Lab

Software Engineering Institute, Peking University


We are a research group at the School of Computer Science, Peking University.

In the broadest sense, our research belongs to the field of Software Engineering, which focuses on improving the efficiency and quality of software development. The software engineering community pursues this goal in diverse ways. We often take an empirical approach, observing real-world practice and summarizing experience to build theories and mechanisms. We also invent intelligent techniques and bots to help control complex systems and their development.

More specifically, our current focus is on observing software repositories and measuring how developers live their lives, for purposes such as helping understand and control large, complex software systems, society, and the universe. We use a wide spectrum of technologies and interdisciplinary methodologies, depending on the specific problem we are tackling.

See the CCF list for top venues in this field. See Publications for our latest publications. If you want to learn more, see Resources on this website.

Current Research Topics

  1. Open Source Software Supply Chain (modeling, risk analysis and resolution)
  2. Characterizing open source ecosystems as complex systems
  3. Open source license compatibility detection and conflict resolution
  4. Open Source Sustainability (deprecation prediction)
  5. Profiling Developers (expertise, personality and learning trajectory)
  6. Software engineering bots (library migration recommender/GFI recommender/dependency update bot/release note bot…)

For Prospective Students

We are constantly looking for self-motivated students with solid programming skills. Students with a strong interest in mining big data, observing open source ecosystems and improving current software development practices are especially welcome. Industry experience and strong software development skills will be a great advantage. A background in software engineering, statistics, visualization, data mining, machine learning or natural language processing may help you prosper in this field but is not required.

Contact Professor Zhou Minghui for details about PhD openings and undergraduate internship opportunities.

Industry Collaborators


Address: Room 1537, Science Building No. 1, Peking University, Beijing, China

Latest Posts

One paper accepted by ESEC/FSE!

Kai’s study on automatically retrieving and validating source code repository information for PyPI packages has been accepted by ESEC/FSE 2024. Congratulations to Kai!

One paper accepted by TOSEM!

Kai’s study on characterizing deep learning package supply chains has been accepted by TOSEM. Congratulations to Kai!

Writing Release Notes for Your Software: How to Get it Right

Release notes are important. However, there is a lack of tutorials or widely acknowledged standards on how to produce them. Without “the right way,” release notes may cause all kinds of issues. In this article, we provide an FAQ-style introduction on how to produce the “right” release notes for your users, based on our recent research on ~1000 real release note issues in GitHub projects. This is still a preliminary draft, so if you have any suggestions or critiques, feel free to comment below!