Data Collection Research Software Engineer at GitHub

Remote (USA Only)
2 months ago
This job posting is over 30 days old, but the application is still open.

Title: Data Collection Research / Software Engineer

Location: Remote – Global

GitHub is seeking a research/software engineer with programming languages and/or software engineering expertise to join the Copilot team as part of GitHub Next. GitHub Next aims to be a meeting place within GitHub for experimentation with new ideas, and for setting the agenda for GitHub’s product several years in advance. Next has a small number of permanent research staff, and this position is part of the core Next team.

This engineer will collaborate with OpenAI and Microsoft Research to collect and process large-scale data sets to improve the OpenAI Codex model that powers Copilot. The ideal candidate will have experience with mining software repositories, large-scale program analysis, and/or creating benchmark sets for machine learning for code tasks.

Responsibilities:


  • Collaborate with OpenAI to improve the quality of the Codex code synthesis model.
  • Undertake short- and medium-term research projects in the area of code synthesis, and ship improvements to the production model.
  • Participate in all activities of GitHub Next: organizing webinar series, evaluating project proposals, and disseminating research results.

Minimum Qualifications:

  • Ability to do innovative research on one of the following topics: mining software repositories, program analysis (static or dynamic), program synthesis, machine learning for code.
  • 3+ years of experience building developer tools in production.
  • Inclination to prototype quickly and to decide fast when an experiment has failed.
  • A creative mindset and good practical skills are more important than formal experience.

Preferred Qualifications:

  • PhD in computer science or related field, or other evidence of the ability to do independent research.
  • Knowledge of Python or JavaScript and its ecosystem, or the ability to acquire such knowledge quickly.
  • Experience analyzing and/or mining large software repositories.
  • Ability to communicate complex ideas clearly, both in spoken and written form, for expert as well as novice audiences.
  • Interest in modern AI technologies and program synthesis in particular.