Data Scientist

University of California - San Francisco

50.00

United States, California, San Francisco

555 Mission Bay Boulevard South

Jul 22, 2026

Job Function Summary

Involves developing and utilizing computational tools and systems to analyze and interpret biological or other research data. Utilizes and develops algorithms, computational techniques, and standard statistical methodologies. Helps in the design of new experiments and leads the execution of building machine learning and statistical models. Implements end-user needs in database development, maintenance, searching, and integration. Maintains computational infrastructure and manages and tracks the flow of samples and information for large-scale studies. Provides bioinformatics and access to public and proprietary databases. Manages cloud and on-premises computational infrastructure and data.

Generic Scope

Professional who applies acquired job skills, policies, and procedures to complete substantive assignments / projects / tasks of moderate scope and complexity; exercises judgment within defined guidelines and practices to determine appropriate action.

Custom Scope

Our research efforts are at the intersection of cardiovascular disease and human genetics. Our clinical research employs new techniques for deep phenotyping, such as deep learning, but these techniques rely on a solid foundation of classical bioinformatics. The Bioinformatics Programmer/Data Scientist will assist in managing, cleaning, and analyzing large-scale medical data using a wide variety of analytic techniques, both in the cloud and on premises, depending on data permissions. Experience with a cloud provider such as AWS, Microsoft Azure, or Google Cloud is a plus; the ability to learn how to manage cloud-based pipelines and to perform cloud data management will be essential. Core responsibilities include maintaining bioinformatic databases by obtaining and restructuring data, including both UCSF proprietary data and public data, and writing tools to streamline discovery and replication analyses using these databases. An important task will be writing and maintaining analytic pipelines in languages such as R, Python, Go, Rust, shell, SQL, WDL, and/or other appropriate languages, and using tools such as Docker. Experience with databases, or the ability to learn them, is required. Under the supervision of the PI, the Data Scientist will also be involved in data analysis and should be comfortable with bioinformatic analyses, including variant calling and annotation. There will be opportunities to employ cutting-edge methods and to develop new methods. The ability to learn and implement new techniques depending on the problem at hand will be an essential skill, which requires a strong foundation in computer programming. This position also includes administrative duties and offers the opportunity to participate in, and to lead, authorship teams.

% of time	Essential Function (Yes/No)	Key Responsibilities
30	YES	Designs, develops, debugs, and utilizes computer programs necessary to extract, transform, and load data and prepare it for analysis. Assists in extracting, transforming, and loading data from clinical and research sources using a wide variety of analytic techniques. Develops data pipelines to standardize and automate repeatable data processing steps as appropriate. Builds and runs programs to extract relevant imaging, biosignals, and medical data from clinical systems, including UCSF data. Performs data quality control.
25	YES	Utilizes standard software tools to analyze, interpret, or create moderately complex biological or research data. Uses software such as plink2 to manage, merge, split, and analyze sequencing and genetic imputation data. Performs quality control at the sample, variant, and genotype levels for genetic sequencing and imputation data. Conducts analyses with linear, logistic, or survival models where appropriate.
15	YES	Assists with computational resource management. Assists with management of research databases and shared computational resources. Manages cloud virtual environments. Manages containerization with tools such as Docker.
15	YES	Assists with report preparation and/or analysis for internal constituents and for scientific publication and dissemination. Describes methods, results, and implications of the work. Conducts background bibliographic research and summarizes it, where appropriate, for documents to be published externally. Generates appropriate data visualizations. Assists with general manuscript preparation and submission.
15	YES	Maintains code and documentation and communicates proactively. Writes internal-facing documentation for all analyses, coding, tooling, and pipelines, clearly describing in text and graphics what is done and why it is done this way. Writes appropriate code comments explaining unintuitive decisions, algorithms, and functions to allow other lab members to reason clearly about the code. Uses change-management software, including git, for code management. Proactively communicates to the PI about barriers to progress and possible code or workflow improvements. Provides the PI and collaborators with recommendations and guidance for subsequent steps.
100%

Required Qualifications

Bachelor's degree in biological science, computational / programming, or related area and / or equivalent experience / training.
12 months or more of demonstrated work experience using medical and/or health-related data, or similar, including developing pipelines for extracting, transforming, and loading data, and data analysis.
Working knowledge of bioinformatics methods and data structures.
Working knowledge of biostatistics and basic statistical testing.
Working knowledge of systems programming and databases.
Working knowledge of application and data security concepts.
Ability to effectively manage time and see assigned parts of projects through to completion on deadline.
Basic consultation and communication skills.
Demonstrated fluency and competency in statistical programming with R or Python.
Experience with or a demonstrated ability to learn and implement data management and computational pipelines for management of large-scale data.
At least 6 months of experience in direct data management and analysis using medical and/or health-related data using the above tools.
Ability to lead and maintain data pipelines for real-time data acquisition from clinical systems.
Ability to multi-task and work well with limited supervision.
Working project management skills.
Interpersonal skills in order to work with both technical and non-technical personnel at various levels in the organization.
Ability to communicate technical information in a clear and concise manner.
Self-motivated, able to learn quickly, meet deadlines, and demonstrate problem-solving skills.

Preferred Qualifications

MS or greater in a related science or an equivalent combination of education and experience.
PhD in a field relevant to biomedical research (bioinformatics, biomedical engineering), or computer science (computer science, machine learning, artificial intelligence) or similar.
