Job Category:
Software Development
Health Care & Medicine
Big Data System Infrastructure and HPC Engineer Lead
(This job is no longer available)
UCLA Health | Los Angeles, CA

Job Description

Job Duties
The Big Data and HPC Engineer Lead will report to the DGIT Director of Research Computing and work with the Research Computing team to develop and improve our high-performance and Big Data-intensive (e.g., Hadoop/Spark) compute infrastructure and parallel system environment. The DGIT Research Computing team is currently developing cloud-based infrastructure for data ingestion pipelines and analytics platforms to facilitate research activities and precision health initiatives. This position will directly contribute to the evolution of our infrastructure design toward a next-generation system that leverages cloud and Big Data technologies. This position is also responsible for technical systems management, administration, and support for the cloud-based and on-premises high-performance computing (HPC) cluster environments. This includes all configuration, authentication, networking, storage, interconnect, and software usage and installation for the HPC cluster(s). The position is highly technical and directly impacts the daily operational functions of these environments. It is responsible for installing, configuring, patching, and upgrading software, and for tuning, optimizing, proactively monitoring, and securing services. This position will work directly with all levels of academic constituents, including faculty, researchers, and students, supporting and promoting the use of the HPC cluster(s).

Job Qualifications
Bachelor's degree in computer science, computer engineering, or a related field, or an equivalent combination of education and related experience. AWS Certified Solutions Architect, Developer, and SysOps Administrator certifications required. Experience with AWS/cloud computing design, provisioning, and tuning. Significant experience with Linux/Unix systems, including installation, configuration, networking, backups, updates and patching, and system security. Experience with Linux cluster resource allocation, job scheduling, InfiniBand networks, MPI communications, and cluster monitoring. Experience installing, testing, configuring, and administering HPC clusters/servers and software. Experience deploying clusters in an AWS environment using AMIs and CloudFormation templates. Significant experience with, and strong knowledge of, Big Data technologies such as Hadoop, Spark, NiFi, Storm, HDFS, NFS, Lustre, Presto, Hive, AWS Redshift, and AWS Athena. Experience with big data warehouse and ETL design and implementation, including technologies such as Presto, Hive, AWS Athena, AWS EMR, and AWS Redshift. Advanced knowledge of scripting and programming languages such as C/C++, Java, Perl, Python, Ruby, and bash/csh/ksh. Substantial experience in one or more of the following advanced areas: local, parallel, and distributed file systems; NAS platforms; container orchestration frameworks; SQL/NoSQL database systems; and IaaS technologies. Significant experience with software container technologies such as Docker, CoreOS, and/or Singularity. Extensive knowledge of Red Hat, CentOS, Ubuntu Linux, and Windows. Significant experience supporting multiple independent but interrelated systems and software packages, with a demonstrated advanced ability to provide innovative solutions to broadly defined tasks and problems and to interact with system developers and vendors. Excellent customer service skills, working directly with customers to troubleshoot and resolve technical issues and requests.
Advanced verbal and written communication skills necessary to collaborate effectively in a team environment, present and explain technical information, and provide advice to management. Expert analytical, problem-solving, and decision-making skills to develop creative solutions to complex problems. Expert communication, facilitation, and collaboration skills necessary to present to, explain to, and advise senior management and/or external sponsors. Ability to learn and adopt new technology. Self-directed individual with a strong desire to learn and contribute in a team of technical peers. Ability to apply troubleshooting techniques to resolve complex, cross-functional issues. Experience in an academic or research community environment. Experience with any or all of the following technologies/products: Slurm, PBS Pro, Moab, Ganglia, Lustre, GPFS, InfiniBand, MPICH, OpenMPI.

About UCLA Health

For more than half a century, UCLA Health has provided the best in healthcare and the latest in medical technology to the people of Los Angeles and throughout the world. Today we are one of the most comprehensive and advanced healthcare systems in existence. We are comprised of Ronald Reagan UCLA Medical Center, UCLA Medical Center, Santa Monica, Resnick Neuropsychiatric Hospital at UCLA, Mattel Children’s Hospital UCLA, and the UCLA Medical Group, with offices throughout the region.

Ronald Reagan UCLA Medical Center is consistently ranked by U.S. News & World Report as one of the top five hospitals in the nation and the best medical center in the western United States. The doctors, scientists, and caregivers of UCLA are leaders in their fields. Every day, they perform pioneering work across an astounding range of disciplines, from organ transplantation and cardiac surgery to neurosurgery and cancer treatment. Year after year, our people have achieved medical breakthroughs and earned the highest industry honors. And we’re just getting started.

The dedicated professionals of UCLA Health are committed to healing humankind, one patient at a time, by improving health, alleviating suffering, and delivering acts of kindness on a daily basis.