Data Extraction Engineer

Company: Razer Inc.

Location: Chengdu

Commitment: Full time

Posted on: 2025-05-25 23:25

Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working with a global team spread across 5 continents. Razer is also a great place to work, providing you the unique, gamer-centric #LifeAtRazer experience that will put you on a path of accelerated growth, both personally and professionally.

Job Responsibilities:

- Design, develop, and deploy web scraping solutions to collect specific datasets for AI training purposes.
- Build robust and scalable web crawlers to extract structured and unstructured data from various online sources.
- Ensure data accuracy, integrity, and compliance with relevant laws and regulations.
- Clean, preprocess, and organize scraped data for use in machine learning models.
- Monitor and optimize crawling performance to ensure efficiency and reliability.
- Collaborate with AI teams to define data requirements and ensure the relevance and value of collected data.
- Document crawling workflows, tools, and results for future reference and improvement.

Requirements:

- Bachelor's or master's degree in Computer Science, Software Engineering, or a related field.
- Strong experience with web scraping tools and frameworks (e.g., Scrapy, Selenium, BeautifulSoup).
- Proficiency in programming languages such as Python, Java, or Node.js.
- Familiarity with HTTP protocols, HTML parsing, and JSON data formats.
- Knowledge of database systems (SQL, NoSQL) for data storage and management.
- Experience with cloud platforms (e.g., AWS, GCP) and containerization tools (e.g., Docker).
- Strong understanding of web crawling ethics, regulations, and best practices.
- Excellent analytical skills and attention to detail.

Preferred Qualifications:

- Experience with large-scale data scraping and handling distributed crawlers.
- Familiarity with AI and machine learning concepts, especially data preprocessing for AI models.
- Knowledge of browser automation and tools for rendering dynamic content.
- Ability to handle multilingual data and diverse data formats.

Pre-Requisites:

Are you game?
