Dynamic Web Crawler & Data Collection Platform
Built a reusable crawler for dynamic pages with Selenium + BeautifulSoup, supporting infinite scroll and complex DOM structures. Designed a standardized ETL pipeline that cleans and parses the scraped data and loads it into MySQL/PostgreSQL. Created a configurable extraction framework supporting XPath, CSS selector, and regex rules. Implemented error handling and retry mechanisms to keep long-running jobs stable. Developed a Streamlit UI for task configuration, batch runs, monitoring, and data export.
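A minimal sketch of the infinite-scroll crawl step, assuming a local Chrome/chromedriver setup with Selenium 4; the scroll limits and pauses are illustrative defaults, not the project's actual configuration.

```python
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def fetch_dynamic_page(url: str, max_scrolls: int = 10, pause: float = 1.5) -> BeautifulSoup:
    """Render a dynamic page, scrolling until no new content loads, then parse it."""
    options = Options()
    options.add_argument("--headless=new")  # run without a visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        last_height = driver.execute_script("return document.body.scrollHeight")
        for _ in range(max_scrolls):
            # Scroll to the bottom and wait for lazily loaded content to render.
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(pause)
            new_height = driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:  # no new content appeared; stop scrolling
                break
            last_height = new_height
        # Hand the fully rendered DOM to BeautifulSoup for parsing.
        return BeautifulSoup(driver.page_source, "html.parser")
    finally:
        driver.quit()
```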
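One way the configurable XPath/CSS/regex extraction framework could look: each field is described by a rule type and an expression, so new targets only need new rules rather than new code. The field names and selectors below are hypothetical examples; XPath is handled via lxml since BeautifulSoup itself does not evaluate XPath.

```python
import re

from bs4 import BeautifulSoup
from lxml import etree


def extract_fields(html: str, rules: dict[str, dict]) -> dict[str, list[str]]:
    """Apply a rule set of the form {field: {"type": ..., "expr": ...}} to raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    tree = etree.HTML(html)  # lxml tree for XPath rules
    results: dict[str, list[str]] = {}
    for field, rule in rules.items():
        kind, expr = rule["type"], rule["expr"]
        if kind == "css":
            results[field] = [el.get_text(strip=True) for el in soup.select(expr)]
        elif kind == "xpath":
            results[field] = [str(value).strip() for value in tree.xpath(expr)]
        elif kind == "regex":
            results[field] = re.findall(expr, html)
        else:
            raise ValueError(f"unknown rule type: {kind}")
    return results


# Example rule set (illustrative selectors, not taken from the project):
rules = {
    "title": {"type": "css", "expr": "h1.article-title"},
    "author": {"type": "xpath", "expr": "//span[@class='author']/text()"},
    "price": {"type": "regex", "expr": r"\$\d+\.\d{2}"},
}
```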
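The retry mechanism could be as simple as a decorator with exponential backoff around each fetch/extract call; the attempt count, backoff factor, and broad `Exception` catch below are assumptions for illustration.

```python
import functools
import logging
import time


def with_retries(max_attempts: int = 3, backoff: float = 2.0):
    """Retry the wrapped function with exponential backoff before giving up."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    if attempt == max_attempts:
                        raise  # exhausted all attempts; surface the error
                    wait = backoff ** attempt
                    logging.warning("attempt %d failed (%s); retrying in %.1fs", attempt, exc, wait)
                    time.sleep(wait)
        return wrapper
    return decorator
```

Usage would simply be decorating the page fetcher, e.g. `@with_retries(max_attempts=3)` above `fetch_dynamic_page`, so transient network or rendering failures do not abort a batch run.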
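For the load step of the ETL pipeline, a pandas + SQLAlchemy sketch is shown below. The connection string, table name, cleaning rules, and placeholder records are all assumptions; a MySQL target would use a `mysql+pymysql://...` URL instead.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder rows standing in for the extracted records.
records = [
    {"title": "Example article", "author": "Jane Doe", "price": "$9.99"},
]

# Placeholder PostgreSQL connection string.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/crawler_db")

df = pd.DataFrame(records)
df = df.drop_duplicates().dropna(subset=["title"])        # example cleaning rules
df["price"] = df["price"].str.lstrip("$").astype(float)   # example normalization
df.to_sql("articles", engine, if_exists="append", index=False)
```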
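A rough sketch of the Streamlit task-configuration page, run with `streamlit run app.py`. The widget labels, rule options, and the placeholder DataFrame standing in for crawl results are illustrative; in the real app the button would trigger the crawl, extraction, and ETL steps above.

```python
import pandas as pd
import streamlit as st

st.title("Crawler Task Configuration")
url = st.text_input("Start URL", "https://example.com/list")
max_scrolls = st.slider("Max scrolls", 1, 50, 10)
rule_type = st.selectbox("Extraction rule type", ["css", "xpath", "regex"])
expression = st.text_input("Extraction expression", "h1.article-title")

if st.button("Run task"):
    # Placeholder result; the real app would run the configured crawl task here.
    results = pd.DataFrame([{"url": url, "rule": f"{rule_type}:{expression}"}])
    st.dataframe(results)
    st.download_button("Export CSV", results.to_csv(index=False), "results.csv")
```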