Installing puppeteer
Installing puppeteer on a server comes with quite a few pitfalls. For details, see:
https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#chrome-headless-doesnt-launch-on-unix
Here is a brief walkthrough of installing puppeteer on a server:
npm i puppeteer
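You might think that is all it takes. In practice, Chrome will crash on launch most of the time. Why? Run ldd against the chrome binary in the install directory to check its dependencies: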
$ cd ./node_modules/puppeteer/.local-chromium/linux-782078/chrome-linux/
$ ldd chrome | grep not
libatk-1.0.so.0 => not found
libatk-bridge-2.0.so.0 => not found
libXcomposite.so.1 => not found
libXcursor.so.1 => not found
libXdamage.so.1 => not found
libXfixes.so.3 => not found
libXi.so.6 => not found
libXtst.so.6 => not found
libcups.so.2 => not found
libgbm.so.1 => not found
libpangocairo-1.0.so.0 => not found
libpango-1.0.so.0 => not found
libcairo.so.2 => not found
libatspi.so.0 => not found
libXss.so.1 => not found
libgtk-3.so.0 => not found
libgdk-3.so.0 => not found
libgdk_pixbuf-2.0.so.0 => not found
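A whole pile of shared libraries is missing, because most server environments have no need for these browser-related dynamic libraries.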
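If you want to collect the missing library names programmatically (for example in a deployment check), the ldd output can be filtered in a few lines. This is a minimal sketch: findMissingLibs is a hypothetical helper name, and the sample text is illustrative, not real output from your machine.

```javascript
// Return the names of shared libraries that ldd reports as "not found".
// Assumes the usual ldd line shape: "<name> => not found".
function findMissingLibs(lddOutput) {
  return lddOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.endsWith('=> not found'))
    .map((line) => line.split('=>')[0].trim());
}

// Illustrative sample: two missing libraries and one resolved library.
const sample = [
  'libatk-1.0.so.0 => not found',
  'libc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)',
  'libgbm.so.1 => not found',
].join('\n');

console.log(findMissingLibs(sample)); // [ 'libatk-1.0.so.0', 'libgbm.so.1' ]
```

In real use you would pass in the text captured from running ldd on the chrome binary instead of the sample string.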
The dependency packages on CentOS are:
alsa-lib.x86_64
atk.x86_64
cups-libs.x86_64
gtk3.x86_64
ipa-gothic-fonts
libXcomposite.x86_64
libXcursor.x86_64
libXdamage.x86_64
libXext.x86_64
libXi.x86_64
libXrandr.x86_64
libXScrnSaver.x86_64
libXtst.x86_64
pango.x86_64
xorg-x11-fonts-100dpi
xorg-x11-fonts-75dpi
xorg-x11-fonts-cyrillic
xorg-x11-fonts-misc
xorg-x11-fonts-Type1
xorg-x11-utils
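Rather than typing each package by hand, the list can be folded into a single install command. The sketch below assumes yum is the package manager on your CentOS release; buildYumInstall is a hypothetical helper, and the array is abbreviated to the first few entries of the list above.

```javascript
// Package names copied from the list above (abbreviated here; add the rest as needed).
const packages = ['alsa-lib.x86_64', 'atk.x86_64', 'cups-libs.x86_64', 'gtk3.x86_64'];

// Build one command line. Assumption: yum is available and you run it as root.
function buildYumInstall(pkgs) {
  return `yum install -y ${pkgs.join(' ')}`;
}

console.log(buildYumInstall(packages));
```

The printed line can be pasted into a shell or a provisioning script.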
Install each of them and you are done. Some server kernels do not support sandbox mode; in that case you can turn the sandbox off:
const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
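If the same script runs on machines with and without sandbox support, one option is to try a normal launch first and only drop the sandbox when that fails. This is a rough sketch, not puppeteer's own API: launchWithSandboxFallback is a hypothetical helper, and it takes the launch function as a parameter so it does not depend on how puppeteer is imported. Note that a launch can fail for other reasons (missing libraries, for instance), in which case the retry will fail as well.

```javascript
// Flags from the example above that disable Chrome's sandbox.
const NO_SANDBOX_ARGS = ['--no-sandbox', '--disable-setuid-sandbox'];

// `launch` is any function shaped like puppeteer.launch(options).
async function launchWithSandboxFallback(launch, options = {}) {
  try {
    // First attempt: keep the sandbox enabled.
    return await launch(options);
  } catch (err) {
    console.warn('Sandboxed launch failed, retrying without sandbox:', err.message);
    // Second attempt: append the no-sandbox flags to any existing args.
    const args = [...(options.args || []), ...NO_SANDBOX_ARGS];
    return launch({ ...options, args });
  }
}
```

With puppeteer it would be called as `await launchWithSandboxFallback((opts) => puppeteer.launch(opts))`.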
puppeteer usage example
// Import the package
const puppeteer = require('puppeteer');

(async () => {
  // The server kernel does not support the sandbox, so --no-sandbox is required
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  try {
    const page = await browser.newPage();
    const time1 = Date.now();
    await page.setJavaScriptEnabled(true);
    // Only the rendered DOM tree matters here, so css, font and image requests are blocked
    await page.setRequestInterception(true);
    page.on('request', (req) => {
      const type = req.resourceType();
      if (type === 'stylesheet' || type === 'font' || type === 'image') {
        req.abort();
      } else {
        req.continue();
      }
    });
    // waitUntil accepts four values: 'load', 'domcontentloaded', 'networkidle2', 'networkidle0'
    // Each one defines the point at which navigation is considered complete
    // load - when the page's load event fires
    // domcontentloaded - when the page's DOMContentLoaded event fires
    // networkidle2 - when there have been no more than 2 network connections for at least 500 ms
    // networkidle0 - when there have been no network connections for at least 500 ms
    await page.goto('https://developer.orbbec.com.cn/', { waitUntil: ['load', 'domcontentloaded', 'networkidle2'] });
    console.log(await page.content());
    const time2 = Date.now();
    // Elapsed time in seconds
    console.log((time2 - time1) / 1000);
    console.log('finish');
  } finally {
    // Close the browser even if an earlier step throws
    await browser.close();
  }
})();
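The request filter in the example can also be pulled out into a small named function, which makes the blocked list easier to change and lets you check it without starting a browser. This is only a sketch: BLOCKED_TYPES, shouldBlock and handleRequest are hypothetical names, and the handler assumes the request object exposes resourceType(), abort() and continue() as used in the example.

```javascript
// Resource types skipped when only the rendered DOM matters.
const BLOCKED_TYPES = new Set(['stylesheet', 'font', 'image']);

// True when a request of this type should be aborted.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Same behaviour as the inline handler: abort blocked types, let the rest through.
function handleRequest(req) {
  return shouldBlock(req.resourceType()) ? req.abort() : req.continue();
}
```

It would be registered with `page.on('request', handleRequest)` after request interception has been enabled.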