Workerman Source Code Analysis: File Uploads

Preface

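In Nginx, HTTP data is parsed as it is received. If the parser finds a problem in the data, it stops parsing and stops receiving further data.

Workerman instead defers protocol parsing. It only parses the relevant part of the request when the program actually needs information carried by the HTTP request, then caches the result so that later lookups read from the cache instead of parsing again.

Each approach has its strengths: the former detects malformed data early, while the latter keeps the data-receiving logic comparatively simple.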

The logic for parsing uploaded files lives in the Request::parseUploadFiles method. Its call stack is:

// source: Protocols\Http\Request.php
post()/file()
    -> parsePost()
        -> parseUploadFiles()

Parsing POST data

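When post() or file() is called to read POST parameters or uploaded files, the request first checks whether the body has already been parsed. If it has, the cached result is returned directly; otherwise parsePost is called to parse the data.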
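The parse-once-then-cache behaviour can be reduced to a small sketch. The class LazyRequest, its constructor and the parseCount counter are hypothetical and exist only to make the idea observable; the parser is simplified to treat every body as application/x-www-form-urlencoded, whereas the real parsePost (below) handles several content types.

```php
<?php
// Hypothetical class illustrating lazy, memoized parsing of the request body.
class LazyRequest
{
    private $rawBody;
    private $data = array();  // parsed results, filled on first access
    public $parseCount = 0;   // how many times the body was actually parsed

    public function __construct($rawBody)
    {
        $this->rawBody = $rawBody;
    }

    public function post($name = null, $default = null)
    {
        // Parse only if nothing has been cached yet
        if (!isset($this->data['post'])) {
            $this->parsePost();
        }
        if ($name === null) {
            return $this->data['post'];
        }
        return isset($this->data['post'][$name]) ? $this->data['post'][$name] : $default;
    }

    private function parsePost()
    {
        $this->parseCount++;
        // Simplification: assume application/x-www-form-urlencoded
        \parse_str($this->rawBody, $this->data['post']);
    }
}

$request = new LazyRequest('a=1&b=2');
echo $request->post('a'), "\n";   // first access parses the body
echo $request->post('b'), "\n";   // second access is served from $this->data
echo $request->parseCount, "\n";
```

However many times post() is called, the body is parsed exactly once.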

protected function parsePost()
{
    // rawBody() returns the raw request body
    $body_buffer = $this->rawBody();
    // Initialize the arrays that hold the parsed results
    $this->_data['post'] = $this->_data['files'] = array();
    if ($body_buffer === '') {
        return;
    }
    // Try the cache first; only bodies of at most 1024 bytes are cacheable
    $cacheable = static::$_enableCache && !isset($body_buffer[1024]);
    if ($cacheable && isset(static::$_postCache[$body_buffer])) {
        $this->_data['post'] = static::$_postCache[$body_buffer];
        return;
    }
    // Read the content-type header; if the headers have not been parsed yet, they are parsed and cached here too
    $content_type = $this->header('content-type', '');
    // Try to extract a boundary; if one is found, the body is multipart/form-data and may contain files
    if (\preg_match('/boundary="?(\S+)"?/', $content_type, $match)) {
        // $match[1] is the boundary value declared in the header;
        // prefixing it with -- gives the delimiter used inside the body
        $http_post_boundary = '--' . $match[1];
        $this->parseUploadFiles($http_post_boundary);
        return;
    }
    // A content-type containing the word json (e.g. application/json)
    // means the body is JSON
    if (\preg_match('/\bjson\b/i', $content_type)) {
        $this->_data['post'] = (array) json_decode($body_buffer, true);
    } else {
        // Otherwise treat it as application/x-www-form-urlencoded
        \parse_str($body_buffer, $this->_data['post']);
    }
    // If caching is enabled, store the result; above 256 entries,
    // evict the entry at the array's internal pointer (normally the oldest)
    if ($cacheable) {
        static::$_postCache[$body_buffer] = $this->_data['post'];
        if (\count(static::$_postCache) > 256) {
            unset(static::$_postCache[key(static::$_postCache)]);
        }
    }
}
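The caching rules in parsePost can be isolated into a small sketch: only bodies of at most 1024 bytes are cached (the !isset($body_buffer[1024]) test), and once the cache holds more than the limit, the oldest entry is dropped. The function cache_post_result and its parameters are hypothetical. The sketch uses array_key_first() (PHP 7.3+) to make oldest-first eviction explicit, whereas the original relies on key(), which returns the key at the array's internal pointer. A limit of 2 is passed in the demo so that the eviction is visible.

```php
<?php
// Hypothetical helper mirroring parsePost's cache policy.
function cache_post_result(array &$cache, $body, array $parsed, $maxEntries = 256)
{
    // Bodies longer than 1024 bytes are never cached (same test as parsePost)
    if (isset($body[1024])) {
        return false;
    }
    $cache[$body] = $parsed;
    // Over the limit: drop the oldest entry (first key in insertion order)
    if (\count($cache) > $maxEntries) {
        unset($cache[\array_key_first($cache)]);
    }
    return true;
}

$cache = array();
foreach (array('a=1', 'b=2', 'c=3') as $body) {
    \parse_str($body, $parsed);
    cache_post_result($cache, $body, $parsed, 2); // limit of 2 for the demo
}
var_dump(array_keys($cache));                                         // 'a=1' was evicted
var_dump(cache_post_result($cache, str_repeat('x', 1025), array()));  // too long, not cached
```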

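Parsing the request body

Next comes the parseUploadFiles method itself.

It first preprocesses the request body: it removes the closing delimiter at the end, then splits the body on the boundary to obtain an array of data blocks.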

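// Get the request body
$http_body = $this->rawBody();
// Remove the closing delimiter at the end of the body
$http_body = \substr($http_body, 0, \strlen($http_body) - (\strlen($http_post_boundary) + 4));

// Split the body on boundary + \r\n to get the array of data blocks
$boundary_data_array = \explode($http_post_boundary . "\r\n", $http_body);
// The body starts with a delimiter, so the first element is an empty string; drop it
if ($boundary_data_array[0] === '') {
    unset($boundary_data_array[0]);
}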
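To see what the strip-and-split step produces, the sketch below builds a small multipart body by hand and runs the same boundary regex, substr and explode calls on it. The boundary XyZ, the field names and the file content are made up for the demo.

```php
<?php
// Made-up Content-Type header and body for illustration
$content_type = 'multipart/form-data; boundary=XyZ';
preg_match('/boundary="?(\S+)"?/', $content_type, $match);
$http_post_boundary = '--' . $match[1];   // "--XyZ"

$http_body = "--XyZ\r\n"
    . "Content-Disposition: form-data; name=\"title\"\r\n\r\n"
    . "hello\r\n"
    . "--XyZ\r\n"
    . "Content-Disposition: form-data; name=\"doc\"; filename=\"a.txt\"\r\n"
    . "Content-Type: text/plain\r\n\r\n"
    . "file body\r\n"
    . "--XyZ--\r\n";

// Drop the closing delimiter "--XyZ--\r\n" (boundary length + 4 bytes)
$http_body = substr($http_body, 0, strlen($http_body) - (strlen($http_post_boundary) + 4));

// Split on boundary + CRLF; element 0 is the empty string before the first delimiter
$boundary_data_array = explode($http_post_boundary . "\r\n", $http_body);
if ($boundary_data_array[0] === '') {
    unset($boundary_data_array[0]);
}
var_dump($boundary_data_array);
```

Each remaining element still ends with the \r\n that preceded the next delimiter; that is what the later substr($boundary_value, 0, -2) removes. Note also that with a quoted header such as boundary="XyZ", the greedy \S+ in this regex captures the closing quote as well.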

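Why add 4 when computing the end position for substr?

Because the closing delimiter is the boundary followed by -- and \r\n, which makes it exactly 4 bytes longer than the boundary.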
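A quick check of that arithmetic with a made-up boundary:

```php
<?php
$http_post_boundary = '--' . 'XyZ';          // delimiter as used in the body (made up)
$closing = $http_post_boundary . "--\r\n";   // closing delimiter
echo strlen($http_post_boundary), "\n";      // 5
echo strlen($closing), "\n";                 // 9, i.e. 5 + 4
```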

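Next, two nested foreach loops and a switch statement parse the contents of the request body.

Iterating over all data blocks

Each data block is split on \r\n\r\n into its header section and its value, and the trailing \r\n is removed from the value. $key counts the blocks and is later used as the index into $files.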

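$key = -1;        // index of the current block, used as the key into $files
$files = array(); // file entries collected while parsing
foreach ($boundary_data_array as $boundary_data_buffer) {
    // Split at the first blank line: headers before it, value after it
    list($boundary_header_buffer, $boundary_value) = \explode("\r\n\r\n", $boundary_data_buffer, 2);
    // Remove the trailing \r\n from $boundary_value
    $boundary_value = \substr($boundary_value, 0, -2);
    $key++;
    // The header parsing described in the next section runs here
}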
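Applied to a single block, the split looks like this. The block is a made-up file part whose content itself contains a blank line; explode with a limit of 2 splits only at the first \r\n\r\n, so the file content stays intact, and substr(..., 0, -2) drops the CRLF that preceded the next delimiter.

```php
<?php
// A made-up data block, as produced by the boundary split
$boundary_data_buffer = "Content-Disposition: form-data; name=\"doc\"; filename=\"a.txt\"\r\n"
    . "Content-Type: text/plain\r\n\r\n"
    . "line1\r\n\r\nline2\r\n";

// Limit 2: only the first blank line separates headers from the value
list($boundary_header_buffer, $boundary_value) = explode("\r\n\r\n", $boundary_data_buffer, 2);
// Remove the trailing CRLF that belongs to the delimiter line
$boundary_value = substr($boundary_value, 0, -2);

var_dump($boundary_header_buffer, $boundary_value);
```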

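Parsing the data block headers

A data block can have several header lines, so the header string is first split on \r\n to get an array of header lines.

The loop then splits each line on ": " to obtain the field name $header_key and the field value $header_value.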
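The header split can be tried in isolation. This sketch reuses a made-up file-part header buffer and collects the lowercased field names into a $headers array; that array is only for display, since the original code acts on each line inside the switch instead.

```php
<?php
// Made-up header buffer of a file part
$boundary_header_buffer = "Content-Disposition: form-data; name=\"doc\"; filename=\"a.txt\"\r\n"
    . "Content-Type: text/plain";

$headers = array();
foreach (explode("\r\n", $boundary_header_buffer) as $item) {
    // Split on ": " as the original does; without a limit argument, a value
    // that itself contains ": " would be cut off at that point
    list($header_key, $header_value) = explode(": ", $item);
    // Header names are case-insensitive, so normalize them to lowercase
    $headers[strtolower($header_key)] = $header_value;
}
var_dump($headers);
```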

foreach (\explode("\r\n", $boundary_header_buffer) as $item) {
    list($header_key, $header_value) = \explode(": ", $item);
    $header_key = \strtolower($header_key);

    switch ($header_key) {
        case "content-disposition":
            // A filename means this block is file data
            if (\preg_match('/name="(.*?)"; filename="(.*?)"/i', $header_value, $match)) {
                $error = 0;
                $tmp_file = '';
                // File size
                $size = \strlen($boundary_value);
                // Directory for temporary upload files
                $tmp_upload_dir = HTTP::uploadTmpDir();
                if (!$tmp_upload_dir) {
                    $error = UPLOAD_ERR_NO_TMP_DIR;
                } else {
                    // tempnam() creates a file with a unique name in the temporary directory
                    $tmp_file = \tempnam($tmp_upload_dir, 'workerman.upload.');
                    // Once the file exists, write the block's value into it; note that the
                    // loose == also treats a 0-byte write (an empty file) as a failure
                    if ($tmp_file === false || false == \file_put_contents($tmp_file, $boundary_value)) {
                        $error = UPLOAD_ERR_CANT_WRITE;
                    }
                }
                // Build the uploaded file's info
                $files[$key] = array(
                    'key' => $match[1], // field name in the form
                    'name' => $match[2], // original file name
                    'tmp_name' => $tmp_file, // full path of the temporary file
                    'size' => $size, // file size in bytes
                    'error' => $error // error code
                );
                break;
            } else {
                // No filename means an ordinary POST field; store it in the POST data
                if (\preg_match('/name="(.*?)"$/', $header_value, $match)) {
                    $this->_data['post'][$match[1]] = $boundary_value;
                }
            }
            break;
        case "content-type":
            // Record the file's type
            $files[$key]['type'] = \trim($header_value);
            break;
    }
}

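The switch inspects the value of $header_key and performs the corresponding action:

content-disposition: a regular expression decides whether the block is file data. For a file, the block's value is written to a temporary file; otherwise the field name and value are stored in the array that holds the POST data.
content-type: records the file's type.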
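The content-disposition branch can be exercised on its own with the sketch below. The helper classify_part is hypothetical; it applies the same two regular expressions to made-up header values and, for a file part, writes the value to a temporary file with tempnam(). Two assumptions differ from the original: sys_get_temp_dir() stands in for HTTP::uploadTmpDir(), and the write is checked with === false, so an empty upload is not reported as UPLOAD_ERR_CANT_WRITE.

```php
<?php
// Hypothetical helper: classify one part from its Content-Disposition value.
// The regexes are the ones used in parseUploadFiles.
function classify_part($header_value, $boundary_value)
{
    if (preg_match('/name="(.*?)"; filename="(.*?)"/i', $header_value, $match)) {
        $error = 0;
        // Assumption: sys_get_temp_dir() replaces HTTP::uploadTmpDir() here
        $tmp_file = tempnam(sys_get_temp_dir(), 'workerman.upload.');
        if ($tmp_file === false || file_put_contents($tmp_file, $boundary_value) === false) {
            $error = UPLOAD_ERR_CANT_WRITE;
        }
        return array('file', array(
            'key'      => $match[1],   // form field name
            'name'     => $match[2],   // original file name
            'tmp_name' => $tmp_file,   // path of the temporary file
            'size'     => strlen($boundary_value),
            'error'    => $error,
        ));
    }
    if (preg_match('/name="(.*?)"$/', $header_value, $match)) {
        return array('post', array($match[1] => $boundary_value));
    }
    return array('unknown', array());
}

list($kind, $file) = classify_part('form-data; name="doc"; filename="a.txt"', 'file body');
echo $kind, ' ', $file['name'], ' ', file_get_contents($file['tmp_name']), "\n";
unlink($file['tmp_name']);   // clean up the demo temp file

list($kind, $field) = classify_part('form-data; name="title"', 'hello');
echo $kind, ' ', $field['title'], "\n";
```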