
邹平市凌萱科技有限公司

Parsing DOC, DOCX, PDF, PPT, PPTX, XLS, XLSX, and TXT files in OSS object storage and indexing their text in ES for exact or fuzzy search
Published: 2024-11-07

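Users of an enterprise file management system often do not know the exact name of the file they need, so they cannot look it up by name. The system therefore has to take the keywords the user does know, run an exact or fuzzy search, and return the list of matching files together with the key passages in each file that contain those keywords, so that the user can identify the file they are looking for.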


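The implementation covers multi-threaded asynchronous file parsing (added after one user bulk-uploaded more than 20,000 files), inserting and deleting data in ES, and search.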

Step 1: Connect to OSS

 
 

Step 2: Get the InputStream
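
Steps 1 and 2 might look like the following Python sketch. The original code for these steps is not shown, so everything here is an assumption: it supposes the store is Alibaba Cloud OSS reached through the `oss2` package, and `connect_oss` and `read_object_bytes` are hypothetical helper names. The SDK import is deferred so that the stream-reading helper can be tried without the SDK installed.

```python
import io


def connect_oss(endpoint, bucket_name, access_key_id, access_key_secret):
    """Return a bucket handle (assumes the Alibaba Cloud `oss2` package)."""
    import oss2  # deferred: only needed when talking to a real bucket

    auth = oss2.Auth(access_key_id, access_key_secret)
    return oss2.Bucket(auth, endpoint, bucket_name)


def read_object_bytes(bucket, key, chunk_size=64 * 1024):
    """Read one stored object fully into memory via its stream.

    `bucket` is anything with a get_object(key) method that returns an
    object exposing read(n), as an oss2.Bucket does.
    """
    stream = bucket.get_object(key)
    buffer = io.BytesIO()
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            buffer.write(chunk)
    finally:
        close = getattr(stream, "close", None)
        if callable(close):
            close()
    return buffer.getvalue()
```

Reading in chunks keeps the helper usable with large objects; for very large files it may be preferable to hand the stream straight to the parser instead of buffering it all.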

 

Step 3: Parse the file
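
The parsing code itself is not shown, so the following is only a sketch of one way to dispatch on the file extension. It uses the Python standard library alone and therefore handles just two of the eight formats: TXT (trying UTF-8, then GB18030) and DOCX (a ZIP archive whose `word/document.xml` holds the text). DOC, PDF, PPT, PPTX, XLS, and XLSX would need third-party parsers that are not covered here. All function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def parse_txt(data):
    """Decode plain text, trying UTF-8 first and GB18030 second."""
    for encoding in ("utf-8", "gb18030"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    return data.decode("utf-8", errors="replace")


def parse_docx(data):
    """Collect paragraph text from the main document part of a DOCX file."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


# Register further parsers (PDF, XLSX, ...) here as they become available.
PARSERS = {"TXT": parse_txt, "DOCX": parse_docx}


def extract_text(file_name, data):
    """Pick a parser by file extension and return the extracted text."""
    ext = file_name.rsplit(".", 1)[-1].upper() if "." in file_name else ""
    parser = PARSERS.get(ext)
    if parser is None:
        raise ValueError(f"no parser registered for extension {ext!r}")
    return parser(data)
```

Keeping the parsers in a dictionary means a new format only needs one more entry, and unsupported files fail loudly instead of being indexed with empty content.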

 
 

Step 4: Parse asynchronously with multiple threads
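
As an illustration of this step, the sketch below fans a batch of object keys out over a thread pool and collects results and failures separately, so that one corrupt file does not abort a bulk upload. It is an assumption about the general shape, not the project's code: `parse_all` and its `parse_one` callback are hypothetical names, and the pool size of 8 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_all(keys, parse_one, max_workers=8):
    """Run parse_one(key) for every key on a thread pool.

    Returns (parsed, failed): parsed maps key -> result, failed maps
    key -> the exception raised while handling that key.
    """
    parsed, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(parse_one, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                parsed[key] = future.result()
            except Exception as exc:  # keep going; record the failure
                failed[key] = exc
    return parsed, failed
```

Threads suit this workload when most of the time is spent waiting on object storage and ES; the failed map can be logged or retried later.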

 
 
 

 

 

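Note: if the index's storage size keeps growing while its document count stays unchanged for a long time, consider explicitly setting "refresh_interval": "1s", or adding ?refresh or ?refresh=true when writing data. Each case needs its own analysis; this project uses ES version 7.5.0.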
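
The two options in the note can be sketched as plain HTTP requests. The code below only builds the requests with the standard library and does not send them; the base URL, the index name `files`, and both function names are hypothetical.

```python
import json
from urllib.request import Request

JSON_HEADERS = {"Content-Type": "application/json"}


def refresh_interval_request(base_url, index, interval="1s"):
    """Build PUT /<index>/_settings that sets index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return Request(f"{base_url}/{index}/_settings", data=body,
                   method="PUT", headers=JSON_HEADERS)


def index_document_request(base_url, index, doc_id, document, refresh=False):
    """Build PUT /<index>/_doc/<id>, optionally forcing an immediate refresh."""
    url = f"{base_url}/{index}/_doc/{doc_id}"
    if refresh:
        url += "?refresh=true"
    body = json.dumps(document, ensure_ascii=False).encode("utf-8")
    return Request(url, data=body, method="PUT", headers=JSON_HEADERS)
```

A request built this way could be sent with `urllib.request.urlopen(request)`. Forcing a refresh on every write is costly under bulk load, so the index-level setting is usually the gentler choice.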

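The sample below covers range queries, multi-condition compound queries, shallow pagination (from/size), and highlighting. For details on each feature, see the official documentation: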

Elasticsearch Guide [8.7] | Elastic
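
The sample request in this section repeats one clause shape across eight fields, so it lends itself to being generated. The sketch below is one hypothetical way to do that in Python: `build_search_body` and `METADATA_FIELDS` are illustrative names, while the field names, boosts, and highlight options are copied from the sample.

```python
METADATA_FIELDS = [
    "fileName", "orgCode", "orgName", "resourceId",
    "resourceName", "resourcePathId", "resourcePathName",
]


def build_search_body(keyword, content_keyword, file_ext,
                      uploaded_from, uploaded_to, page=1, page_size=10):
    """Build the search body: filters in must, scored alternatives in should."""
    should = []
    # Phrase matches outrank loose matches; metadata outranks file content.
    for query_type, meta_boost, content_boost in (
        ("match_phrase", 10, 5),
        ("match", 5, 1),
    ):
        for field in METADATA_FIELDS:
            should.append(
                {query_type: {field: {"query": keyword, "boost": meta_boost}}})
        should.append(
            {query_type: {"content": {"query": content_keyword,
                                      "boost": content_boost}}})
    return {
        "query": {"bool": {"must": [
            {"range": {"uploadTime": {"gte": uploaded_from,
                                      "lte": uploaded_to}}},
            {"term": {"fileExt.keyword": file_ext}},
            {"bool": {"should": should}},
        ]}},
        "size": page_size,
        "from": (page - 1) * page_size,  # shallow paging offset
        "highlight": {
            "fields": {field: {} for field in METADATA_FIELDS + ["content"]},
            # Documented as arrays; the sample passes single strings.
            "pre_tags": ["<font color='red'>"],
            "post_tags": ["</font>"],
            "number_of_fragments": 1,
            "fragment_size": 100,
            "no_match_size": 100,
        },
    }
```

Because the inner bool holds only should clauses, at least one of them has to match for a document to be returned. With from/size paging, from + size is capped by `index.max_result_window` (10,000 by default), which is why this approach is called shallow pagination.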

{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "uploadTime": {
              "gte": "2022-07-05 12:49:00",
              "lte": "2022-07-30 13:54:00"
            }
          }
        },
        {
          "term": {
            "fileExt.keyword": "DOCX"
          }
        },
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "fileName": {
                    "query": "操作",
                    "boost": 10
                  }
                }
              },
              {
                "match_phrase": {
                  "orgCode": {
                    "query": "操作",
                    "boost": 10
                  }
                }
              },
              {
                "match_phrase": {
                  "orgName": {
                    "query": "操作",
                    "boost": 10
                  }
                }
              },
              {
                "match_phrase": {
                  "resourceId": {
                    "query": "操作",
                    "boost": 10
                  }
                }
              },
              {
                "match_phrase": {
                  "resourceName": {
                    "query": "操作",
                    "boost": 10
                  }
                }
              },
              {
                "match_phrase": {
                  "resourcePathId": {
                    "query": "操作",
                    "boost": 10
                  }
                }
              },
              {
                "match_phrase": {
                  "resourcePathName": {
                    "query": "操作",
                    "boost": 10
                  }
                }
              },
              {
                "match_phrase": {
                  "content": {
                    "query": "解决",
                    "boost": 5
                  }
                }
              },
              {
                "match": {
                  "fileName": {
                    "query": "操作",
                    "boost": 5
                  }
                }
              },
              {
                "match": {
                  "orgCode": {
                    "query": "操作",
                    "boost": 5
                  }
                }
              },
              {
                "match": {
                  "orgName": {
                    "query": "操作",
                    "boost": 5
                  }
                }
              },
              {
                "match": {
                  "resourceId": {
                    "query": "操作",
                    "boost": 5
                  }
                }
              },
              {
                "match": {
                  "resourceName": {
                    "query": "操作",
                    "boost": 5
                  }
                }
              },
              {
                "match": {
                  "resourcePathId": {
                    "query": "操作",
                    "boost": 5
                  }
                }
              },
              {
                "match": {
                  "resourcePathName": {
                    "query": "操作",
                    "boost": 5
                  }
                }
              },
              {
                "match": {
                  "content": {
                    "query": "解决",
                    "boost": 1
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "size": 10,
  "from": 0,
  "highlight": {
    "fields": {
      "fileName": {},
      "orgCode": {},
      "orgName": {},
      "resourceId": {},
      "resourceName": {},
      "resourcePathId": {},
      "resourcePathName": {},
      "content": {}
    },
    "pre_tags": "<font color='red'>",
    "post_tags": "</font>",
    "number_of_fragments": 1,
    "fragment_size": 100,
    "no_match_size": 100
  }