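Research on Ciphertext Retrieval in Cloud Environments Based on Keyword Extraction (基于关键词抽取的云环境密文检索研究)

Project source

National Natural Science Foundation of China (NSFC)

Principal investigator

Yang Zhen (杨震)

Funded institution

Beijing University of Technology

Approval year

2016

Approval date

Not disclosed

Project number

61671030

Research period

Unknown / Unknown

Project level

National

Funding amount

CNY 580,000

Discipline

Information Sciences - Electronics and Information Systems - Information Systems and System Security

Discipline code

F-F01-F0102

Funding category

General Program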

Keywords

Cloud Information Retrieval ; Searchable Encryption Scheme ; Keyword Extraction ; Query Expansion ; Retrieval Risk Analysis

Participants

才智;庄俊玺;王坚;曹怀虎;姚应哲;李超阳;陈伟桐;李怡德

Participating institution

Central University of Finance and Economics

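Project proposal abstract: Cloud computing has profoundly changed how modern information services compute, store, and manage data, and more and more information is now stored on remote cloud servers. Because users and cloud service providers lack mutual trust, user data must be encrypted before it is stored in the cloud. Since documents are held in the cloud in encrypted form, traditional retrieval models fail because they cannot interpret the documents, and cloud information retrieval faces a major challenge. To address this problem, the project first studies risk modeling for cloud information retrieval based on the Bayesian risk model: by treating cloud retrieval as a special information retrieval problem, it formulates minimum-risk cloud retrieval within the Bayesian risk framework. Second, building on this, it studies keyword analysis, extraction, and index construction mechanisms for cloud documents suited to cloud computing scenarios, achieving extremely high-precision keyword extraction by combining the spatial distribution of words with their statistical properties. Third, it studies the design of searchable encryption protocols that support retrieval of extremely short texts while guaranteeing user privacy and information security, thereby improving cloud retrieval performance. Finally, the project will build a prototype financial cloud information retrieval system for validation and establish an information retrieval corpus that can provide samples for research of this kind.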
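
The abstract says keyword extraction combines a word's spatial distribution with its statistical properties, but it does not give the measure itself. As a rough illustration only, the sketch below scores a word by how unevenly its occurrences are spaced in a token sequence: the standard deviation of the gaps between successive occurrences divided by their mean. The function name `spacing_score`, the three-occurrence cutoff, and the toy text are all assumptions made for this example, not the project's method.

```python
from statistics import mean, pstdev


def spacing_score(tokens, word):
    """Normalized spread of the gaps between successive occurrences of `word`.

    Words that appear in bursts (often topical keywords) tend to score higher
    than words spread evenly through the text. Returns 0.0 when the word
    occurs fewer than three times, since the spread is then not meaningful.
    """
    positions = [i for i, t in enumerate(tokens) if t == word]
    if len(positions) < 3:  # assumed cutoff for this sketch
        return 0.0
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return pstdev(gaps) / mean(gaps)


# Hypothetical toy text: "risk" is clustered, "the" is spread almost evenly.
tokens = "the risk risk risk model the a b c the d e f the g h i the risk".split()
clustered = spacing_score(tokens, "risk")
even = spacing_score(tokens, "the")
```

On this toy input the clustered word receives the higher score. A real system would combine such a score with frequency statistics before choosing index keywords.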

Application Abstract: With the rapid growth of internet usage and decentralized computing, the storage and management characteristics of modern information services have started a new trend, with more and more sensitive information being transferred to the cloud. Unfortunately, because of the mutual distrust between the data owner and the cloud service provider, data usually have to be encrypted prior to outsourcing to preserve privacy and prevent unsolicited access, which makes it enormously challenging to use the data effectively for document retrieval. Because encrypted documents in the cloud are incomprehensible to the server, cloud retrieval model definition, keyword index building, and searchable encryption scheme design all become difficult. To address these challenges, in this work, after a review of the current research literature, we first build a cloud information retrieval framework and formalize its retrieval risk. Second, since existing searchable encryption schemes suffer from inappropriate keyword selection, we propose a new keyword detection measure based on the spatial distribution of a particular word. Third, we modify the current searchable encryption scheme to support state-of-the-art information retrieval methods, such as the vector space model, probabilistic modeling, and language modeling, whereas current solutions support only simple equality queries on encrypted data, which yield results only slightly better than random selection. In addition, a financial cloud information retrieval system and the corresponding corpus will be built on the above theoretical research and deployed for practical use. This project, which has promising academic and practical value, will help advance modern information retrieval technology.
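
The "simple equality queries on encrypted data" that the abstract attributes to current solutions can be pictured with a toy keyword index. The sketch below is not the scheme proposed by the project. It assumes the data owner derives a deterministic token for each keyword with HMAC-SHA256 and uploads only tokens and document IDs, with document bodies encrypted separately. All names, the key, and the sample data are hypothetical. Deterministic tokens also reveal which documents share a keyword and when a query repeats, so this is an illustration rather than a secure design.

```python
import hashlib
import hmac
from collections import defaultdict


def token(key: bytes, keyword: str) -> str:
    """Deterministic search token: HMAC-SHA256 of the keyword under the owner's key."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key: bytes, docs: dict) -> dict:
    """Client side: map each keyword token to the IDs of documents that contain it."""
    index = defaultdict(set)
    for doc_id, keywords in docs.items():
        for kw in keywords:
            index[token(key, kw)].add(doc_id)
    return dict(index)


def search(index: dict, query_token: str) -> set:
    """Server side: exact-match lookup. The server only ever sees opaque tokens."""
    return index.get(query_token, set())


# Hypothetical key and keyword lists, for illustration only.
key = b"hypothetical-owner-secret"
docs = {"d1": ["bond", "yield"], "d2": ["equity", "yield"]}
index = build_index(key, docs)
hits = search(index, token(key, "yield"))
```

Because matching is exact, a query for "yields" or a token made with a different key finds nothing. That limitation is what motivates supporting ranked retrieval models over encrypted data.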

Funded province

Beijing

Project final report (full text)

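Cloud computing has profoundly changed how modern information services compute, store, and manage data, and more and more information is now stored on remote cloud servers. Because users and cloud service providers lack mutual trust, user data must be encrypted before it is stored in the cloud. Since documents are held in the cloud in encrypted form, traditional retrieval models fail because they cannot interpret the documents, and cloud information retrieval faces a major challenge. After four years of work, the project team followed the application plan closely, met the project's research objectives, and obtained the following results:

1. Risk modeling for cloud information retrieval based on the Bayesian risk model. By treating cloud retrieval as a special information retrieval problem, minimum-risk modeling of cloud retrieval was achieved within the Bayesian risk framework.

2. Building on this, keyword analysis, extraction, and index construction mechanisms for cloud documents suited to cloud computing scenarios. Extremely high-precision keyword extraction was achieved by combining the spatial distribution of words with their statistical properties.

3. Design of searchable encryption protocols that support retrieval of extremely short texts. The protocols guarantee user privacy and information security while improving cloud retrieval performance.

In addition, standardization work was carried out for a typical cloud computing scenario, the industrial internet. The principal investigator, as lead editor, proposed the international standard "Information technology - Security techniques - Security reference model for industrial internet platform". It was approved as a Study Period (SP) project at the 2018 meeting of the international cybersecurity standardization committee ISO/IEC JTC 1/SC 27 and became a New Work Item Proposal (NP 24392) at the 2019 meeting in France, making it the first international standard initiated by China in the industrial internet field.

The team developed several information content retrieval systems, including a temporal text summarization system, a microblog recommendation system, and an emergency incident analysis system, and achieved good results at the Text REtrieval Conference (TREC), including first place on a single performance metric in Round A of the Incident Streams Track at TREC 2019.

To date the project has published 11 papers in IEEE Transactions on Vehicular Technology, IEEE Transactions on Neural Networks and Learning Systems, Acta Electronica Sinica, and other venues, of which 10 are indexed by SCI and 10 by EI, with more than 120 citations by other authors. The team served as lead editor of 1 international standard (draft) and 1 national standard, filed 17 national invention patent applications (4 granted), registered 5 software copyrights, and hosted the IEEE ICIVC'20 international conference. Part of the research won the First Prize of the 2017 Wu Wenjun Artificial Intelligence Science and Technology Award. The project has trained 1 professor/doctoral supervisor, 1 associate professor, and 1 postdoctoral researcher; 1 member was selected as a Great Wall Scholar; and 19 graduate students were trained (4 doctoral and 15 master's students).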

  • 1. On the Comparisons of Decorrelation Approaches for Non-Gaussian Neutral Vector Variables

    • Keywords:
    • Gaussian distribution;Gaussian noise (electronic);Independent component analysis;Linear transformations;Principal component analysis;De correlations;Decorrelations;Independent components analysis;Neutral vector variable;Neutrality;Non-Gaussian;Non-Gaussian vectors;Non-linear transformations;Principal-component analysis;Property
    • Ma, Zhanyu;Lu, Xiaoou;Xie, Jiyang;Yang, Zhen;Xue, Jing-Hao;Tan, Zheng-Hua;Xiao, Bo;Guo, Jun
    • IEEE Transactions on Neural Networks and Learning Systems
    • 2023
    • Vol. 34
    • Issue 4
    • Journal

    As a typical non-Gaussian vector variable, a neutral vector variable contains nonnegative elements only, and its l1-norm equals one. In addition, its neutral properties make it significantly different from the commonly studied vector variables (e.g., the Gaussian vector variables). Due to the aforementioned properties, the conventionally applied linear transformation approaches [e.g., principal component analysis (PCA) and independent component analysis (ICA)] are not suitable for neutral vector variables, as PCA cannot transform a neutral vector variable, which is highly negatively correlated, into a set of mutually independent scalar variables and ICA cannot preserve the bounded property after transformation. In recent work, we proposed an efficient nonlinear transformation approach, i.e., the parallel nonlinear transformation (PNT), for decorrelating neutral vector variables. In this article, we extensively compare PNT with PCA and ICA through both theoretical analysis and experimental evaluations. The results of our investigations demonstrate the superiority of PNT for decorrelating the neutral vector variables. © 2012 IEEE.

    ...
  • 2. A novel POI recommendation model based on joint spatiotemporal effects and four-way interaction

    • Keywords:
    • Information services;Behavioral research;User profile;Heterogeneous information;Joint spatio-temporal;Location-based social networks;Neural interactions;Personalized recommendation;Real-world datasets;Recommendation performance;Spatiotemporal effects
    • Liu, Yongheng;Yang, Zhen;Li, Tong;Wu, Di
    • Applied Intelligence
    • 2022
    • Vol. 52
    • Issue 5
    • Journal

    Point of interest (POI) recommendation is a fundamental task in location-based social networks (LBSN). The increasing proliferation of LBSNs brings about considerable amounts of user-generated check-in data. Such data can significantly contribute to understanding user behaviors, based on which personalized recommendations can be efficiently derived. Spatial and temporal effects are crucial factors in the user's decision-making for choosing a POI to visit. Most existing methods treat them as two independent features and cannot accurately capture users' interests. We argue that spatial and temporal effects should be analyzed simultaneously in POI recommendations. To this end, we propose a SpatioTemporal heterogeneous information network (HIN)-based POI REcommendation model (STORE) to model various heterogeneous context features, e.g., the joint spatiotemporal effects, types of POI, and social relations. Specifically, we define the spatiotemporal effects entity (St) in the HIN to model the joint spatiotemporal effects. Instead of modeling the traditional two-way interaction <user, item>, we further design a four-way neural interaction model <User, Meta-path, St, POI>. In this way, our model can effectively mine and extract useful information from the meta-path-based context and spatiotemporal effects, thereby improving recommendation performance. We conduct extensive experiments on two real-world datasets, and the results demonstrate that the STORE model outperforms the best baseline by about 12% in NDCG@5 and 11% in Rec@5.
    © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

    ...
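  • 3. Automatic poster synthesis system based on keywords (基于关键字的海报自动合成系统)

    • Keywords:
    • Image retrieval
    • Guan, Shuaipeng;Yu, Haiyang;Yang, Zhen;Zhou, Ming;Lai, Yingxu
    • Journal of Beijing University of Aeronautics and Astronautics (北京航空航天大学学报)
    • 2022
    • Issue 2
    • Journal

    The spread of intelligent technology places new demands on image editing. Posters, as a way of conveying information in image form, play an important role in daily life and work management. Producing a poster requires compositing multiple image elements, yet an interactive, one-click image synthesis system is currently lacking. Therefore, combining currently popular image processing techniques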

    ...
  • 4. User electricity consumption behavior mode analysis based on energy decomposition

    • Keywords:
    • Smart power grids;Decomposition methods;Electric power transmission networks;K-means clustering;Matrix algebra;Electric power utilization;Behavior-based;Data technologies;Decomposition methods;Electrical appliances;Electricity-consumption;Energy decomposition;Homogeneous relationship;Mode analysis;Nonnegative matrix decompositions;Sparse coding
    • Lu, Ruirui;Yu, Haiyang;Yang, Zhen;Lai, Yingxu;Yang, Shisong;Zhou, Ming
    • Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics
    • 2022
    • Vol. 48
    • Issue 2
    • Journal

    With the spread of smart grids and the development of big data technology, analyzing users' electricity consumption behavior from consumption data has attracted growing attention. Existing energy decomposition methods cannot meet the resolution and accuracy requirements of practical applications, and cluster analysis methods are too coarse to capture the consumption characteristics of each type of appliance. This paper therefore proposes a method for analyzing users' electricity consumption behavior based on energy decomposition. First, building on the discriminative sparse coding model, and to address the difficulty of solving the L0 regularization term and the weak sparsity induced by the L1 regularization term, we use an L1/2 regularization term as the sparse constraint for energy decomposition and add the homogeneity between users as a further regularization term to improve the base model. Second, based on the energy decomposition results, we use the consumption characteristics of a user's individual appliance types, rather than the total consumption characteristics, to refine the behavior analysis, and we improve the traditional K-means clustering algorithm for experimental verification. The experimental results show that energy decomposition with the L1/2 sparse constraint and the homogeneity constraint is more accurate than the traditional discriminative sparse coding method, and that cluster analysis of users' electricity consumption behavior based on the decomposition results also improves significantly.
    © 2022, Editorial Board of JBUAA. All rights reserved.

    ...
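  • 5. User electricity consumption behavior mode analysis based on energy decomposition (基于能源分解的用户用电行为模式分析)

    • Keywords:
    • Smart grid;Energy decomposition;Cluster analysis;Non-negative matrix factorization;Homogeneity relationship
    • Lu, Ruirui;Yu, Haiyang;Yang, Zhen;Lai, Yingxu;Yang, Shisong;Zhou, Ming
    • Journal of Beijing University of Aeronautics and Astronautics (北京航空航天大学学报)
    • 2022
    • Issue 02
    • Journal

    With the spread of smart grids and the development of big data technology, analyzing users' electricity consumption behavior from consumption data has attracted growing attention. Existing energy decomposition methods cannot meet the resolution and accuracy requirements of practical applications, and cluster analysis methods are too coarse to fully capture the consumption characteristics of each type of appliance. A method for analyzing users' electricity consumption behavior based on energy decomposition is proposed. Building on the discriminative sparse coding model, and to address the difficulty of solving the L0 regularization term and the weak sparsity induced by the L1 regularization term, an L1/2 regularization term is used as the sparse constraint for energy decomposition, and the homogeneity between users is added to the base model as a further regularization term to improve its performance. Based on the energy decomposition results, the consumption characteristics of a user's individual appliance types are used in place of the total consumption characteristics to refine the behavior analysis, and the traditional K-means clustering algorithm is improved for experimental verification. The experimental results show that the proposed energy decomposition method, with the L1/2 sparse constraint and the homogeneity constraint, is more accurate than the traditional discriminative sparse coding algorithm, and that cluster analysis of users' electricity consumption behavior based on energy decomposition also improves markedly.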

    ...
  • 6. Automatic poster synthesis system based on keywords

    • Keywords:
    • Image retrieval;Automatic segmentations;Automatic synthesis;Composition rule;Daily lives;Image editing;Images synthesis;Layout recommendation;Life management;Seamless integration;Two ways
    • Guan, Shuaipeng;Yu, Haiyang;Yang, Zhen;Zhou, Ming;Lai, Yingxu
    • Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics
    • 2022
    • Vol. 48
    • Issue 2
    • Journal

    The spread of intelligent technology places new demands on image editing. Posters, as a way of conveying information in image form, play an important role in daily life and work management. Producing a poster requires compositing multiple image elements, yet an interactive, one-click image synthesis system is lacking. Combining currently popular image processing techniques, we design and implement an automatic poster synthesis system. We propose a keyword-based image retrieval scheme that constructs a dual filter based on text and content, giving users accurate and fast image retrieval. By collecting statistics on the composition rules of a large number of carefully designed posters and introducing composition rules from aesthetic common sense, we propose a portrait layout recommendation scheme based on these two sets of rules, which assists users in portrait layout design. The experimental results show that the system runs stably and efficiently, that users can synthesize images through simple interactive operations, and that the final composites look realistic.
    © 2022, Editorial Board of JBUAA. All rights reserved.

    ...
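
Entry 2 above reports gains in NDCG@5 and Rec@5. As a reminder of how such top-k metrics are commonly computed with binary relevance, here is a generic sketch. It is not the paper's evaluation code, and the function names and sample ranking are hypothetical.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical recommendation list and ground-truth visits for one user.
ranked = ["p3", "p1", "p7", "p2", "p9"]
relevant = {"p1", "p2"}
rec5 = recall_at_k(ranked, relevant, 5)
ndcg5 = ndcg_at_k(ranked, relevant, 5)
```

Recall@k only counts whether relevant items reach the top k, while NDCG@k also rewards placing them nearer the top. That is why the two figures are usually reported together.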