原创内容第676篇,专注量化投资、个人成长与财富自由。

今天我们继续开发AI大模型自动读研报。

从研报到模型,大致分成几步:

[“propose_hypo_exp”, “propose”, “exp_gen”, “coding”, “running”, “feedback”]

首先,我们需要接入大模型API,我们使用openai的SDK来封装。

openai OpenAI
os


LLM:
    api_key=os.getenvbase_url=model=system_prompt=:
        .client = OpenAI=api_key=base_urlsystem_prompt:
            system_prompt =                                                         .system_prompt = system_prompt
        .model = model

    user_prompt:
        completion = .client.chat.completions.create=.model=: : .system_prompt: : user_prompt=completion.choices.message.content

这样一个基础功能就完成了。

我们可以向它提问:

core.llm LLM
resp = LLM.chatresp

输出结果如下:

准备好了16份研报:

比如研报一,

我们的目标是AI自动读研报,然后自主建模复现研报

研报基本都是pdf格式,因此我们需要一个pdf阅读器,langchain里实现了pdf解析,我们直接使用就好了。——langchain生态封装有点多,但工具也挺全的,我们可以使用它的周边。

pathlib Path
langchain_community.document_loaders PyPDFDirectoryLoaderPyPDFLoader

langchain_core.documents Document


docs: Document-> :content_dict = doc docs:
        Pathdoc.metadata.exists:
            doc_name = Pathdoc.metadata.resolve:
            doc_name = doc.metadatadoc_content = doc.page_content

        doc_name content_dict:
            content_dictdoc_name= doc_content
        :
            content_dictdoc_name+= doc_content

    content_dict


path:
    Pathpath.is_dir:
        loader = PyPDFDirectoryLoaderpath=:
        loader = PyPDFLoaderpathdocs = process_documents_by_langchainloader.loaddocs

把研报一读取进来:

loader.pdf_loader load_pdfs
docs = load_pdfsDATA_DIR.joinpath.joinpathdocs

读进来是文本:

接下来是重点,system prompt,要求LLM从研报中抽取出因子,以及对因子的模型。

: |-
    :
    1. ;
    2. ;
    3. ;

    user will treat your factor name as key to store the factor, don't put any interaction message in the content. Just response the output without any interaction and explanation.
    All names should be in English.
    Respond with your analysis in JSON format. The JSON schema should include:
    {
        "summary": "The summary of this report",
        "factors": {
            "Name of factor 1": "Description to factor 1",
            "Name of factor 2": "Description to factor 2"
        },
        "models": {
            "Name of model 1": "Description to model 1",
            "Name of model 2": "Description to model 2"
        }
    }

看下代码运行的结果:

系统按要示的字典格式,把因子和模型都抽取出来了。

{

    “summary”: “This report discusses the role of high-frequency factors in quantitative stock selection strategies, with a case study on the enhancement of the CSI 1000 index. The report categorizes high-frequency factors into reversal, momentum, and deep learning types, and tests their stock selection capabilities on a monthly and weekly basis. It also examines the impact of incorporating these factors into the CSI 1000 index enhancement strategy under various constraints.”,

    “factors”: {

        “Reversal_HighFrequency”: “Includes factors like Improved Reversal, EndOfDay_Trading_Volume_Ratio, HighFrequency_Skewness, Downside_Volume_Ratio, Average_Single_Outflow_Amount_Ratio, and LargeOrder_Push_Increase. These factors characterize investor overreaction and tend to select stocks with large previous drops or low turnover rates.”,

        “Momentum_HighFrequency”: “Includes factors like Opening_Buy_Intention_Ratio, Opening_Buy_Intention_Strength, Opening_LargeOrder_NetBuy_Ratio, and Opening_LargeOrder_NetBuy_Strength. These factors characterize investor buying intentions, capital flow in the order book, or trading behavior of informed investors.”,

        “DeepLearning_HighFrequency”: “Includes factors like Improved_DeepLearning_Factor and Residual_Attention_DeepLearning_Factor. These factors use past high-frequency features to dynamically fit recent trading patterns and are suitable for short-term windows.”

    },

    “models”: {

        “CSI_1000_Index_Enhancement”: “The model predicts stock returns using a linear weighting approach with base factors including market value, mid-cap (cube of market value), valuation, turnover, reversal, and volatility. High-frequency factors are incrementally introduced to examine changes in portfolio performance. The optimization objective maximizes expected returns under various constraints such as stock deviation, factor exposure, industry deviation, turnover frequency, and turnover rate limits.”

    }

}

代码结果如下,已经打包发布至星球:

作者:AI量化实验室(专注量化投资、个人成长与财富自由)