**Super Symmetry Technologies has released XenonDB 3.0, which adds a minute-level market sentiment index designed for studying the fine structure of high-frequency market sentiment. Building on its existing daily and hourly indices, the new release delivers minute-level data through a brand-new NLP algorithm architecture. We have built a new type of Transformer architecture that uses the attention mechanism to efficiently process multi-modal information, including images, text and time series data. A large-scale pre-trained language model is being developed on this architecture. The capacity of this language model is steadily moving our product towards its goal: establishing the most fundamental infrastructure for cognitive intelligence of the market.**
### **1. About XenonDB**
Super Symmetry Technologies focuses on building algorithm and data infrastructure that supports industrial digitalization. XenonDB is a professional time series database specialized in economics and finance. Using natural language processing and distributed computing, we identify, analyze and structure large-scale, unorganized market data, and generate time series that capture how markets and enterprises evolve across many dimensions. These serve professionals in industry, business, economics, finance, government administration, education and scientific research with high-quality structured data, fast information indexing and API-based high-frequency data transfer.
At the core of the XenonDB algorithms lie a large-scale pre-trained language model and a large-scale pre-trained multi-modal model. Time series are a common data type in economics and finance, but a conventional NLP model cannot analyze time series coupled with text. We therefore proposed a pre-trained language model that integrates image and text processing with time series. This multi-modal pre-trained architecture has substantially increased the precision of text understanding and the quality of text sentiment analysis.
### **2. About Our Product: Market Sentiment Index**
Financial markets are driven by enterprise fundamentals, the cash flows of market players and fluctuations in market sentiment. Fluctuation of sentiment is an intrinsic property of markets. Market sentiment is generated by market players and affected by enterprise fundamentals; it influences investment behavior and cash flows, which in turn act on market sentiment itself, forming a complex system with built-in reflexivity. Quantitative study of the intrinsic properties of this complex system is the first step towards understanding markets.

Market players include enterprises, governments, investment institutions and individual investors. Market sentiment reflects the overall expectations of individuals and institutions regarding market entities and relevant events. The information environment, made up of images, text and time series (specifically, content posted by social media users, research reports published by research institutions, press articles and governmental announcements), drives market sentiment up and down.

The large-scale pre-trained language model that we developed aims to comprehend this linguistic information and, by determining the sentiment of the language, compute numerical values that reflect general market sentiment. The market sentiment index continuously monitors changes in market sentiment in real time, providing financial researchers with a high-frequency quantitative environment for studying the fine structure of market fluctuations.

![图片3.png](https://dev-media.amazoncloud.cn/1078758eecfd4d7aa2f996d8696b4dfb_%E5%9B%BE%E7%89%873.png "图片3.png")
**Our Users**
* Individual and institutional developers specialized in quantitative investment
* Individuals and institutions in need of data analysis on financial markets
* Enterprises that develop financial products and solutions based on securities research
* Researchers in economics and finance departments of colleges and universities who work on market research
* Governmental departments involved in finance, economics, industry and commerce
* Large corporations and publicly listed companies
**Product Example**
![image.png](https://dev-media.amazoncloud.cn/f962048002c441209df1b6bb84e3300a_image.png "image.png")
### 3. Our products
![image.png](https://dev-media.amazoncloud.cn/0652f1d8c8954d2db96f8e4f8c2b0d9b_image.png "image.png")
**3.1. Guba**
**Minute-level sentiment index**:
* Start date: 2021-02-01
* Total number of data points: 6.26 million+
* Data generation time: the 5th minute after the specified minute
![image.png](https://dev-media.amazoncloud.cn/811d64c82c354d66ba3c5b03762d6bd2_image.png "image.png")
Note: a sentiment value of 0 indicates an extremely negative opinion, 0.5 is neutral, and 1 is extremely positive.
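
As a small illustration of this scale, the sketch below re-centers a sentiment value so that neutral becomes 0 and the sign indicates direction. The function name `center_sentiment` is hypothetical and is not part of XenonDB; the linear mapping is only one convenient convention.

```python
def center_sentiment(value: float) -> float:
    """Map a sentiment value in [0, 1] to a signed score in [-1, 1].

    0 (extremely negative) -> -1, 0.5 (neutral) -> 0, 1 (extremely positive) -> 1.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"sentiment value must be in [0, 1], got {value}")
    return 2.0 * value - 1.0


print(center_sentiment(0.0))  # -1.0
print(center_sentiment(0.5))  # 0.0
print(center_sentiment(1.0))  # 1.0
```
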
**Hourly sentiment index**:
* Start date: 2007-01-03
* Total number of data points: 39.4 million+
* Data generation time: the 17th minute of each hour
* Hourly coverage: 2,000 to 3,000
![image.png](https://dev-media.amazoncloud.cn/4c742e0bb6034340b3391103d4249e3b_image.png "image.png")
**Daily sentiment index**:
* Start date: 2007-01-03
* Total number of data points: 10 million
* Data generation time: 4 a.m. every day
* Daily coverage: 4,000+ A shares
![image.png](https://dev-media.amazoncloud.cn/78931c8780974a1b90864847e6ebb45b_image.png "image.png")
**Two types of sentiment index**:
1. Average sentiment index: arithmetic mean of sentiment values computed by the NLP model;
2. Weighted sentiment index: weighted average of sentiment values computed by the NLP model based on the numbers of visits and comments.
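
The difference between the two index types can be sketched as follows. This is a minimal illustration, not the actual XenonDB computation: the `Post` structure and the weight `1 + visits + comments` are assumptions, since the text states only that the weighting is based on the numbers of visits and comments.

```python
from dataclasses import dataclass


@dataclass
class Post:
    sentiment: float  # model output in [0, 1]
    visits: int
    comments: int


def average_sentiment(posts):
    """Arithmetic mean of the per-post sentiment values."""
    if not posts:
        raise ValueError("no posts in this time window")
    return sum(p.sentiment for p in posts) / len(posts)


def weighted_sentiment(posts):
    """Engagement-weighted mean of the per-post sentiment values.

    Assumed weight: 1 + visits + comments. The "1 +" keeps posts with no
    engagement from being dropped; the real weighting scheme is not
    specified in the text.
    """
    if not posts:
        raise ValueError("no posts in this time window")
    weights = [1 + p.visits + p.comments for p in posts]
    total = sum(w * p.sentiment for w, p in zip(weights, posts))
    return total / sum(weights)


# Toy data, invented for illustration only.
posts = [Post(sentiment=0.9, visits=6, comments=2), Post(sentiment=0.3, visits=0, comments=0)]
print(round(average_sentiment(posts), 2))   # 0.6
print(round(weighted_sentiment(posts), 2))  # 0.84
```

In this toy example the heavily viewed positive post pulls the weighted index well above the plain average.
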
**3.2. Xueqiu**
**Minute-level sentiment index**:
* Start date: 2021-02-01
* Total number of data points: 1.3 million+
* Data generation time: the 5th minute after the specified minute
![image.png](https://dev-media.amazoncloud.cn/ae3e50cd978940308fdc0e071fdf1f19_image.png "image.png")
Note: a sentiment value of 0 indicates an extremely negative opinion, 0.5 is neutral, and 1 is extremely positive.
**Hourly sentiment index**:
* Start date: 2011-10-27
* Total number of data points: 75 million+
* Data generation time: the 15th minute of each hour
![image.png](https://dev-media.amazoncloud.cn/2256c1b7f40b4c3db3a75fa78eb6e6a1_image.png "image.png")
**Daily sentiment index**:
* Start date: 2011-10-27
* Total number of data points (as of 2021-03-31): 6.01 million+
* Data generation time: 4 a.m. every day
* Daily coverage: 3,500+ A shares
![image.png](https://dev-media.amazoncloud.cn/df4bd0575a3f4c01bb8dd9ecb4b7e33d_image.png "image.png")
**3.3. Sina Weibo**
**Daily sentiment index**:
* Start date: 2009-09-22
* Total number of data points (as of 2021-03-31): 1.96 million+
* Data generation time: 4 a.m. every day
* Daily coverage: 3,500+ A shares
### 4. Application cases
Customers from quant funds use the market sentiment index in quantitative multi-factor models. Back tests of single-factor strategies show that annual returns of 15% to 20% were achievable over the last 14 years. The data shows good consistency and coherence.
**Case 1 – Single-factor Sentiment Model**
![image.png](https://dev-media.amazoncloud.cn/c4816aa643c94defa35878a2ee3a78c2_image.png "image.png")
Case briefing:
**Stock picking**: the top 10 stocks ranked by the sentiment index of the previous day (Day t - 1)
**Trading**: buy at the opening price on the first trading day, and sell at the closing price on the Nth trading day
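
The picking and trading rules of Case 1 can be sketched as below. This is a simplified illustration under stated assumptions, not the back-testing code behind the figure: the function names and data layout are hypothetical, the entry day is assumed to count as trading day 1, and transaction costs are ignored.

```python
def pick_top_n(sentiment_prev_day, n=10):
    """Return the n stock codes with the highest sentiment on Day t - 1.

    sentiment_prev_day: dict mapping stock code -> sentiment value in [0, 1].
    """
    ranked = sorted(sentiment_prev_day, key=sentiment_prev_day.get, reverse=True)
    return ranked[:n]


def holding_return(bars, n_days):
    """Return from buying at the first day's open and selling at the Nth day's close.

    bars: list of (open, close) tuples, one per trading day, starting on
    the entry day. Assumption: the entry day counts as trading day 1.
    """
    if not 1 <= n_days <= len(bars):
        raise ValueError("n_days must be between 1 and the number of bars")
    buy_price = bars[0][0]
    sell_price = bars[n_days - 1][1]
    return sell_price / buy_price - 1.0


# Toy data, invented for illustration only.
sentiment = {"000001": 0.62, "600000": 0.71, "300750": 0.55}
print(pick_top_n(sentiment, n=2))         # ['600000', '000001']
bars = [(10.0, 10.2), (10.3, 10.5), (10.4, 11.0)]
print(round(holding_return(bars, 3), 4))  # 0.1
```
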
**Case 2 – Back Testing in Groups**
Back testing on the top 2000 stocks by Guba sentiment value in 2020:
![image.png](https://dev-media.amazoncloud.cn/b32c2abb68424b688fcdbb69d6f30744_image.png "image.png")
The figure above shows the cumulative return curves in back tests when the portfolio is rebalanced every three trading days.
**Stock picking:** stocks ranked in the top 250, 250 to 500, 500 to 750 and 750 to 1000 by the sentiment index of the previous day (Day t - 1)
**Trading:** buy at the opening price on the first trading day, and sell at the closing price on the Nth trading day
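
The grouping step of Case 2 can be sketched as follows. The function name `rank_groups` and the data layout are hypothetical. The sketch only splits stocks into consecutive rank buckets; each bucket would then be traded with the rule above.

```python
def rank_groups(sentiment_prev_day, group_size=250, n_groups=4):
    """Split stocks into consecutive rank buckets by Day t - 1 sentiment.

    With the defaults, the buckets are ranks 1-250, 251-500, 501-750 and
    751-1000. Ties keep the dictionary's insertion order because Python's
    sort is stable.
    """
    ranked = sorted(sentiment_prev_day, key=sentiment_prev_day.get, reverse=True)
    return [ranked[i * group_size:(i + 1) * group_size] for i in range(n_groups)]


# Toy data, invented for illustration: 8 stocks split into 4 groups of 2.
sentiment = {f"S{i}": 1.0 - i / 10 for i in range(8)}
groups = rank_groups(sentiment, group_size=2, n_groups=4)
print(groups[0])  # ['S0', 'S1']
print(groups[3])  # ['S6', 'S7']
```
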
**Case 3 – Single-factor Model**
Top 100 vs. Top 2000
![image.png](https://dev-media.amazoncloud.cn/5b30fd3e937d4a2fb66cec850936e83f_image.png "image.png")
### 5. Technical Architecture of Sentiment Index
The big data and algorithm infrastructure that we developed provides a robust foundation for high-quality products.
![image.png](https://dev-media.amazoncloud.cn/0fc46740461a4a90941805ec8290c01c_image.png "image.png")
**Operation and Maintenance: Real-time Inspection System**
We have built an efficient alert system for operation and maintenance, ensuring the stability and robustness of the data software system.
* **Crawler inspection**:
1. Self-developed large-scale distributed crawler inspection system
2. Accurate inspection and alert process: inspection reports are built from multiple metrics, such as the daily number of records collected by the crawlers of each business line and the total physical storage used by the data, and are sent through messaging channels (such as email and DingTalk alerts) to the people responsible for the corresponding business line.
![image.png](https://dev-media.amazoncloud.cn/f1f2cb94b0814e34b38eb1c1a2abedac_image.png "image.png")
* **Computational inspection**
1. Each computation job runs on a reliable automated scheduling platform such as Azkaban
2. Storage supported by the Hadoop ecosystem
3. Fast and agile job response, enabled by the high performance and availability of distributed Spark computation
4. Adequate disaster recovery safeguards to ensure data safety and stable computation
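
The daily-count check described under crawler inspection can be sketched roughly as below. This is an assumption-laden illustration, not the production system: the function name, the 50% threshold and the use of a trailing average as the baseline are all hypothetical, and delivery to email or DingTalk is left out.

```python
from statistics import mean


def check_daily_counts(history, today, min_ratio=0.5):
    """Return alert messages for business lines whose crawl volume dropped.

    history: dict mapping business line -> list of daily record counts
             from previous days.
    today:   dict mapping business line -> record count for the current day.
    A business line is flagged when today's count falls below
    min_ratio * (average of its history). The threshold is an assumption.
    """
    alerts = []
    for business, counts in history.items():
        if not counts:
            continue  # no baseline yet, nothing to compare against
        baseline = mean(counts)
        observed = today.get(business, 0)
        if observed < min_ratio * baseline:
            alerts.append(
                f"{business}: {observed} records today, baseline {baseline:.0f}"
            )
    return alerts


# Toy data, invented for illustration only.
history = {"guba": [1000, 1100, 900], "xueqiu": [500, 520, 480]}
today = {"guba": 980, "xueqiu": 120}
print(check_daily_counts(history, today))
# ['xueqiu: 120 records today, baseline 500']
```

In a real deployment the returned messages would be handed to whatever notification channel the team uses.
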