Exploring Data Factorization in the AI Era
On April 29, the Ninth Digital China Construction Summit was held in Fuzhou, where the “National Data Factorization Series” was officially launched. After the release, Zhang Xianghong, the chief editor of the series, engaged in a dialogue with People’s Data.

People’s Data: The digital economy is developing with surging momentum, and the wave of artificial intelligence is sweeping across every sector. Global economic growth and social development now depend less on traditional factors such as land, labor, technology, and capital, and more on releasing the value of data, a new type of production factor. Some say “data factorization” is a tough nut to crack; what are your thoughts?
Zhang Xianghong: In recent years, valuable explorations and achievements have emerged from the central to local levels, and from academia to industry. However, frankly speaking, these results are still relatively scattered and have not formed a systematic, complete, and operable framework. This is precisely where it is “hard.”
Data, as a new type of production factor, possesses unique characteristics: it is replicable, non-consumable, increases with use, and can be utilized by multiple parties simultaneously. Traditional theories of property rights, pricing, and transactions cannot be applied to it directly.

Data exchanges have been established in various regions, the concept of data assets entering the balance sheet has moved from theory to pilot projects, and cases of authorized operation of public data are increasing. These practices have accumulated valuable experience for the industry. However, we also see that many data exchanges have limited trading activity, with most transactions still following the old path of “over-the-counter, one-on-one” deals. Evaluation standards and audit paths for data assets entering the balance sheet remain disputed, and the problem of public data being “unwilling to open, afraid to open, and unable to open” is still prominent. Data factorization is indeed a tough nut to crack. It is not a problem confined to any single link, but a systemic and global challenge.
People’s Data: Data factorization is a pioneering endeavor. Some netizens have asked if it would be better to wait for others to achieve results before following suit?
Zhang Xianghong: My answer is: we cannot afford to wait. The wave of change has already reached our feet. Over three hundred years ago, technology and capital, as new types of production factors, transformed human society from an agricultural one into an industrial one. Today, data is playing the role that technology and capital once did—this is a brand new “factor revolution.” In the past two years, the surge of artificial intelligence has continuously refreshed our understanding of what machines can do. Some ask me: With AI being so powerful, is data factorization still necessary?
My answer is: the stronger AI becomes, the more urgent and fundamental data factorization is.
Because the “intelligence” of AI does not arise from thin air. What makes large models smarter? Computing power and algorithms matter, but ultimately it comes down to data. Data is both the fuel of AI and its soul. Whether a model can answer a question, how accurate its answer is, and how well it aligns with human values all depend on the data it “consumes.” The reality, however, is that while the volume of data in society is exploding, high-quality, usable, and transferable data remains severely scarce. Many AI companies spend enormous effort on “finding data, cleaning data, and adapting data.” Public data is reluctant to open up, corporate data is unwilling to be shared, and personal data goes unauthorized—the problems of “insufficient supply, poor flow, and ineffective use” are magnified in the AI era. This is precisely the core problem that data factorization aims to solve.
People’s Data: Our country has initially explored and formed a toolbox and methodology for data factorization that ensures “supply, flow, effective use, and security.” However, some foundational, structural, and systemic issues have yet to be clarified, and different regions and industries have inconsistent understandings of data factorization, uneven efforts, and unsatisfactory results. How do we tackle this?
Zhang Xianghong: Tackling tough nuts cannot rely on brute force; we need methods, tools, and a roadmap.
In March 2024, we released the “Six Horizontals and Two Verticals” framework for the overall structure of data factorization. This framework later became the backbone of the series. The “Six Horizontals” refer to six horizontal links: system, foundation, main body, capability, value, and circulation; the “Two Verticals” are the application and security dimensions that run through it. These eight aspects basically cover all components of data factorization.
Based on this, we planned nine volumes in total under a “1+8” structure. The first volume discusses the overall framework, while the subsequent eight volumes correspond to the institutional system, national data infrastructure, the data industry, data resource development and utilization, public data value release, cross-border data flow, Digital China construction, and data security. Data factorization is an exploration without ready-made answers. We firmly believe that China is forging its own path on this journey, and this series of books is a footprint we leave along the way. We welcome everyone to join us in tackling this tough nut, so that the path of data factorization can be walked steadily and far.