Indirect Shareholding Ratio Calculation Using Graph Techniques
This article explains how to compute indirect shareholding ratios between companies by generating synthetic relationship data, cleaning and normalizing it with multiprocessing, constructing a weighted directed graph using NetworkX, and applying a matrix‑based algorithm to derive the final ownership matrix.
The introduction presents a corporate customer graph example and asks how to obtain the shareholding ratio of Company A over Company D, proposing the use of graph techniques to calculate indirect ownership.
Algorithm steps are illustrated with diagrams showing the workflow for computing indirect shareholding ratios.
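As a concrete illustration of the idea (with made-up figures): if Company A holds 60% of B and B holds 50% of D, then A's indirect stake in D along that path is 0.6 × 0.5 = 0.3, and stakes along multiple distinct paths are summed. A minimal sketch using NetworkX, with hypothetical companies and ratios:

```python
import networkx as nx

# Toy ownership graph with hypothetical ratios (edge weight = direct stake)
G = nx.DiGraph()
G.add_weighted_edges_from([
    ('A', 'B', 0.6),   # A holds 60% of B
    ('B', 'D', 0.5),   # B holds 50% of D
    ('A', 'C', 0.2),   # A holds 20% of C
    ('C', 'D', 0.4),   # C holds 40% of D
])

def indirect_ratio(graph, source, target):
    """Sum, over every simple path, the product of edge weights along it."""
    total = 0.0
    for path in nx.all_simple_paths(graph, source, target):
        ratio = 1.0
        for u, v in zip(path, path[1:]):
            ratio *= graph[u][v]['weight']
        total += ratio
    return total

print(indirect_ratio(G, 'A', 'D'))  # 0.6*0.5 + 0.2*0.4 ≈ 0.38
```

The matrix method described later in the article computes the same path products for all company pairs at once.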
Data description explains that demo data is generated with Python's faker library, producing relationship records and target customer data. Sample code for generating edge data and node data is provided:
```python
import random

import pandas as pd
from faker import Faker

fake = Faker()

# Generate shareholding-ratio (edge) data
# edge_num: number of demo relationship records to generate
def demo_data_(edge_num):
    s = []
    for i in range(edge_num):
        # investing company, invested company, investment ratio, investment date
        s.append([fake.company(), fake.company(), random.random(),
                  fake.date(pattern="%Y-%m-%d", end_datetime=None)])
    demo_data = pd.DataFrame(s, columns=['start_company', 'end_company', 'weight', 'data_date'])
    print("-----demo_data describe-----")
    print(demo_data.info())
    print("-----demo_data head---------")
    print(demo_data.head())
    return demo_data

# Generate node (target customer) data
def node_data_(node_num):
    cust_list = [fake.company() for i in range(node_num)]
    node_data = pd.DataFrame(cust_list, columns=['cust_id']).drop_duplicates()
    print('Number of nodes:', len(node_data['cust_id'].unique()))
    node_data.to_csv('node_data.csv', index=False)
```

Data processing, which uses Python's multiprocessing module, removes self-investments, filters out records with empty company names, deduplicates repeated (start_company, end_company) pairs by keeping the most recent record, keeps only the latest among multiple records with weight > 0.5 for the same investee, and normalizes an investee's incoming weights when they sum to more than 1. The processing code is:
```python
import multiprocessing
import timeit

import pandas as pd

# Helper (not shown in the original excerpt): flag identical company names
def if_same(a, b):
    return 1 if a == b else 0

# Demo data processing
def rela_data_(demo_data):
    print('Number of raw records:', len(demo_data))
    # Remove self-investments
    demo_data['bool'] = demo_data.apply(lambda x: if_same(x['start_company'], x['end_company']), axis=1)
    demo_data = demo_data.loc[demo_data['bool'] != 1]
    # Remove empty company names
    demo_data = demo_data[(demo_data['start_company'] != '') & (demo_data['end_company'] != '')]
    # Sort by date and drop duplicate (start_company, end_company) pairs, keeping the latest
    demo_data = demo_data.sort_values(
        by=['start_company', 'end_company', 'data_date'], ascending=False
    ).drop_duplicates(keep='first', subset=['start_company', 'end_company']).reset_index()
    # Among multiple records with weight > 0.5, keep only the latest value
    demo_data = pd.concat([
        demo_data.loc[demo_data['weight'] <= 0.5],
        demo_data.loc[demo_data['weight'] > 0.5].sort_values(
            by=['end_company', 'data_date'], ascending=False
        ).drop_duplicates(keep='first', subset=['end_company', 'weight'])
    ]).reset_index()[['start_company', 'end_company', 'weight', 'data_date']]
    global demo_data_init
    demo_data_init = demo_data.copy()
    # Sum of shareholding ratios per investee
    demo_data_sum = demo_data[['end_company', 'weight']].groupby(['end_company']).sum()
    # Investees whose total shareholding ratio exceeds 1
    more_one_index = demo_data_sum.loc[demo_data_sum['weight'] > 1].index.unique()
    print('Investees with total shareholding ratio > 1:', len(more_one_index))
    # Normalize ratios > 1 in parallel (works on Linux; fails on Windows)
    items = more_one_index[:]
    p = multiprocessing.Pool(32)
    start = timeit.default_timer()
    b = p.map(do_something, items)
    p.close()
    p.join()
    end = timeit.default_timer()
    print('multi processing time:', str(end - start), 's')
    base_more_one = pd.read_csv('exchange.csv', header=None)
    base_more_one.columns = ['start_company', 'end_company', 'weight', 'data_date']
    # Investees whose total shareholding ratio is at most 1
    low_one_index = demo_data_sum.loc[demo_data_sum['weight'] <= 1].index
    base_low_one = pd.merge(demo_data, pd.DataFrame(low_one_index), on=['end_company'], how='inner')
    demo_data_final = pd.concat([base_low_one, base_more_one]).reset_index()[
        ['start_company', 'end_company', 'weight', 'data_date']].drop_duplicates()
    print('Number of records after processing:', len(demo_data_final))
    demo_data_final.to_csv('demo_data_final.csv', index=False)
    return demo_data_final

# Worker function for parallel normalization
def do_something(i):
    # Records for one investee whose ratios sum to more than 1
    exchange = demo_data_init.loc[demo_data_init['end_company'] == i].sort_values(
        by=['end_company', 'data_date'], ascending=False)
    # Rescale so the investee's incoming ratios sum to 1
    weight_sum = sum(exchange['weight'])
    exchange['weight'] = exchange['weight'] / weight_sum
    exchange.to_csv('exchange.csv', encoding='utf-8', index=False, header=False, mode='a')
    print('-----End of', i, '-----')
```

Graph construction uses NetworkX to build a directed weighted graph from the cleaned relationship data. The relevant code is:
```python
import networkx as nx

# Build the directed ownership graph
def graph_(rela_data):
    Graph = nx.DiGraph()
    for indexs in rela_data.index:
        Graph.add_weighted_edges_from([tuple(rela_data.loc[indexs].values)])
    return Graph

global Graph
Graph = graph_(rela_data[['start_company', 'end_company', 'weight']].drop_duplicates())
print('Number of nodes in the graph:', Graph.number_of_nodes())
print('Number of edges in the graph:', Graph.number_of_edges())
```

The model description introduces a matrix-based method to obtain the indirect shareholding ratio matrix, with a decay parameter C and iterative multiplication:
```python
import numpy as np

# Obtain the (indirect) shareholding ratio matrix
def sum_involution(ma, n_step):
    # Decay parameter (C = 1 means each extra hop is counted at full weight)
    C = 1
    mab = ma
    result = ma
    for _ in range(n_step - 1):
        # Next power of the direct-ownership matrix (one more hop)
        ma = ma.dot(mab).round(6)
        # Zero the diagonal: a company's stake in itself is not counted
        np.fill_diagonal(ma.values, 0, wrap=True)
        result = result + C * ma
    return result
```

An example of the model output is shown with a diagram illustrating the computed indirect ownership matrix.
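To make the computation concrete, the routine can be exercised on a tiny hypothetical matrix (the function is restated so the snippet runs standalone; companies and ratios are invented):

```python
import numpy as np
import pandas as pd

def sum_involution(ma, n_step):
    # Decay parameter (C = 1: extra hops counted at full weight)
    C = 1
    mab = ma
    result = ma
    for _ in range(n_step - 1):
        ma = ma.dot(mab).round(6)          # next hop: matrix power
        np.fill_diagonal(ma.values, 0, wrap=True)
        result = result + C * ma
    return result

# Hypothetical direct-ownership matrix: A -> B 60%, B -> D 50%
companies = ['A', 'B', 'D']
direct = pd.DataFrame(0.0, index=companies, columns=companies)
direct.loc['A', 'B'] = 0.6
direct.loc['B', 'D'] = 0.5

total = sum_involution(direct, 3)
print(total.loc['A', 'D'])  # 0.6 * 0.5 = 0.3, contributed by the two-hop term
```

The first-power term carries the direct stakes, and each further multiplication adds the stakes reachable through one more intermediate company.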
Future work mentions discovering hidden relationships, applying community detection (e.g., Louvain) for group segmentation, and using supervised learning with known group labels to tune the decay parameter C.
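The group-segmentation idea can be sketched with NetworkX's built-in Louvain implementation (available in NetworkX 2.8 and later); the graph below and its weights are invented for illustration, and tuning the decay parameter C against known group labels is left out:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Hypothetical ownership graph: two tightly knit groups joined by a weak cross-holding
G = nx.Graph()
G.add_weighted_edges_from([
    ('A', 'B', 0.6), ('A', 'C', 0.5), ('B', 'C', 0.4),  # group 1
    ('X', 'Y', 0.7), ('X', 'Z', 0.6), ('Y', 'Z', 0.5),  # group 2
    ('C', 'X', 0.05),                                    # weak bridge
])

# Louvain community detection; a fixed seed makes the partition reproducible
communities = louvain_communities(G, weight='weight', seed=42)
print([sorted(c) for c in communities])
```

With the weak bridge, Louvain separates the two ownership clusters, which is the kind of group segmentation the future-work section has in mind.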
The full source code is available at https://github.com/MO2T/1.Recognition_of_implicit_relationship.