Document Type
Article
Keywords
Blockchain, Distributed File System, NameNode, DataNode, Ethereum
Abstract
Managing large-scale data in distributed environments is central to the distributed file system (DFS) concept, which provides reliable, scalable, and fault-tolerant data storage across multiple nodes. In a DFS, large datasets are divided into blocks stored across DataNodes, with replicas assigned to enhance data redundancy. The NameNode is the central control unit that manages the metadata governing data storage and retrieval. However, the NameNode is a potential single point of failure, which makes it difficult to ensure metadata integrity, trustworthiness, and overall system reliability in distributed environments. This study addresses these challenges by designing and implementing a new distributed file system architecture that uses a blockchain as the repository for the NameNode's metadata. Integrating the blockchain into the DFS architecture yields several improvements. The tamper-proof and immutable nature of the blockchain ensures metadata integrity. Metadata recorded on the blockchain remains accessible and recoverable at any time because multiple replicas are maintained across the network, and it is resistant to unauthorized modification, which enhances trust and data reliability. The proposed architecture was implemented by simulating the DFS in Python and integrating it with Ganache as the Ethereum platform through the Web3 library. The results show that the proposed system uploads files to the DFS faster than the traditional Hadoop Distributed File System, and that storing the metadata in the blockchain enhances overall system performance by improving metadata trustworthiness, management, and data integrity. The performance metrics for the proposed system are memory utilization and file execution time, measured for files ranging from 1 MB to 100,000 MB.
The results show that even as the file size and the number of executions increase, the system retains efficient memory utilization, requiring less RAM for larger files. Although this scalability in memory usage demonstrates the system's ability to handle large datasets, adjustments are required to counteract the longer processing times associated with larger files. This paper also examines the trade-offs and limitations of integrating blockchain with a DFS, including issues with scalability, latency, and storage costs. Despite these obstacles, the proposed approach shows that blockchain is a workable option for secure and dependable metadata management in a DFS. This opens the way for further research to improve efficiency and reduce resource use.
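The architecture described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the block size, replication factor, round-robin placement, and every name in it (`split_into_blocks`, `place_replicas`, `MetadataLedger`, `upload`) are hypothetical, and a hash-chained in-memory log stands in for the Ethereum blockchain so that the example runs without Ganache or the Web3 library.

```python
import hashlib
import json

BLOCK_SIZE = 4    # bytes; deliberately tiny so the example is easy to follow (assumption)
REPLICATION = 2   # hypothetical replication factor


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Divide file content into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


class MetadataLedger:
    """Append-only, hash-chained log standing in for the blockchain (assumption)."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash, record):
        # Hash the record together with the previous hash to chain the entries.
        payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def append(self, record):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"prev": prev_hash, "record": record,
                 "hash": self._digest(prev_hash, record)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every hash; any modified record breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            expected = self._digest(prev_hash, entry["record"])
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


def upload(name, data, datanodes, ledger):
    """Split a file, place its replicas, and record the metadata in the ledger."""
    blocks = split_into_blocks(data)
    placement = place_replicas(len(blocks), datanodes)
    record = {
        "file": name,
        "size": len(data),
        "blocks": [
            {"index": i, "sha256": hashlib.sha256(b).hexdigest(), "nodes": placement[i]}
            for i, b in enumerate(blocks)
        ],
    }
    ledger.append(record)
    return record
```

Calling `upload("a.txt", b"hello world", ["dn1", "dn2", "dn3"], MetadataLedger())` records one metadata entry describing three blocks. If any stored record is altered afterwards, `verify()` returns `False`, which mirrors in simplified form the tamper resistance the abstract attributes to blockchain-stored metadata.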
How to Cite This Article
Alameen, Huda A. and Rabee, Furkan (2025) "Blockchain-Based Metadata Management in Distributed File Systems," Mesopotamian Journal of CyberSecurity: Vol. 5: Iss. 2, Article 3.
DOI: https://doi.org/10.58496/MJCS/2025/022
Available at: https://map.researchcommons.org/mjcs/vol5/iss2/3