Knowledge compiled from AI sources and my own notes.
eMule is an important tool for P2P downloading. It can search for and download resources through the eD2k network and the Kad network.
To make better use of this software, I have summarized some background knowledge below.
eDonkey2000 network
The eDonkey2000 network (abbreviated as eD2k) is one of the pioneers of modern P2P file sharing and the core network that eMule is built to work with.
It was released in 2000 by American programmer Jed McCaleb, about a year before BitTorrent. As Internet access entered the broadband era, the eDonkey2000 network transformed a P2P landscape then dominated by Napster (centralized) and Gnutella (pure flooding), thanks to technologies such as resumable and multi-source downloading.
Core Architecture: Semi-Centralized Model
The eDonkey2000 network adopts a hybrid 'server-client' architecture, which is its defining design feature:
Server (eDonkey Server):
- Responsibilities: Acts only as an index server and does not store any file data.
- Functions: Maintains a massive file directory (file names, sizes, hash values) and user locations (IP, port). When a user searches for a keyword, the server returns "who has this file."
Client (eDonkey/eMule, etc.):
- Responsibilities: Actually stores and transfers file data.
- Functions: Clients upload and download file chunks directly between each other.
Network topology summary: search through the server, download from peers. The server helps users locate files but stores none itself; file transfers always happen directly between users.
When a user shares a file in eMule, eMule performs the following actions to publish the index to the server:
- Generate Hashes: Splits the file into chunks, calculates an MD4 hash for each chunk (used to verify pieces and resume downloads), and derives the file's eD2k hash from these chunk hashes (this file hash is the core of the eD2k link).
- Upload List: The client tells the server, "I have file [hash X], file name is XXX, size is YY..." The chunk hashes are not sent to the server; downloaders request them directly from sources.
- Store Index: The server stores only a small "index record": the file hash with its metadata (name, size) -> the client's IP/port/client ID.
Key Point: The file data is not uploaded to the server; the server is just a "phone directory."
When a user searches for a keyword:
- Send request: The client sends the search term to the connected server.
- Local retrieval: The server searches its in-memory database for a list of file hashes that match the filenames/metadata.
- Return results: The server returns the matching files (hash, name, size); when the user picks one, the client asks the server for the list of users (sources) who have it.
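To make the "phone directory" idea concrete, here is a minimal Python sketch of an index server's bookkeeping. `ToyIndexServer`, its methods, and the hash and addresses in the usage lines are hypothetical names and values for illustration; a real eD2k server speaks a binary TCP protocol, identifies clients by a client ID, and supports richer search syntax. The sketch only shows that the server keeps metadata and source addresses, never file data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """Index record for one file: metadata plus the addresses of clients sharing it."""
    name: str
    size: int
    sources: set[tuple[str, int]] = field(default_factory=set)


class ToyIndexServer:
    """Hypothetical in-memory index: file hash -> FileEntry. No file data is stored."""

    def __init__(self) -> None:
        self.files: dict[str, FileEntry] = {}

    def publish(self, file_hash: str, name: str, size: int, ip: str, port: int) -> None:
        # A client announces a shared file; only the index record is kept.
        entry = self.files.setdefault(file_hash, FileEntry(name, size))
        entry.sources.add((ip, port))

    def search(self, keyword: str) -> list[tuple[str, str, int]]:
        # Naive case-insensitive substring match over file names.
        kw = keyword.lower()
        return [(h, e.name, e.size) for h, e in self.files.items() if kw in e.name.lower()]

    def get_sources(self, file_hash: str) -> set[tuple[str, int]]:
        # "Who has this file?" -> the (ip, port) pairs recorded for this hash.
        entry = self.files.get(file_hash)
        return set(entry.sources) if entry else set()


HASH = "0123456789ABCDEF0123456789ABCDEF"  # hypothetical file hash
server = ToyIndexServer()
server.publish(HASH, "Documentary.avi", 734003200, "203.0.113.5", 4662)
server.publish(HASH, "Documentary.avi", 734003200, "198.51.100.7", 4662)
print(server.search("documentary"))
print(server.get_sources(HASH))
```

Two clients sharing the same hash end up as two sources under one index record, which is exactly what the downloader later asks for.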
Downloading files (P2P transfer): The file transfer does not go through the server at all.
- Get source list: The client obtains a list of other users' IPs/ports who have the file from the server.
- Direct connection request: The client attempts to connect directly to these users (relying on TCP ports like 4662).
- Segment exchange: Once connected, both sides exchange file block data with each other through the MFTP protocol.
Advantages of this model (compared to early P2P)
- Fast: A search only needs to query one central server and returns within milliseconds, with no multi-hop routing (as in Kad) or query flooding (as in Gnutella).
- Rich metadata: The server can store full filenames and descriptions, supporting fuzzy and Boolean searches.
- Simple implementation: The client logic is simpler than a full DHT.
- Stability: When few users are online, it is easier to get connected than in a DHT network, since the client only needs a known server address rather than bootstrap nodes.
Disadvantages of this model
- Single point of failure: If the server goes down, all users relying on it immediately lose search capabilities.
- High operational costs: Running the server requires stable financial support and is difficult to sustain long-term.
- Legal risks: Because the servers are centralized, copyright holders can easily go after the server operators. For example, in 2006 police seized Razorback2, then the largest eD2k server, which dealt a heavy blow to the network.
- Lack of privacy: The server knows what you searched for and downloaded.
The eDonkey2000 network introduced several core P2P technologies that are still in use today:
File chunking and multi-source transfer
Files are divided into chunks of 9,728,000 bytes (about 9.28 MiB).
Users can download chunk 1 from A, chunk 5 from B, chunk 3 from C—downloading different parts of the same file simultaneously from multiple sources.
Resume support: Even if the software is closed, incomplete parts can continue downloading the next time it is launched.
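The chunk arithmetic and the multi-source idea can be sketched in Python as follows. `PART_SIZE` is eD2k's chunk size of 9,728,000 bytes; the function names and the scheduling policy (rarest chunk first, then the least-loaded source) are assumptions for illustration, not a description of eMule's actual download logic, which also requests smaller blocks inside each chunk. Chunk indices are 0-based here.

```python
PART_SIZE = 9_728_000  # eD2k chunk ("part") size in bytes


def chunk_count(file_size: int) -> int:
    """Number of chunks a file of `file_size` bytes is divided into (at least 1)."""
    return max(1, -(-file_size // PART_SIZE))  # ceiling division


def schedule(missing: set[int], availability: dict[str, set[int]]) -> dict[str, list[int]]:
    """Assign each missing chunk index to one source that has it.

    Hypothetical policy: handle the rarest chunks first, and give each chunk
    to the holder with the fewest assignments so far.
    """
    plan: dict[str, list[int]] = {src: [] for src in availability}

    def rarity(chunk: int) -> int:
        return sum(chunk in chunks for chunks in availability.values())

    for chunk in sorted(missing, key=lambda c: (rarity(c), c)):
        holders = [src for src, chunks in availability.items() if chunk in chunks]
        if not holders:
            continue  # no source has this chunk right now; try again later
        best = min(holders, key=lambda src: (len(plan[src]), src))
        plan[best].append(chunk)
    return plan


print(chunk_count(734003200))  # 76
sources = {"A": {0, 1, 2}, "B": {1, 4}, "C": {2, 3}}
print(schedule({0, 1, 2, 3, 4}, sources))  # {'A': [0, 1], 'B': [4], 'C': [3, 2]}
```

Because every chunk can be verified on its own, a client can fill in the plan from different sources in any order and resume later from whatever chunks are still missing.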
MD4 hash verification
Each file is identified by a 128-bit MD4-based hash that serves as its fingerprint.
Benefit: Even if the file name is changed arbitrarily (e.g., from "abc.avi" to "movie.avi"), a matching hash means it is the same file. This defeats misleading file names and keeps renamed copies from being treated as different files.
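The fingerprint can be computed as sketched below. Python's `hashlib` only offers MD4 when the underlying OpenSSL build still enables it, so this sketch includes a small pure-Python MD4 (following RFC 1320) to stay self-contained; it is slow and meant for illustration only. The eD2k hash is taken here as the MD4 of the data for a single-chunk file, and otherwise the MD4 of the concatenated per-chunk MD4 digests. Files whose size is an exact multiple of the chunk size are an edge case on which clients have historically disagreed; this sketch does not special-case them. The `part_size` parameter is an addition so that small inputs can exercise the multi-chunk path.

```python
import struct

PART_SIZE = 9_728_000  # eD2k chunk size in bytes


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    x &= 0xFFFFFFFF
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def md4(data: bytes) -> bytes:
    """Pure-Python MD4 (RFC 1320). Slow; for illustration only."""
    msg = bytearray(data)
    bit_len = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF
    msg.append(0x80)
    while len(msg) % 64 != 56:
        msg.append(0)
    msg += struct.pack("<Q", bit_len)

    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    for off in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[off:off + 64])
        a, b, c, d = h
        # Round 1: F(b, c, d) = (b AND c) OR (NOT b AND d)
        for i in range(16):
            a = _rotl(a + ((b & c) | (~b & d)) + x[i], (3, 7, 11, 19)[i % 4])
            a, b, c, d = d, a, b, c
        # Round 2: majority function, constant 0x5A827999
        for i in range(16):
            k = (i % 4) * 4 + i // 4
            a = _rotl(a + ((b & c) | (b & d) | (c & d)) + x[k] + 0x5A827999, (3, 5, 9, 13)[i % 4])
            a, b, c, d = d, a, b, c
        # Round 3: parity function, constant 0x6ED9EBA1
        for i, k in enumerate((0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15)):
            a = _rotl(a + (b ^ c ^ d) + x[k] + 0x6ED9EBA1, (3, 9, 11, 15)[i % 4])
            a, b, c, d = d, a, b, c
        h = [(v + w) & 0xFFFFFFFF for v, w in zip(h, (a, b, c, d))]
    return struct.pack("<4I", *h)


def ed2k_hash(data: bytes, part_size: int = PART_SIZE) -> str:
    """eD2k file hash as 32 lowercase hex characters (exact-multiple edge case not handled)."""
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)] or [b""]
    chunk_hashes = [md4(chunk) for chunk in chunks]  # per-chunk hashes also verify downloads
    if len(chunk_hashes) == 1:
        return chunk_hashes[0].hex()
    return md4(b"".join(chunk_hashes)).hex()


print(md4(b"abc").hex())  # a448017aaf21d8525fc10ae87aa6729d
print(ed2k_hash(b"abc"))  # same value: a single-chunk file hashes to its own MD4
```

Note that the file name never enters the computation, which is why renaming a file leaves its hash unchanged.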
eD2k Links
Format example: ed2k://|file|filename.avi|734003200|HASHVALUE|/
These links contain the file name, byte size, and hash value. Users can click to directly add them to the download queue.
Contribution: eD2k links were among the earliest widely used P2P resource identifiers on the Internet and are still circulating in some nostalgic communities.
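A minimal parser and builder for the basic link form shown above might look like this. `Ed2kFileLink` and `parse_ed2k_link` are hypothetical names; the sketch assumes the plain `ed2k://|file|name|size|hash|/` form, percent-encodes the name when building, and ignores any optional extra fields that some links carry after the hash.

```python
import string
from dataclasses import dataclass
from urllib.parse import quote, unquote


@dataclass(frozen=True)
class Ed2kFileLink:
    name: str
    size: int        # file size in bytes
    file_hash: str   # 128-bit eD2k hash as 32 lowercase hex characters

    def to_uri(self) -> str:
        # Percent-encode the name so characters such as '|' or spaces cannot break the format.
        return f"ed2k://|file|{quote(self.name, safe='')}|{self.size}|{self.file_hash.upper()}|/"


def parse_ed2k_link(uri: str) -> Ed2kFileLink:
    parts = uri.split("|")
    if len(parts) < 6 or parts[0].lower() != "ed2k://" or parts[1].lower() != "file":
        raise ValueError(f"not an ed2k file link: {uri!r}")
    name, size_text, file_hash = unquote(parts[2]), parts[3], parts[4].lower()
    if not size_text.isdecimal():
        raise ValueError(f"invalid size: {size_text!r}")
    if len(file_hash) != 32 or any(ch not in string.hexdigits for ch in file_hash):
        raise ValueError(f"invalid hash: {file_hash!r}")
    return Ed2kFileLink(name, int(size_text), file_hash)


link = parse_ed2k_link("ed2k://|file|abc.txt|3|A448017AAF21D8525FC10AE87AA6729D|/")
print(link)
print(Ed2kFileLink("my file.avi", 3, link.file_hash).to_uri())
# ed2k://|file|my%20file.avi|3|A448017AAF21D8525FC10AE87AA6729D|/
```

Everything a client needs to start searching for sources (hash and size) is inside the link itself; no server address is involved.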
The eDonkey2000 network is a hybrid P2P network that "maintains a file directory through multiple centralized servers while allowing direct file transfers between users." It marks the high point of centralized indexing in the P2P field. Eventually, under both legal and technical pressure, the P2P world largely shifted to fully decentralized models such as Kad and Mainline DHT.
However, without eDonkey2000's popularization of concepts like "hash verification" and "multi-source resumable downloads," later BitTorrent might have taken even longer to develop.
Kademlia network
The Kademlia protocol (abbreviated as Kad) is a distributed hash table (DHT) protocol designed to efficiently locate nodes or store resources in a fully decentralized environment.
It was first proposed by Petar Maymounkov and David Mazières in 2002 and is now widely used in P2P networks (such as eMule, BitTorrent, Ethereum) and blockchain systems.
Kademlia creates a "serverless network where everyone is a server" through XOR distance and binary prefix routing.
| Feature | eD2k Network Mode | Kad Network Mode |
|---|---|---|
| Architecture | Centralized Index | Decentralized DHT |
| Storage Location | File hash -> IP list stored in server memory | File hash -> IP list stored on nodes closest to the hash |
| Search Speed | Very fast (milliseconds) | Slower (requires routing hops, a few seconds) |
| Maintenance Cost | High (requires renting servers) | No dedicated servers (load shared by all users) |
| Censorship Resistance | Very poor (searches fail if the server goes down) | Very strong (no central point to shut down) |
Kad is a Kademlia-based network that eMule added so that it would not depend solely on servers. Even if every eD2k server were shut down, users could still find and share resources with one another through Kad. When major eD2k servers were shut down in 2006, eMule users were able to keep sharing through Kad.
XOR Distance
"XOR distance" is the metric a Kademlia DHT uses to measure how logically close two 128-bit IDs are, whether they are node IDs or ed2k file hashes. XOR distance is the mathematical foundation of the Kademlia protocol, which uses it to build routing tables, locate nodes, and find resources.
Every file in the ed2k network has a unique 128-bit identifier, known as the ed2k hash (also called the MD4 hash). Each client node in the DHT network is also identified by a 128-bit ID.
The “distance” between two IDs (whether file hashes or node IDs) is calculated by performing a bitwise XOR operation on them. The result of the XOR operation is itself a 128-bit number. The numerical value of this number represents their distance. The smaller the value, the “closer” the two IDs are to each other.
Assume a 4-bit ID is used for simplification (in reality, it is 128 bits):
Node A ID: 1010
Node B ID: 1100
Node C ID: 1000
Target file hash T: 1001
Calculate distance:
dist(A, T) = 1010 XOR 1001 = 0011 (Decimal 3)
dist(B, T) = 1100 XOR 1001 = 0101 (Decimal 5)
dist(C, T) = 1000 XOR 1001 = 0001 (Decimal 1)
Conclusion: Node C's ID (1000) is closest to the target T (1001) (distance of 1). Therefore, during the search, the system will prioritize querying nodes like Node C.
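The same calculation in Python, treating IDs as plain integers (a simplification; a real client handles 128-bit IDs from their byte representation):

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, read as an unsigned integer."""
    return a ^ b


nodes = {"A": 0b1010, "B": 0b1100, "C": 0b1000}  # 4-bit IDs from the example
target = 0b1001                                  # target file hash T

distances = {name: xor_distance(node_id, target) for name, node_id in nodes.items()}
closest = min(distances, key=distances.get)

for name, d in distances.items():
    print(f"dist({name}, T) = {d:04b} (decimal {d})")
print("closest:", closest)  # closest: C

# Full-size IDs work the same way, e.g. two 128-bit hashes written in hex:
h1 = int("A448017AAF21D8525FC10AE87AA6729D", 16)
h2 = int("A448017AAF21D8525FC10AE87AA6729C", 16)
print(xor_distance(h1, h2))  # 1
```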
The Kademlia network is essentially a distributed key-value storage database. The network determines where information about a file should be stored based on the file's hash value itself.
Key: The ed2k hash value of the file.
Value: The list of IP addresses of users who have the file.
Rule: The 'value' (source information) for a given 'key' (file hash) must be stored on the several nodes whose IDs are closest to that 'key,' where closeness is measured by the XOR distance described above.
When a user publishes a file, the following steps occur:
Calculation: The client calculates the ed2k hash value of its file (let's call it H_file).
Finding responsible nodes: The client searches in the network for K nodes (for example, 10) whose IDs are closest to H_file. According to the definition of XOR distance, these nodes are the 'responsible' nodes selected by the algorithm.
Storing information: The client stores its address information (IP and port, as a download source) on these K 'responsible' nodes. This process is called 'publishing.'
When another user wants to download this file:
Obtain the hash: They already know the hash value H_file of the file they want (taken from the link).
Find the responsible nodes: They then search the network for nodes whose ID is closest to H_file.
Get information: They ask these nodes, "Who has the file H_file?" Because these nodes received the source information when the file was published, they can directly return the list of IP addresses that have the file.
Connect and download: The downloader gets the source list and connects directly to these sources via ed2k to download the file.
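The publish and lookup steps can be simulated in a few lines. This is a hypothetical, global-view model: `ToyKadNetwork` simply sorts every node by XOR distance, whereas real Kad nodes find the K closest nodes through iterative routing and never see the whole network. K = 10 follows the example value above; real clients also expire and republish entries, which is omitted here.

```python
import random
from collections import defaultdict

ID_BITS = 128
K = 10  # number of "responsible" nodes per key (the example value from the text)


class ToyKadNetwork:
    """Hypothetical simulation with a global view of all node IDs."""

    def __init__(self, node_ids):
        self.node_ids = list(node_ids)
        # Each node keeps its own small key -> set-of-sources store.
        self.storage = {nid: defaultdict(set) for nid in self.node_ids}

    def closest_nodes(self, key: int, k: int = K) -> list[int]:
        # Deterministic: every caller computes the same list for the same key.
        return sorted(self.node_ids, key=lambda nid: nid ^ key)[:k]

    def publish(self, file_hash: int, source: tuple[str, int]) -> None:
        for nid in self.closest_nodes(file_hash):
            self.storage[nid][file_hash].add(source)

    def lookup(self, file_hash: int) -> set[tuple[str, int]]:
        found: set[tuple[str, int]] = set()
        for nid in self.closest_nodes(file_hash):
            found |= self.storage[nid].get(file_hash, set())
        return found


rng = random.Random(42)
net = ToyKadNetwork(rng.getrandbits(ID_BITS) for _ in range(1000))
h_file = int("0123456789ABCDEF0123456789ABCDEF", 16)  # hypothetical file hash
net.publish(h_file, ("203.0.113.5", 4662))            # uploader publishes itself as a source
print(net.lookup(h_file))                             # {('203.0.113.5', 4662)}
```

The lookup succeeds without any coordination because publisher and downloader independently compute the same set of closest nodes.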
Why is this design so efficient and powerful?
Deterministic: For the same file hash, the set of "responsible" nodes calculated by any client in the world is identical. This ensures that published information can always be found.
Decentralized: File source information is stored in a distributed manner across the network (on those "responsible" nodes) rather than on a central server. There is no single point of failure.
Load balancing: Since node IDs and file hashes are spread roughly uniformly across the ID space, the responsibility for storage is spread roughly evenly across all network nodes.
Fast lookup: Using XOR distance routing, the lookup process can be completed in O(log N) steps, even if there are millions of nodes in the network.
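The O(log N) behavior comes from the routing table. The sketch below makes simplifying assumptions: one contact per bucket, a global list used only to build the tables, and greedy forwarding instead of Kademlia's parallel iterative queries. Each contact is filed into bucket i when its XOR distance d satisfies 2^i <= d < 2^(i+1), which amounts to grouping contacts by the length of the ID prefix they share with the owner. Each greedy hop moves to a contact strictly closer to the target, so the lookup ends at the node closest to the target. All function names are hypothetical.

```python
import random

ID_BITS = 128


def bucket_index(own_id: int, other_id: int) -> int:
    """Bucket i holds contacts whose XOR distance d satisfies 2**i <= d < 2**(i + 1)."""
    d = own_id ^ other_id
    if d == 0:
        raise ValueError("a node is not its own contact")
    return d.bit_length() - 1


def shared_prefix_len(a: int, b: int, bits: int = ID_BITS) -> int:
    """Number of leading bits two IDs have in common."""
    return bits - (a ^ b).bit_length()


def build_routing_table(own_id, all_ids, rng, per_bucket=1):
    buckets = {}
    for other in all_ids:
        if other != own_id:
            buckets.setdefault(bucket_index(own_id, other), []).append(other)
    # Keep only a few contacts per bucket, as a size-limited k-bucket would.
    return {i: rng.sample(ids, min(per_bucket, len(ids))) for i, ids in buckets.items()}


def greedy_lookup(start, target, tables):
    """Forward to the known contact closest to `target` until no contact is closer."""
    current, hops = start, 0
    while True:
        contacts = [c for bucket in tables[current].values() for c in bucket]
        best = min(contacts, key=lambda c: c ^ target, default=current)
        if best ^ target >= current ^ target:
            return current, hops
        current, hops = best, hops + 1


rng = random.Random(7)
ids = [rng.getrandbits(ID_BITS) for _ in range(512)]
tables = {nid: build_routing_table(nid, ids, rng) for nid in ids}
target = rng.getrandbits(ID_BITS)
found, hops = greedy_lookup(ids[0], target, tables)
print(found == min(ids, key=lambda nid: nid ^ target), hops)  # True, plus a small hop count
```

In this model each hop fixes at least one more leading bit toward the closest node, so the hop count is bounded by the ID length and, in practice, grows roughly with the logarithm of the network size.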
eDonkey2000 URI Scheme
An ed2k link (full name: eDonkey2000 URI Scheme) identifies a file by its fingerprint and serves as a resource locator for the eDonkey2000 network. It was an early counterpart of the 'magnet link' in the P2P world: it locates resources purely by the properties of the file itself, not by any server address.
Compatibility with the KAD network
In eMule, the Kad network natively handles ed2k links, so you do not need to change any settings or switch modes:
- eD2k servers: use MD4 hashes as file keys
- Kad network: node IDs and file keys are both 128-bit values, and the file keys are the same MD4-based eD2k hashes
This is not a coincidence; eMule’s Kad implementation deliberately retains the same hash length as eD2k (standard Kademlia uses 160 bits).
When you enter an ed2k link, internally eMule:
User pastes ed2k://|file|...|HASH|/
↓
Extract the MD4 hash: HASH
↓ (both lookups run in parallel)
Query eD2k server ("Who has this file?")  +  Ask Kad network ("Who is closest to HASH?")
↓
Combine source lists
The whole process is transparent: the user does not need to know whether a given source came from a server or was found through Kad.
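The "query both, then combine" step could be sketched like this. `find_sources`, `ask_server`, and `ask_kad` are hypothetical names, and the two lookups are stand-ins that return fixed addresses; in a real client they would be network queries. The sketch runs them concurrently and merges their results without duplicates, keeping first-seen order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Source = tuple[str, int]  # (ip, port)


def find_sources(file_hash: str, lookups: Sequence[Callable[[str], list[Source]]]) -> list[Source]:
    """Run every lookup concurrently and merge their source lists without duplicates."""
    with ThreadPoolExecutor(max_workers=max(1, len(lookups))) as pool:
        results = list(pool.map(lambda lookup: lookup(file_hash), lookups))
    combined: list[Source] = []
    seen: set[Source] = set()
    for result in results:  # results keep the order of `lookups`
        for source in result:
            if source not in seen:
                seen.add(source)
                combined.append(source)
    return combined


def ask_server(file_hash: str) -> list[Source]:  # hypothetical eD2k server query
    return [("203.0.113.5", 4662), ("198.51.100.7", 4662)]


def ask_kad(file_hash: str) -> list[Source]:  # hypothetical Kad lookup
    return [("198.51.100.7", 4662), ("192.0.2.9", 4662)]


print(find_sources("0123456789ABCDEF0123456789ABCDEF", [ask_server, ask_kad]))
# [('203.0.113.5', 4662), ('198.51.100.7', 4662), ('192.0.2.9', 4662)]
```

Because both backends key files by the same 128-bit hash, merging is a plain set union; no translation between the two networks is needed.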
Probability of Duplication
From a mathematical standpoint, no hash function can guarantee with 100% certainty that two different files never share a hash value, but carefully designed hash algorithms make the probability of such a duplicate (called a "collision") so low that it can be considered impossible in practice.
eD2k uses the MD4 algorithm; the hash functions more common today, such as SHA-1 and SHA-256, are designed for the same purpose. Below is a detailed explanation of why we can be so confident.
The output space of hash functions is incredibly vast: The MD4 algorithm used by eD2k produces a 128-bit hash value, so there are 2¹²⁸ (about 3.4 × 10³⁸) possible outputs, a number far beyond human intuition.
Very low probability of random collisions: Assuming the hash function is ideal (mapping any input to a uniformly random output), the probability that two given files have the same hash value is 1 / 2¹²⁸, which is negligible. Even if you deliberately tried to create a file with the same hash as a specific file, such as the movie Avatar, each brute-force attempt would succeed with probability 1 / 2¹²⁸, far below the combined odds of being struck by lightning and winning a Powerball jackpot.
MD4 has been broken (but this matters less here): From a cryptographic point of view, MD4 is broken. Researchers can cheaply construct pairs of specially crafted inputs that share the same MD4 hash. But this ability to "construct collisions" is limited: the attacker must craft both inputs of the pair. There is no known practical way to do the harder task: "Given a specific existing file (like an important contract), create another file with different content but the same MD4 hash."
Probabilistic Guarantee: eD2k relies on probabilistic certainty. The probability that two meaningful files will produce the same MD4 hash due to random chance is so low that it is virtually impossible on the scale of the entire universe and all human computers.
Security Depends on Purpose: MD4 must not be used where security matters, such as digital signatures or secure communication protocols, but for verifying file integrity and uniquely identifying files in a P2P network it remains adequate in practice. Its job there is to catch accidental errors, not to withstand deliberate collision attacks, which have not been a practical concern in everyday eD2k sharing.
De Facto Standard: In more than twenty years of eD2k operation, no natural MD4 collision between two different, meaningful files has been widely reported, and in practice hash values have reliably distinguished files. Experience bears out the theory.
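The random-collision argument can be quantified with the standard birthday approximation: among n files with ideal b-bit hashes, the chance that any two of them collide is about 1 - exp(-n(n-1) / 2^(b+1)). The helper below is a hypothetical illustration of that formula; it models accidental collisions only and says nothing about deliberate attacks on MD4.

```python
import math


def collision_probability(n_files: int, bits: int = 128) -> float:
    """Birthday approximation for at least one collision among n ideal b-bit hashes."""
    pairs = n_files * (n_files - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2 ** bits)


print(collision_probability(2))         # about 2.9e-39, i.e. 1 / 2**128
print(collision_probability(10 ** 12))  # about 1.5e-15 for a trillion files
print(collision_probability(2 ** 64))   # about 0.39: collisions become likely only near 2**64 files
```

Even a trillion distinct files leave the chance of a single accidental collision around one in a quadrillion.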
Resource Availability
Compared to traditional HTTP and FTP downloads, P2P links like ed2k are most powerful and fascinating because resource availability does not depend on the original uploader, but on the entire network.
Anyone who has fully downloaded the file becomes a source: once their download is complete, their eMule client starts sharing the file by default (because the download folder is shared by default).
In this way, the file spreads through the network from a single source (you, userA) to many sources (all users who have completed the download). Even if you, the initial uploader, go offline, the network remains healthy because userB, userC, and others can still provide data to each other.
This is exactly the embodiment of the P2P sharing spirit: I help everyone, and everyone helps me. Every person is both a downloader and a sharer, collectively sustaining the vitality of the entire network.
This is why many very old and rare resources can still be found on the eD2k network. As long as there is a "seeder" (userC) who still keeps the file on their hard drive and runs eMule, the resource remains "alive".
Link "Revival"
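Because eD2k links are derived from the file content, two files with exactly the same content (bit-for-bit identical) have the same eD2k hash, and therefore the same eD2k link when shared under the same name (sources are located by hash, so the name in the link does not affect which sources are found). Therefore, in some cases, links can 'come back to life', for example: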
2008: User A shares a rare documentary and generates an eD2k link to post on Forum X.
2010: A's hard drive fails and the file is lost, so the link 'dies.'
2018: User B finds the same documentary (with an identical hash) on an old CD and shares it via eMule; its eD2k hash matches the old link, so the link 'comes back to life.'
2018: User C clicks on the old link that A posted on Forum X in 2008 and starts downloading at full speed.
This is the power of 'content addressing': the validity of an eD2k link does not depend on the original publisher, but on whether there are 'sources' in the entire P2P network that have the same content file.
By contrast, a Magnet/BT link identifies a torrent by its info-hash, which is computed over the torrent's metadata (file names, piece size, directory layout) as well as the content. The same files packaged differently therefore produce different Magnet links, and those links do not automatically revive each other. This is a major drawback of Magnet/BT compared with eD2k.
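To see why packaging matters, the sketch below computes a simplified BitTorrent v1 info-hash for a single file: the SHA-1 of the bencoded "info" dictionary, which holds the file name and piece length alongside the piece hashes. The minimal `bencode` helper and `single_file_info_hash` are hypothetical names written for this illustration, and real torrents may carry extra keys in the info dictionary. Identical bytes packaged with a different name or piece size yield a different info-hash, while the content's eD2k hash would stay the same.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoding for int, str, bytes, list and dict (keys sorted as raw bytes)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def single_file_info_hash(name: str, content: bytes, piece_length: int) -> str:
    """SHA-1 of a simplified single-file v1 info dictionary."""
    pieces = b"".join(
        hashlib.sha1(content[i:i + piece_length]).digest()
        for i in range(0, len(content), piece_length)
    )
    info = {"length": len(content), "name": name, "piece length": piece_length, "pieces": pieces}
    return hashlib.sha1(bencode(info)).hexdigest()


content = b"the same documentary bytes" * 4096
print(single_file_info_hash("documentary.avi", content, 16384))
print(single_file_info_hash("documentary.avi", content, 32768))         # different piece size
print(single_file_info_hash("Documentary (2008).avi", content, 16384))  # different name
```

All three lines print different info-hashes even though the file bytes are identical, so each would be a separate swarm.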