Skip to content

Chat Platform Adapters

WeClone currently supports chat history from the following platforms as data sources. Contributions to adapt more platforms are welcome.

PlatformTextImagesVoiceVideoAnimated EmojisLinks (Shared)QuotedForwardedLocationFiles
WeChat
Telegram⚠️Converted to Emoji

Exporting Chat History

Telegram

  • Please use Telegram Desktop to export your chat history.
  • Select "Photos" for the message type and "JSON" for the format.
  • You can export from multiple contacts (exporting group chat history is not recommended). Then, place the exported ChatExport_* folders into the ./dataset/telegram directory. This means putting the chat history folders from different people together under ./dataset/telegram.
  • Please change telegram_args.my_id in the configuration file to your own Telegram User ID. The from_id in result.json is your User ID.

Directory Structure

dataset/
└── telegram/
    ├── ChatExport_2025-01-01/
    │   ├── result.json
    │   └── photos/
    └── ChatExport_2025-01-02/
        ├── result.json
        └── photos/

WeChat

  • Use PyWxDump (does not support WeChat version 4.0) or other open-source tools to export WeChat chat history. Other tools require the format to be converted to the PyWxDump export format.
  • You can first migrate (back up) your mobile phone's chat history to your computer to get more data.
  • The export type is CSV. You can export from multiple contacts (it is not recommended to use group chats with many people).
  • The chat history with one contact should be stored in a separate folder, which can contain multiple CSV files.
  • Copy the multiple folders of chats with different people to the ./dataset/csv folder of the WeClone project.

Preparing Media Files (Optional)

Execute this in an environment where you can access your WeChat personal folder. If you don't have an environment, create one and install the basic dependencies (uv pip install -e .). Then run the following command, which will save the image data used for training to the ./dataset/wechat/dat directory.

bash
python weclone/data/chat_parsers/wechat_parser.py --wechat-data-dir "Path to your WeChat personal folder, e.g., C:\Users\user\Documents\WeChat Files\wxid_d68wiru2zseo22"

Afterward, use a WeChat image decryption tool or other tools to decrypt the media files. Save the decrypted files to the dataset/media/images directory.

dataset/
├── csv/ (Default PyWxDump export structure)
│   ├── person1-personal_chat-1234567890/
│   │   ├── person1_0_5000.csv
│   │   └── person1_5000_10000.csv
│   └── person2-personal_chat-2345678901/
│       ├── person2_0_5000.csv
│       └── person2_5000_10000.csv
└── media/ (Manually decrypted media files)
    └── images/
        ├── 01c177d8ad98969ba048455b54eef.jpg
        └── 13d6d8a81fa7d09238c81fe314e85.png

Note:

  • Each person's chat history is stored in a separate folder.
  • CSV files can be split by the number of messages: person_start_index_end_index.csv

The data sample format required by WeClone is as follows:

idMsgSvrIDtype_nameis_sendertalkerroom_namemsgsrcCreateTime
17437267147299592543Image0wxid_12345iru2zsmo22wxid_6789z5qlxzfj22ImageFileStorage\MsgAttach\dd0e62b6eb67d1953454354350301d6c\Image\2024-10\01c177d8ad90af8969b455b54eef.dat2024/10/4 11:42
2637529293739295664Image0wxid_12345iru2zsmo22wxid_6789z5qlxzfj22ImageFileStorage\MsgAttach\dd0e62b6eb67d1953454354350301d6c\Image\2024-10\d8a8936ca622823452e80a53a6.dat2024/10/4 11:42
34073926741244663531Text1wxid_12345iru2zsmo22wxid_6789z5qlxzfj22Little Maldives2024/10/4 11:43
4706358374822797422Text1wxid_12345iru2zsmo22wxid_6789z5qlxzfj22Worthy of the name2024/10/4 11:43
python
class ChatMessage:
    id: int  # Sequential ID
    MsgSvrID: str  # Original message platform ID
    type_name: str  # Message type, see cut_type_data and skip_type_data
    is_sender: int  # 0: Sent by the other person, 1: Sent by you
    talker: str  # Message sender ID
    msg: str  # Message content
    src: str  # Media file path, additional information field
    CreateTime: Timestamp  # Message sending time
    room_name: Optional[str] = None  # Message receiver or group chat ID
    is_forward: bool = False  # Whether the message is forwarded

Exporting Data from Other Chat Software

You can refer to the ChatMessage class for adaptation. After adapting, you are welcome to submit a PR to WeClone.