This project provides a tool to parse and extract data from CAR (Content Addressable Archive) files exported from repositories on the AT Protocol, including platforms like Bluesky. The tool decodes the repository data, processes it into JSON format, and organizes it by lexicon type.
A CAR (Content Addressable Archive) file is a snapshot of a repository’s state in the AT Protocol. It encapsulates all public data for a repository, such as posts, likes, and social graphs. CAR files are useful for:
- Portability: Sharing and offline analysis of repository data.
- Consistency: Capturing the exact state of a repository at a specific time.
- Backup and Archiving: Preserving repository data independently of live APIs.
Use Case | CAR File | DID/API Queries |
---|---|---|
Offline access | ✅ | ❌ |
Efficient batch processing | ✅ | ❌ |
Real-time updates | ❌ | ✅ |
Backup and archival | ✅ | ❌ |
Selective data retrieval | ❌ | ✅ |
Consistent snapshot of data | ✅ | ❌ |
- Decodes CAR files and extracts records in CBOR and JSON formats.
- Aggregates records by lexicon type (e.g., `app.bsky.feed.post`).
- Outputs structured JSON files for each lexicon type.
- Supports offline processing of repository snapshots.
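
Aggregating by lexicon type comes down to grouping records on their `$type` field. The following is a minimal sketch of that idea, assuming the records have already been decoded into generic JSON maps. The names `Record` and `groupByType` and the `"unknown"` fallback key are hypothetical and are not taken from this project's `main.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record is a decoded repository record held as a generic JSON object.
type Record map[string]any

// groupByType buckets records by their "$type" field (the lexicon type).
// Records without a non-empty string "$type" go under "unknown";
// that fallback key is an assumption made for this sketch.
func groupByType(records []Record) map[string][]Record {
	groups := make(map[string][]Record)
	for _, rec := range records {
		t, ok := rec["$type"].(string)
		if !ok || t == "" {
			t = "unknown"
		}
		groups[t] = append(groups[t], rec)
	}
	return groups
}

func main() {
	// Sample input for illustration only.
	raw := `[
	  {"$type": "app.bsky.feed.post", "createdAt": "2023-07-01T23:30:08.840Z"},
	  {"$type": "app.bsky.feed.like", "createdAt": "2023-07-02T10:00:00.000Z"},
	  {"$type": "app.bsky.feed.post", "createdAt": "2023-07-03T08:15:00.000Z"}
	]`
	var records []Record
	if err := json.Unmarshal([]byte(raw), &records); err != nil {
		panic(err)
	}
	// Map iteration order is random in Go, so the lines may print in any order.
	for t, recs := range groupByType(records) {
		fmt.Printf("%s: %d record(s)\n", t, len(recs))
	}
}
```

Keeping the grouping key as the raw lexicon string makes it straightforward to derive one output file per type afterwards.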
To use this parser, ensure you have the following installed:
- Go (1.20 or later)
- A CAR file to process (exported from a Bluesky repository).
- Clone this repository with `git clone https://github.com/thomasafink/bluesky-personal-data-parser-json.git`, then change into it with `cd bluesky-personal-data-parser-json`.
- Install the required Go modules: `go mod tidy`
- Download your repository export from https://bsky.app/settings via "Export My Data" and place the CAR file in the root directory of the project. Name it `repo.car`, or update `carFilePath` in `main.go` to point to it.
- Run the parser: `go run main.go`
- The tool will:
  - Save individual CBOR and JSON records in a directory named after the repository DID.
  - Generate aggregated JSON files for each lexicon type (e.g., `app_bsky_feed_post.json`) in the root directory.
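
The aggregated file name in the example suggests that dots in the lexicon type are replaced by underscores. Here is a hedged sketch of that mapping and of writing one aggregate. The helpers `aggregateFileName` and `writeAggregate` are hypothetical names for illustration, and the actual logic in `main.go` may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// aggregateFileName maps a lexicon type to a file name following the
// pattern in the example: "app.bsky.feed.post" -> "app_bsky_feed_post.json".
// The dot-to-underscore rule is inferred from that single example.
func aggregateFileName(lexiconType string) string {
	return strings.ReplaceAll(lexiconType, ".", "_") + ".json"
}

// writeAggregate writes records as an indented JSON array into dir
// and returns the path of the file it wrote.
func writeAggregate(dir, lexiconType string, records []map[string]any) (string, error) {
	data, err := json.MarshalIndent(records, "", "  ")
	if err != nil {
		return "", err
	}
	path := filepath.Join(dir, aggregateFileName(lexiconType))
	return path, os.WriteFile(path, data, 0o644)
}

func main() {
	// Write into a temporary directory so the sketch leaves the project untouched.
	dir, err := os.MkdirTemp("", "aggregates")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	records := []map[string]any{
		{"$type": "app.bsky.feed.post", "createdAt": "2023-07-01T23:30:08.840Z"},
	}
	path, err := writeAggregate(dir, "app.bsky.feed.post", records)
	if err != nil {
		panic(err)
	}
	fmt.Println("wrote", filepath.Base(path))
}
```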
After running the tool, you will find:
- A directory named after the DID (e.g., `did:plc:abc123`) containing:
  - CBOR files for each record.
  - JSON files for each record.
- Aggregated JSON files in the root directory for each lexicon type.

Example aggregated JSON:

    [
      {
        "$type": "app.bsky.feed.post",
        "content": "Hello, world!",
        "createdAt": "2023-07-01T23:30:08.840Z",
        "author": "did:plc:xyz456"
      }
    ]

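
Because each aggregated file is a plain JSON array, it can be loaded with Go's standard library for further analysis. This sketch assumes the field names shown in the example (`content`, `createdAt`, `author`); real files may carry different or additional fields. The `Post` type and `parsePosts` function are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Post mirrors the fields of the example record. The field set is an
// assumption based on that example, not a full lexicon definition.
type Post struct {
	Type      string    `json:"$type"`
	Content   string    `json:"content"`
	CreatedAt time.Time `json:"createdAt"` // decoded from an RFC 3339 timestamp
	Author    string    `json:"author"`
}

// parsePosts decodes an aggregated JSON array into a slice of Post values.
func parsePosts(data []byte) ([]Post, error) {
	var posts []Post
	if err := json.Unmarshal(data, &posts); err != nil {
		return nil, err
	}
	return posts, nil
}

func main() {
	// Inline sample; in practice the bytes would come from
	// os.ReadFile("app_bsky_feed_post.json").
	sample := []byte(`[
	  {
	    "$type": "app.bsky.feed.post",
	    "content": "Hello, world!",
	    "createdAt": "2023-07-01T23:30:08.840Z",
	    "author": "did:plc:xyz456"
	  }
	]`)
	posts, err := parsePosts(sample)
	if err != nil {
		panic(err)
	}
	for _, p := range posts {
		fmt.Printf("%s by %s: %q\n", p.CreatedAt.Format("2006-01-02"), p.Author, p.Content)
	}
}
```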
- This tool only processes public CAR files. Private or encrypted repositories are not supported.
- Lexicon types not conforming to expected formats may generate warnings.
Feel free to submit issues or contribute improvements via pull requests. Contributions are welcome!
This project is licensed under the MIT License. See the `LICENSE` file for details.