Hello everyone! Wanting to have some fun this week, I created an API that lets you easily access AI models (in this case, Google's) from the Shortcuts app, in order to analyze data from your apps and make the most of it thanks to the generative capabilities of advanced models.
It costs me nothing, and I think it might be good to share it so that others can build on it.
In README.md, you will find everything you need to get started and put your own microservice into production, which can then be pinged from the Shortcuts app's HTTP request actions.
All you will need is a free Cloudflare account and an API key from Google AI Studio.
Feel free to take a look, and get back to me if you encounter any problems during deployment. Here is the GitHub repo where you can find all the source code and run it on your own: https://github.com/louisbrulenaudet/genai-api
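For a quick illustration, here is a minimal sketch of what a call to the deployed Worker could look like (the endpoint URL, route, payload shape, and auth header are placeholders; the actual contract is described in README.md):

```python
import requests

# Placeholder endpoint: substitute the URL of your own Cloudflare Worker.
ENDPOINT = "https://your-worker.your-account.workers.dev/generate"

# Hypothetical payload and auth header; check README.md for the real schema.
response = requests.post(
    ENDPOINT,
    json={"prompt": "Summarize this week's sleep data: 6h12 average over 7 days."},
    headers={"Authorization": "Bearer <your-token>"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```

The same request can be reproduced in Shortcuts with the "Get Contents of URL" action.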
Although more and more code editors are aligning with the AGENTS.md standard, some still use their own specific file conventions, which makes it difficult to maintain multiple configuration files when several people work on the same project with different agents.
Bodyboard addresses this by generating canonical instructions for code helpers from a single AGENTS.md file, thereby streamlining the production of adapter outputs for Gemini CLI, Copilot, Cline, Claude, Rules, Windsurf, and OpenAI Codex integrations.
It's a very simple project, but it addresses certain issues I've encountered, so why not make it available to everyone...
If you have other ideas for adapters to create, feel free to open a PR on the GitHub repo.
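For those curious about the general idea, here is a simplified sketch (not Bodyboard's actual implementation; the target paths are the usual conventions for each assistant rather than guaranteed output locations):

```python
from pathlib import Path

# Simplified sketch of the single-source-of-truth idea: mirror one canonical
# AGENTS.md into the file each assistant conventionally reads.
ADAPTER_TARGETS = {
    "Gemini CLI": "GEMINI.md",
    "Copilot": ".github/copilot-instructions.md",
    "Cline": ".clinerules",
    "Claude": "CLAUDE.md",
    "Windsurf": ".windsurfrules",
}

canonical = Path("AGENTS.md").read_text(encoding="utf-8")
for name, target in ADAPTER_TARGETS.items():
    out = Path(target)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(canonical, encoding="utf-8")
    print(f"{name}: wrote {target}")
```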
Over the course of a year, I had the opportunity to participate in hackathons organized by Mistral, OpenAI, and DeepMind. Building on this experience, I've created a boilerplate that incorporates those lessons to streamline collaboration and urgent deployments.
This GitHub template is structured around several fundamental building blocks and recommendations I offer developers eager to participate in their first hackathon, whether as part of a team or individually. Its emphasis is on rapid setup and deployment through:
- uv as a package manager, simplifying usage via a series of pre-configured make commands.
- FastAPI for API management, structured in a modular architecture designed to minimize branch conflicts during merges to main (with minimal health-check and ping routes to verify that Docker runs correctly and that the backend is reachable on the local network; see the sketch after this list).
- Pydantic for validation and type handling, which simplifies debugging and enhances understanding of data objects.
- A set of custom instructions tailored for agents (Cline and GitHub Copilot), aimed at improving overall comprehension of the application and optimizing the vibe-coding experience.
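For reference, the health-check and ping routes mentioned above could look roughly like this (route names and response shapes are illustrative, not necessarily those of the template):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="hackathon-boilerplate")


class HealthResponse(BaseModel):
    status: str


@app.get("/health", response_model=HealthResponse)
async def health() -> HealthResponse:
    # Confirms the container started and the app can serve requests.
    return HealthResponse(status="ok")


@app.get("/ping")
async def ping() -> dict[str, str]:
    # Cheap reachability check for teammates on the local network.
    return {"ping": "pong"}
```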
This template includes unit tests with a 100% pass rate and full test coverage, as well as a minimal CI file that checks the FastAPI application runs correctly, so merging code that breaks the server into production becomes impossible.
More generally, I would reiterate one essential piece of advice: your two main adversaries are branch conflicts (particularly when the same file is modified concurrently within a short period, especially if your architecture isn't built to scale with the team) and deployment issues under time pressure (for instance, unstable Wi-Fi or complex provided infrastructure that keeps your teammates from developing the frontend and successfully pinging the Python backend).
I am excited to share Lemone-API, an open-source project designed to facilitate the processing of French tax law and enable embeddings computation.
The API is tailored to meet the specific demands of information retrieval and classification across large-scale tax-related corpora, supporting the implementation of production-ready Retrieval-Augmented Generation (RAG) applications. Its primary purpose is to enhance the efficiency and accuracy of legal processes in the French taxation domain, with an emphasis on delivering consistent performance in real-world settings. Additionally, it contributes to advancements in legal natural language processing research.
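As an illustration only, a call to such an embeddings service could look like this (the route, port, and response field are hypothetical placeholders, not the project's documented schema):

```python
import requests

# Hypothetical local deployment; the actual routes and schema live in the project's docs.
resp = requests.post(
    "http://localhost:8000/embed",  # placeholder URL and route
    json={"texts": ["Article 787 B du Code général des impôts"]},
    timeout=30,
)
resp.raise_for_status()
embeddings = resp.json()["embeddings"]  # assumed response field
print(f"Computed {len(embeddings)} embedding(s)")
```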
The project is licensed under the Apache-2.0 License, ensuring flexibility for both personal and commercial use.
This agentic market analysis system is a Python-based framework that combines technical analysis with artificial intelligence to provide comprehensive market insights. At its core, the system implements a modular architecture that seamlessly integrates statistical analysis methods with natural language processing capabilities.
The system's foundation is built upon two primary technical indicators: the Relative Strength Index (RSI) and Bollinger Bands. The RSI implementation provides momentum analysis through a configurable calculation window (default: 14 periods), employing dynamic gain/loss computation and rolling averages to measure the velocity and magnitude of price movements. This is complemented by a Bollinger Bands implementation that utilizes Simple Moving Averages (SMA) and dynamic standard deviation calculations to create adaptive volatility bands that automatically adjust to market conditions.
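A rough Polars sketch of these two indicators follows (the column name, the 20-period band window, and the simple rolling-mean RSI variant are assumptions on my part; the actual system may use Wilder smoothing or different defaults):

```python
import polars as pl


def with_indicators(
    df: pl.DataFrame,
    rsi_window: int = 14,
    bb_window: int = 20,
    bb_k: float = 2.0,
) -> pl.DataFrame:
    # Assumes a "close" price column; RSI here uses plain rolling means.
    delta = pl.col("close").diff()
    avg_gain = delta.clip(lower_bound=0).rolling_mean(rsi_window)
    avg_loss = (-delta).clip(lower_bound=0).rolling_mean(rsi_window)
    rsi = 100 - 100 / (1 + avg_gain / avg_loss)

    sma = pl.col("close").rolling_mean(bb_window)
    std = pl.col("close").rolling_std(bb_window)

    return df.with_columns(
        rsi.alias("rsi"),
        sma.alias("bb_mid"),
        (sma + bb_k * std).alias("bb_upper"),
        (sma - bb_k * std).alias("bb_lower"),
    )
```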
Market data acquisition is handled through an integration with the Alpaca API, providing access to historical price data across various timeframes. The system employs Polars for high-performance data manipulation, leveraging its columnar storage format and lazy evaluation capabilities to efficiently process large datasets.
The AI integration layer bridges technical analysis with natural language processing using the Qwen2.5-72B-Instruct model via the Hugging Face API. This enables sophisticated market analysis by combining traditional technical indicators with real-time news sentiment analysis through DuckDuckGo search integration.
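Calling that model through the Hugging Face Inference API typically looks like the following (the prompt and parameters are illustrative, not the system's actual prompting strategy):

```python
from huggingface_hub import InferenceClient

# Illustrative call; the system's real prompts and parameters may differ.
client = InferenceClient(model="Qwen/Qwen2.5-72B-Instruct", token="hf_...")

completion = client.chat_completion(
    messages=[
        {"role": "system", "content": "You are a market analysis assistant."},
        {"role": "user", "content": "RSI(14) = 28.4 and price closed below the lower Bollinger Band. Interpret."},
    ],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```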
My biggest release of the year: a series of 7 specialized embedding models for information retrieval within tax documents is now available for free on @huggingface
These new models aim to offer an open-source alternative for in-domain semantic search from large text corpora and will improve RAG systems and context addition for large language models.
Trained on more than 43 million tax tokens derived from semi-synthetic and raw-synthetic data, enriched by various methods (in particular MSFT's evol-instruct by @WizardLM_AI), and corrected by humans, this project is the fruit of hundreds of hours of work and is the culmination of a global effort to open up legal technologies that has only just begun.
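Using one of these checkpoints should follow the standard Sentence Transformers pattern (the repository id below is a placeholder, not one of the actual model names):

```python
from sentence_transformers import SentenceTransformer

# Placeholder repository id; substitute one of the seven released models.
model = SentenceTransformer("louisbrulenaudet/<tax-embedding-model>")

queries = ["Quelles sont les conditions d'application du pacte Dutreil ?"]
documents = ["L'article 787 B du CGI prévoit une exonération partielle de droits de mutation."]

query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)
print(model.similarity(query_embeddings, document_embeddings))  # similarity matrix
```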
A big thank you to @microsoftfrance for giving me access to state-of-the-art infrastructure to train these models, and to @julien_c, @ClemDelangue, @Thom_Wolf and the whole HF team for the inference endpoint API and the generous provision of @AIatMeta Llama-3.1-70B. Special thanks also to @tomaarsen for his invaluable advice on training embedding models and loss functions.
The Romulus model series has been released on Hugging Face, continually pre-trained on 34,864,949 tokens of French laws and intended to serve as a foundation for fine-tuning on labeled data.
The training code, dataset, and model weights are open and freely available on HF; training ran on H100s provided by Microsoft for Startups, using Unsloth AI by @danielhanchen and @shimmyshimmer.
- Base model: louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1
- Instruct model: louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1-Instruct
- Dataset: louisbrulenaudet/Romulus-cpt-fr
Please note that these models have not been aligned to produce usable text as they stand, and will certainly need to be fine-tuned on the desired tasks in order to produce satisfactory results.
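Loading the instruct checkpoint follows the usual transformers pattern (a minimal sketch; the generation settings are illustrative, and the caveat above about alignment applies):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # requires accelerate

prompt = "Expliquez le principe de non-rétroactivité de la loi en droit français."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```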
RAGoon is a set of NLP utilities for multi-model embedding production and high-dimensional vector visualization; it aims to improve language model performance by providing contextually relevant information through search-based querying, web scraping, and data augmentation techniques.
Here is an example of code that produces embeddings for a given list of models:
```python
from ragoon import EmbeddingsDataLoader
from datasets import load_dataset

# Initialize the dataset loader with multiple models
loader = EmbeddingsDataLoader(
    token="hf_token",
    dataset=load_dataset("louisbrulenaudet/dac6-instruct", split="train"),  # If the dataset is already loaded.
    # dataset_name="louisbrulenaudet/dac6-instruct",  # If you want to load the dataset from the class.
    model_configs=[
        {"model": "bert-base-uncased", "query_prefix": "Query:"},
        {"model": "distilbert-base-uncased", "query_prefix": "Query:"},
        # Add more model configurations as needed
    ],
)

# Uncomment this line if passing dataset_name instead of dataset.
# loader.load_dataset()

# Process the splits with all models loaded
loader.process(
    column="output",
    preload_models=True,
)

# Access the processed dataset
processed_dataset = loader.get_dataset()
```
In particular, this tool also provides the functionality to load embeddings from a FAISS index, reduce their dimensionality using PCA and/or t-SNE, and visualize them in an interactive 3D graph.
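RAGoon wraps this behind its own API, but the underlying steps look roughly like this (a sketch assuming a flat FAISS index that supports reconstruction; the file name is a placeholder):

```python
import faiss
from sklearn.decomposition import PCA

# Placeholder file name; assumes a flat index that supports vector reconstruction.
index = faiss.read_index("embeddings.index")
vectors = index.reconstruct_n(0, index.ntotal)  # pull all stored vectors

reduced = PCA(n_components=3).fit_transform(vectors)  # project to 3D for plotting
print(reduced.shape)  # (n_vectors, 3), ready for an interactive scatter plot
```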
You can now find OBIS, the Ocean Biodiversity Information System, on Hugging Face with 128M rows, streamable via the Datasets package.
The datasets are integrated, allowing seamless search and mapping by species name, higher taxonomic level, geographic area, depth, time, and environmental parameters. OBIS originates from the Census of Marine Life (2000-2010) and was adopted as a project under IOC-UNESCO's International Oceanographic Data and Information Exchange (IODE) programme in 2009.
Collectively, they have provided over 45 million observations of nearly 120,000 marine species, ranging from bacteria to whales, from the surface to 10,900 meters depth, and from the tropics to the poles.
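Streaming lets you iterate over the rows without downloading the full 128M-row dump up front (the repository id below is a placeholder; look up the actual dataset name on the Hub):

```python
from datasets import load_dataset

# Placeholder repository id; substitute the actual OBIS dataset on the Hub.
dataset = load_dataset("louisbrulenaudet/<obis-dataset>", split="train", streaming=True)

# Rows are fetched lazily as you iterate, instead of materializing everything.
for i, record in enumerate(dataset):
    print(record)
    if i == 2:
        break
```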