I see the ingested documents in the data folder don't have an id field, only a doc field.
{"doc": "Using Large Language Models in Pathway is simple: just call the functions from `pathway.stdlib.ml.nlp`!"}
What if I pass two contradictory statements? Is there a way to remove (or better update) a document with a new version?
For example, suppose I am ingesting some public docs and then update a doc page. How do I make sure the answer is taken only from the latest version of the document?
This depends on the data source used. Some sources track updatable collections, while others are more "append-only" in nature. For instance, tracking a database table using CDC with Debezium supports reacting to all document changes out of the box.
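To illustrate what "reacting to all document changes" means for an updatable collection, here is a minimal plain-Python sketch. It is not Pathway's API: `DocStore` and the flattened event shape (`op`, `id`, `doc`) are hypothetical, and it assumes each document carries a stable `id`, which the current `{"doc": ...}` format does not have. Only the `op` codes (`c`, `u`, `d`, `r`) follow Debezium's convention; real Debezium events are richer.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DocStore:
    """Hypothetical store that keeps only the latest version of each document."""

    docs: Dict[str, str] = field(default_factory=dict)

    def apply(self, event: dict) -> None:
        # Simplified, assumed event shape: {"op": ..., "id": ..., "doc": ...}.
        # Op codes mirror Debezium: c=create, u=update, r=snapshot read, d=delete.
        op, key = event["op"], event["id"]
        if op in ("c", "u", "r"):
            # Upsert: a newer version replaces the older one under the same id.
            self.docs[key] = event["doc"]
        elif op == "d":
            # Delete: the document no longer contributes to any answer.
            self.docs.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op!r}")


store = DocStore()
store.apply({"op": "c", "id": "page-1", "doc": "old statement"})
store.apply({"op": "u", "id": "page-1", "doc": "new statement"})
store.apply({"op": "d", "id": "page-1"})
```

With an append-only source, by contrast, both the old and the new statement would stay in the index, which is how contradictory answers can arise.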
For file sources, we are working on supporting file versioning and integration with S3 native object versioning. Once that is in place, simply deleting a file or uploading a new version will be sufficient to trigger re-indexing of the affected documents.
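Until that is available, one possible workaround is to detect changed and deleted files yourself and re-index only those. The sketch below is an assumption about how this could be done outside Pathway, not a description of the planned feature; `snapshot` and `diff_snapshots` are hypothetical helper names, and files are compared by SHA-256 content hash.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def snapshot(folder: Path) -> Dict[str, str]:
    """Map each file name in `folder` to the SHA-256 hash of its contents."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(folder.iterdir())
        if p.is_file()
    }


def diff_snapshots(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (changed_or_added, deleted) file names between two snapshots."""
    changed = sorted(name for name in new if old.get(name) != new[name])
    deleted = sorted(name for name in old if name not in new)
    return changed, deleted
```

Calling `snapshot` before and after an update and passing both results to `diff_snapshots` gives the files whose documents need re-indexing and the files whose documents should be dropped.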