But if I haven't spent the effort to extract it, do I really own it? Let me argue that I don't have it because all my implemented queries turn up none of your data. You wouldn't tax me on gold that hasn't yet been extracted, would you? (End of joke.)
What I think you'd typically do is put different data under different keys/paths, so that red is personally identifiable data, yellow contains pointers to such data, and green is just regular data. You could have a structure like s3://my-data-lake/{red|yellow|green}/{raw|intermediate}/year={year}/month={month}/day={day}/source={system}/dataset={table}
Then you just don't keep red data for longer than 30 days.