adaptive uncertainty-aware refinement for llm judge auditing
aura iteratively learns human-consistency signals to audit llm-as-a-judge decisions with minimal human verification.
information lattice learning can be interpreted as a method for learning the structure of probabilistic graphical models by projecting probability distributions onto partition lattices and lifting rules back.
datasette apps lets you host custom html and javascript applications inside datasette with secure sql query access.
learn how lateral, semi, and anti joins solve problems that inner and left joins cannot handle cleanly.
elastic agrees to acquire ai debugging startup deductiveai for up to $85 million to enhance its observability platform.
a new forecasting algorithm achieves optimal regret for both general and smooth proper losses, overcoming previous limitations in u-calibration.
openai adds ai pioneer noam shazeer and policy expert dean ball to its team as it prepares for a public offering.
snap creates separate company dotmo to build interactive ai gaming models, citing high internal costs.
ferc fast-tracks data center grid links, amazon eyes selling trainium chips, and research agents leak private data through web queries.
ferc orders grid operators to fast-track data center connections, addressing ai infrastructure delays.
amazon web services is exploring selling its trainium ai chips directly to other companies, potentially challenging nvidia's dominance.
mosaicleaks shows that deep research agents often expose private information when they search the web, and training them to be better at tasks makes the leakage worse.
the new york startup uses medal's 2 billion yearly gameplay videos to train embodied ai agents in spatial-temporal reasoning, attracting big-name investors.
pytorch's helion dsl uses llms to propose kernel configurations, matching bayesian optimization performance while exploring 10x fewer configs and cutting tuning time by 6.7x.
navi-orbital runs a vision-language model on a low earth orbit satellite to classify scenes, describe content, and answer operator questions in plain english.
adding human collaborators to ai teams can hurt performance without structured coordination, but shared memory and approval gates help.
a new benchmark evaluates language model agents on long-horizon business management by simulating 500 days of running a startup.
benchmarks show other parameter-efficient fine-tuning methods can beat lora on accuracy and memory, so it pays to test alternatives.